Want to create an interactive transcript for this episode?
Podcast: Software Engineering Daily
Episode: Data Lakehouse with Michael Armbrust
Description: A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical workflow for a data engineer is to pull data sets from this slow data lake storage into the data warehouse for faster querying.Apache Spark is a system for fast processing of data across distributed datasets. Spark is not thought of as a data warehouse technology, but it can be used to fulfill some of the responsibilities. Delta is an...