Podcast: Software Engineering Daily
Episode: HoloClean: Data Quality Management with Theodoros Rekatsinas
Description: Many data sources produce new data points at a very high rate. With so much data, the issue of data quality emerges. Low-quality data can degrade the accuracy of machine learning models that are built around those data sources. Ideally, we would have completely clean data sources, but that’s not very realistic. One alternative is a data cleaning system, which can allow us to clean up the data after it has already been generated. HoloClean is a statistical inference engine that can impute, clean, and enrich data. HoloClean is centered around “The Probabilistic Unclean Database Mode...