Podcast: Software Engineering Daily
Episode: Snorkel: Training Dataset Management with Braden Hancock
Description: Machine learning models require training data, and that data needs to be labeled. Today, we have high-quality data infrastructure tools such as TensorFlow, but we don't have large, high-quality data sets. For many applications, the state of the art is to manually label training examples and feed them into the training process. Snorkel is a system for scaling the creation of labeled training data. In Snorkel, human subject matter experts create labeling functions, and these functions are applied to large quantities of data in order to label it. For exa...
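The labeling-function idea in the description can be illustrated with a small plain-Python sketch. It assumes a toy spam-detection task and does not use the real Snorkel library. The functions `lf_contains_link`, `lf_mentions_prize`, `lf_short_reply`, `apply_lfs`, and `majority_vote` are hypothetical, as is the `ABSTAIN` convention. Each labeling function encodes one expert heuristic and either votes for a label or abstains. Every function runs over every unlabeled example, producing a label matrix, and the votes are then combined into a single label per example. Snorkel itself learns a model of how accurate each labeling function is. The simple majority vote used here is a swapped-in simplification.

```python
# Minimal sketch of labeling functions, assuming a toy spam task.
# Plain Python only; this is NOT the Snorkel library API.
from collections import Counter
from typing import Callable, List

ABSTAIN = -1  # hypothetical convention: a labeling function may decline to vote
HAM, SPAM = 0, 1

LabelingFunction = Callable[[str], int]


# Hypothetical heuristics an expert might write.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN


def apply_lfs(texts: List[str], lfs: List[LabelingFunction]) -> List[List[int]]:
    """Apply every labeling function to every example, yielding a label matrix."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row: List[int]) -> int:
    """Combine one example's votes; all-abstain or a tie yields ABSTAIN."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    unlabeled = [
        "Claim your prize at http://example.com now",
        "ok thanks",
        "Meeting moved to Thursday afternoon, see agenda attached",
    ]
    lfs = [lf_contains_link, lf_mentions_prize, lf_short_reply]
    matrix = apply_lfs(unlabeled, lfs)
    print([majority_vote(row) for row in matrix])  # → [1, 0, -1]
```

The third example gets no label because none of the heuristics fire on it. This illustrates why many labeling functions are usually needed to cover a large dataset.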