Want to create an interactive transcript for this episode?
Podcast: The Stack Overflow Podcast
Episode: Tragedy of the (data) commons
Description: The Data Provenance Initiative is a collective of volunteer AI researchers from around the world. They conduct large-scale audits of the massive datasets that power state-of-the-art AI models with a goal of mapping the landscape of AI training data to improve transparency, documentation, and informed use of data. Their Explorer tool allows users to filter and analyze the training datasets typically used by large language models.Shayne and Robert are the authors of a new study called Consent in Crisis: The Rapid Decline of the AI Data Commons: the first large-scale, longitudinal audit of the consent protocols...