Podcast: Machine Learning Street Talk (MLST)
Episode: Prof. Melanie Mitchell 2.0 - AI Benchmarks are Broken!
Description: Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Prof. Melanie Mitchell argues that the concept of "understanding" in AI is ill-defined and multidimensional - we can't simply say an AI system does or doesn't understand. She advocates for rigorously testing AI systems' capabilities using proper experimental methods from cognitive science. Popular benchmarks for intelligence often rely on the assumption that if a human can perform a task, an AI that performs the task must have human-like general intelligence. But benchmarks should evolve as capabilities improve.
Large language models show surprising skill on many human tasks...