Arxiv Papers
[QA] On the Theoretical Limitations of Embedding-Based Retrieval
8 mins; September 01, 2025
On the Theoretical Limitations of Embedding-Based Retrieval
23 mins; September 01, 2025
[QA] Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing
7 mins; August 22, 2025
Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing
9 mins; August 22, 2025
[QA] Measuring the environmental impact of delivering AI at Google Scale
8 mins; August 22, 2025
Measuring the environmental impact of delivering AI at Google Scale
22 mins; August 22, 2025
[QA] Deep Think with Confidence
7 mins; August 21, 2025
Deep Think with Confidence
18 mins; August 21, 2025
[QA] Intern-S1: A Scientific Multimodal Foundation Model
8 mins; August 21, 2025
Intern-S1: A Scientific Multimodal Foundation Model
49 mins; August 21, 2025
[QA] Search-Time Data Contamination
7 mins; August 19, 2025
Search-Time Data Contamination
19 mins; August 19, 2025
[QA] Thyme: Think Beyond Images
7 mins; August 18, 2025
Thyme: Think Beyond Images
25 mins; August 18, 2025
[QA] SSRL: Self-Search Reinforcement Learning
7 mins; August 18, 2025
SSRL: Self-Search Reinforcement Learning
32 mins; August 18, 2025
[QA] Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
7 mins; August 13, 2025
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
31 mins; August 13, 2025
[QA] Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
7 mins; August 13, 2025
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
28 mins; August 13, 2025
[QA] Part 1: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
7 mins; August 12, 2025
Part 1: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
25 mins; August 12, 2025
[QA] MolmoAct: Action Reasoning Models that can Reason in Space
7 mins; August 12, 2025
MolmoAct: Action Reasoning Models that can Reason in Space
36 mins; August 12, 2025
[QA] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
7 mins; August 08, 2025
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
21 mins; August 08, 2025
[QA] R-Zero: Self-Evolving Reasoning LLM from Zero Data
7 mins; August 08, 2025
R-Zero: Self-Evolving Reasoning LLM from Zero Data
22 mins; August 08, 2025
[QA] Live Music Models
7 mins; August 07, 2025
Live Music Models
14 mins; August 07, 2025
[QA] Causal Reflection with Language Models
7 mins; August 07, 2025
Causal Reflection with Language Models
18 mins; August 07, 2025
[QA] SOTOPIA-RL: Reward Design for Social Intelligence
8 mins; August 06, 2025
SOTOPIA-RL: Reward Design for Social Intelligence
16 mins; August 06, 2025
[QA] Agent Lightning: Train ANY AI Agents with Reinforcement Learning
7 mins; August 06, 2025
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
42 mins; August 06, 2025
[QA] Self-Questioning Language Models
7 mins; August 05, 2025
Self-Questioning Language Models
15 mins; August 05, 2025
[QA] Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
8 mins; August 05, 2025
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
24 mins; August 05, 2025
[QA] Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search
7 mins; August 04, 2025
Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search
13 mins; August 04, 2025
[QA] Embryology of a Language Model
7 mins; August 04, 2025
Embryology of a Language Model
18 mins; August 04, 2025
[QA] Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
7 mins; August 04, 2025
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
17 mins; August 04, 2025
[QA] CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
7 mins; August 03, 2025
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
19 mins; August 03, 2025
[QA] Meta CLIP 2: A Worldwide Scaling Recipe
8 mins; August 03, 2025
Meta CLIP 2: A Worldwide Scaling Recipe
20 mins; August 03, 2025
[QA] Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
9 mins; July 28, 2025
Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
57 mins; July 28, 2025
[QA] AlphaGo Moment for Model Architecture Discovery
7 mins; July 27, 2025
AlphaGo Moment for Model Architecture Discovery
23 mins; July 27, 2025
[QA] Learning without training: The implicit dynamics of in-context learning
8 mins; July 27, 2025
Learning without training: The implicit dynamics of in-context learning
13 mins; July 27, 2025
[QA] NABLA: Neighborhood Adaptive Block-Level Attention
7 mins; July 26, 2025
NABLA: Neighborhood Adaptive Block-Level Attention
12 mins; July 26, 2025
[QA] Checklists Are Better Than Reward Models For Aligning Language Models
5 mins; July 26, 2025
Checklists Are Better Than Reward Models For Aligning Language Models
13 mins; July 26, 2025
[QA] Beyond Binary Rewards: Training LMs to Reason about Their Uncertainty
7 mins; July 24, 2025
Beyond Binary Rewards: Training LMs to Reason about Their Uncertainty
15 mins; July 24, 2025
[QA] Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
7 mins; July 24, 2025
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
12 mins; July 24, 2025
[QA] Does More Inference-Time Compute Really Help Robustness?
7 mins; July 23, 2025
Does More Inference-Time Compute Really Help Robustness?
20 mins; July 23, 2025
[QA] Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
7 mins; July 23, 2025
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
25 mins; July 23, 2025
[QA] Inverse Scaling in Test-Time Compute
7 mins; July 22, 2025
Inverse Scaling in Test-Time Compute
20 mins; July 22, 2025
[QA] The Invisible Leash: Why RLVR May Not Escape Its Origin
8 mins; July 22, 2025
The Invisible Leash: Why RLVR May Not Escape Its Origin
21 mins; July 22, 2025
[QA] Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
8 mins; July 22, 2025
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
22 mins; July 22, 2025
[QA] Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
7 mins; July 22, 2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
27 mins; July 22, 2025
[QA] AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs
7 mins; July 13, 2025
AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs
19 mins; July 13, 2025
[QA] One Token to Fool LLM-as-a-Judge
7 mins; July 13, 2025
One Token to Fool LLM-as-a-Judge
17 mins; July 13, 2025
[QA] Should We Still Pretrain Encoders with Masked Language Modeling?
8 mins; July 12, 2025
Should We Still Pretrain Encoders with Masked Language Modeling?
16 mins; July 12, 2025
[QA] Token Bottleneck: One Token to Remember Dynamics
7 mins; July 12, 2025
Token Bottleneck: One Token to Remember Dynamics
16 mins; July 12, 2025
[QA] A Systematic Analysis of Hybrid Linear Attention
7 mins; July 11, 2025
A Systematic Analysis of Hybrid Linear Attention
15 mins; July 11, 2025
[QA] First Return, Entropy-Eliciting Explore
7 mins; July 11, 2025
First Return, Entropy-Eliciting Explore
21 mins; July 11, 2025
[QA] Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
8 mins; July 11, 2025
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
15 mins; July 11, 2025
[QA] Scaling RL to Long Videos
8 mins; July 11, 2025
Scaling RL to Long Videos
15 mins; July 11, 2025
[QA] Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
8 mins; July 09, 2025
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
21 mins; July 09, 2025
[QA] Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
7 mins; July 09, 2025
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
18 mins; July 09, 2025
[QA] The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
7 mins; July 09, 2025
The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
23 mins; July 09, 2025
[QA] Differential Mamba
7 mins; July 09, 2025
Differential Mamba
18 mins; July 09, 2025