Arxiv Papers
[QA] Do Large Language Model Benchmarks Test Reliability?
6 mins; February 06, 2025
Do Large Language Model Benchmarks Test Reliability?
9 mins; February 05, 2025
Detecting Strategic Deception Using Linear Probes
23 mins; February 05, 2025
[QA] Evaluation of Large Language Models via Coupled Token Generation
8 mins; February 04, 2025
Evaluation of Large Language Models via Coupled Token Generation
10 mins; February 04, 2025
[QA] Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
8 mins; February 04, 2025
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
18 mins; February 04, 2025
[QA] Should You Use Your Large Language Model to Explore or Exploit?
7 mins; February 04, 2025
Should You Use Your Large Language Model to Explore or Exploit?
16 mins; February 04, 2025
[QA] Harmonic Loss Trains Interpretable AI Models
7 mins; February 03, 2025
Harmonic Loss Trains Interpretable AI Models
15 mins; February 03, 2025
[QA] Trading inference-time compute for adversarial robustness.
7 mins; February 02, 2025
Trading inference-time compute for adversarial robustness.
24 mins; February 02, 2025
[QA] LLMs can see and hear without any training
7 mins; February 01, 2025
LLMs can see and hear without any training
18 mins; February 01, 2025
[QA] o3-mini vs DeepSeek-R1: Which One is Safer?
7 mins; February 01, 2025
o3-mini vs DeepSeek-R1: Which One is Safer?
13 mins; February 01, 2025
[QA] Optimizing Large Language Model Training Using FP4 Quantization
8 mins; February 01, 2025
Optimizing Large Language Model Training Using FP4 Quantization
21 mins; February 01, 2025
[QA] People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
7 mins; February 01, 2025
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
26 mins; February 01, 2025
[QA] Large Language Models Think Too Fast To Explore Effectively
8 mins; January 31, 2025
Large Language Models Think Too Fast To Explore Effectively
11 mins; January 31, 2025
[QA] Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
7 mins; January 31, 2025
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
15 mins; January 31, 2025
[QA] Early External Safety Testing of OpenAI’s o3-mini: Insights from Pre-Deployment Evaluation
6 mins; January 30, 2025
Early External Safety Testing of OpenAI’s o3-mini: Insights from Pre-Deployment Evaluation
11 mins; January 30, 2025
[QA] Dynamics of Transient Structure in In-Context Linear Regression Transformers
7 mins; January 29, 2025
Dynamics of Transient Structure in In-Context Linear Regression Transformers
10 mins; January 29, 2025
[QA] Context is Key for Agent Security
7 mins; January 28, 2025
Context is Key for Agent Security
17 mins; January 28, 2025
[QA] AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
7 mins; January 28, 2025
AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
9 mins; January 28, 2025
[QA] Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
7 mins; January 27, 2025
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
15 mins; January 27, 2025
[QA] Feasible Learning
9 mins; January 27, 2025
Feasible Learning
18 mins; January 27, 2025
[QA] Humanity's Last Exam
8 mins; January 26, 2025
Humanity's Last Exam
12 mins; January 26, 2025
[QA] GaussMark: A Practical Approach for Structural Watermarking of Language Models
7 mins; January 26, 2025
GaussMark: A Practical Approach for Structural Watermarking of Language Models
34 mins; January 26, 2025
[QA] Kimi k1.5: Scaling Reinforcement Learning with LLMs
7 mins; January 25, 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
19 mins; January 25, 2025
[QA] Can We Generate Images with CoT?
7 mins; January 25, 2025
Can We Generate Images with CoT?
24 mins; January 25, 2025
[QA] Physics of Skill Learning
7 mins; January 24, 2025
Physics of Skill Learning
41 mins; January 24, 2025
[QA] Hallucinations Can Improve Large Language Models in Drug Discovery
8 mins; January 24, 2025
Hallucinations Can Improve Large Language Models in Drug Discovery
15 mins; January 24, 2025
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
17 mins; January 24, 2025
[QA] Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
8 mins; January 23, 2025
Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
17 mins; January 23, 2025
[QA] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
7 mins; January 23, 2025
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
17 mins; January 23, 2025
[QA] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
8 mins; January 23, 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 mins; January 23, 2025
[QA] Reasoning Language Models: A Blueprint
8 mins; January 22, 2025
Reasoning Language Models: A Blueprint
57 mins; January 22, 2025
[QA] Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
7 mins; January 22, 2025
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
21 mins; January 22, 2025
[QA] Evolving Deeper LLM Thinking
7 mins; January 20, 2025
Evolving Deeper LLM Thinking
14 mins; January 20, 2025
PaSa: An LLM Agent for Comprehensive Academic Paper Search
17 mins; January 19, 2025
[QA] Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
7 mins; January 19, 2025
[QA] How GPT Learns Layer by Layer
8 mins; January 18, 2025
How GPT Learns Layer by Layer
15 mins; January 18, 2025
[QA] FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
6 mins; January 18, 2025
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
12 mins; January 18, 2025
[QA] Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
7 mins; January 17, 2025
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
18 mins; January 17, 2025
[QA] LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
7 mins; January 17, 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
24 mins; January 17, 2025
[QA] Towards Understanding Extrapolation: a Causal Lens
8 mins; January 16, 2025
Towards Understanding Extrapolation: a Causal Lens
16 mins; January 16, 2025
[QA] Do generative video models learn physical principles from watching videos?
8 mins; January 16, 2025
Do generative video models learn physical principles from watching videos?
20 mins; January 16, 2025
[QA] Joint Learning of Depth and Appearance for Portrait Image Animation
9 mins; January 15, 2025
Joint Learning of Depth and Appearance for Portrait Image Animation
17 mins; January 15, 2025
[QA] Dissecting a Small Artificial Neural Network
7 mins; January 15, 2025
Dissecting a Small Artificial Neural Network
19 mins; January 15, 2025
[QA] Diffusion Adversarial Post-Training for One-Step Video Generation
7 mins; January 14, 2025
Diffusion Adversarial Post-Training for One-Step Video Generation
23 mins; January 14, 2025
[QA] Inference-Time-Compute: More Faithful? A Research Note
7 mins; January 14, 2025
Inference-Time-Compute: More Faithful? A Research Note
9 mins; January 14, 2025
[QA] Transformer²: Self-adaptive LLMs
7 mins; January 13, 2025
Transformer²: Self-adaptive LLMs
24 mins; January 13, 2025
[QA] Soup to go: mitigating forgetting during continual learning with model averaging
7 mins; January 12, 2025
Soup to go: mitigating forgetting during continual learning with model averaging
17 mins; January 12, 2025
[QA] Emergent Symbol-like Number Variables in Artificial Neural Networks
7 mins; January 12, 2025
Emergent Symbol-like Number Variables in Artificial Neural Networks
20 mins; January 12, 2025
[QA] Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
7 mins; January 10, 2025
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
12 mins; January 10, 2025
[QA] Representing Long Volumetric Video with Temporal Gaussian Hierarchy
8 mins; January 10, 2025
Representing Long Volumetric Video with Temporal Gaussian Hierarchy
31 mins; January 10, 2025
[QA] Uncertainty-aware Knowledge Tracing
7 mins; January 09, 2025
Uncertainty-aware Knowledge Tracing
20 mins; January 09, 2025
[QA] The GAN is dead; long live the GAN! A Modern Baseline GAN
7 mins; January 09, 2025
The GAN is dead; long live the GAN! A Modern Baseline GAN
25 mins; January 09, 2025
[QA] Supervision-free Vision-Language Alignment
7 mins; January 08, 2025
Supervision-free Vision-Language Alignment
19 mins; January 08, 2025