Arxiv Papers
[QA] From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
8 mins; April 08, 2025
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
22 mins; April 08, 2025
[QA] Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
6 mins; April 08, 2025
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
15 mins; April 08, 2025
[QA] Can ChatGPT Learn My Life From a Week of First-Person Video?
7 mins; April 08, 2025
Can ChatGPT Learn My Life From a Week of First-Person Video?
8 mins; April 08, 2025
[QA] Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
8 mins; April 08, 2025
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
17 mins; April 08, 2025
[QA] Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
8 mins; April 06, 2025
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
25 mins; April 06, 2025
[QA] Agentic Knowledgeable Self-awareness
7 mins; April 06, 2025
Agentic Knowledgeable Self-awareness
18 mins; April 06, 2025
[QA] Inference-Time Scaling for Generalist Reward Modeling
8 mins; April 05, 2025
Inference-Time Scaling for Generalist Reward Modeling
18 mins; April 05, 2025
[QA] Multi-Token Attention
8 mins; April 05, 2025
Multi-Token Attention
18 mins; April 05, 2025
[QA] Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
6 mins; March 29, 2025
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
16 mins; March 29, 2025
[QA] Wan: Open and Advanced Large-Scale Video Generative Models
8 mins; March 28, 2025
Wan: Open and Advanced Large-Scale Video Generative Models
64 hours 43 mins; March 28, 2025
[QA] UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
8 mins; March 28, 2025
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
16 mins; March 28, 2025
[QA] SWI: Speaking with Intent in Large Language Models
7 mins; March 27, 2025
SWI: Speaking with Intent in Large Language Models
9 mins; March 27, 2025
[QA] Unified Multimodal Discrete Diffusion
7 mins; March 27, 2025
Unified Multimodal Discrete Diffusion
20 mins; March 27, 2025
[QA] Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
17 mins; March 26, 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
17 mins; March 26, 2025
[QA] Open Deep Search: Democratizing Search with Open-source Reasoning Agents
7 mins; March 26, 2025
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
17 mins; March 26, 2025
[QA] LookAhead Tuning: Safer Language Models via Partial Answer Previews
7 mins; March 25, 2025
LookAhead Tuning: Safer Language Models via Partial Answer Previews
8 mins; March 25, 2025
[QA] ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
7 mins; March 25, 2025
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
13 mins; March 25, 2025
[QA] FFN Fusion: Rethinking Sequential Computation in Large Language Models
8 mins; March 24, 2025
FFN Fusion: Rethinking Sequential Computation in Large Language Models
21 mins; March 24, 2025
[QA] Modifying Large Language Model Post-Training for Diverse Creative Writing
8 mins; March 24, 2025
Modifying Large Language Model Post-Training for Diverse Creative Writing
18 mins; March 24, 2025
[QA] Users Favor LLM-Generated Content—Until They Know It's AI
6 mins; March 23, 2025
Users Favor LLM-Generated Content—Until They Know It's AI
7 mins; March 23, 2025
[QA] Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
7 mins; March 23, 2025
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
20 mins; March 23, 2025
[QA] DAPO: An Open-Source LLM Reinforcement Learning System at Scale
8 mins; March 22, 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
14 mins; March 22, 2025
[QA] SynCity: Training-Free Generation of 3D Worlds
7 mins; March 22, 2025
SynCity: Training-Free Generation of 3D Worlds
13 mins; March 22, 2025
[QA] TULIP: Towards Unified Language-Image Pretraining
7 mins; March 21, 2025
TULIP: Towards Unified Language-Image Pretraining
12 mins; March 21, 2025
[QA] Causal Emergence 2.0: Quantifying emergent complexity
7 mins; March 21, 2025
Causal Emergence 2.0: Quantifying emergent complexity
31 mins; March 21, 2025
[QA] Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them
9 mins; March 20, 2025
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them
22 mins; March 20, 2025
[QA] Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
7 mins; March 20, 2025
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
24 mins; March 20, 2025
[QA] Cube: A Roblox View of 3D Intelligence
7 mins; March 19, 2025
Cube: A Roblox View of 3D Intelligence
15 mins; March 19, 2025
[QA] SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
7 mins; March 19, 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
20 mins; March 19, 2025
[QA] Measuring AI Ability to Complete Long Tasks
7 mins; March 19, 2025
Measuring AI Ability to Complete Long Tasks
44 mins; March 19, 2025
[QA] Impossible Videos
8 mins; March 19, 2025
Impossible Videos
15 mins; March 19, 2025
[QA] SuperBPE: Space Travel for Language Models
7 mins; March 17, 2025
SuperBPE: Space Travel for Language Models
16 mins; March 17, 2025
[QA] xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
7 mins; March 17, 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
18 mins; March 17, 2025
[QA] PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
7 mins; March 17, 2025
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
19 mins; March 17, 2025
[QA] Auditing language models for hidden objectives
8 mins; March 17, 2025
Auditing language models for hidden objectives
37 mins; March 17, 2025
[QA] Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
8 mins; March 15, 2025
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
20 mins; March 15, 2025
[QA] New Trends for Modern Machine Translation with Large Reasoning Models
7 mins; March 15, 2025
New Trends for Modern Machine Translation with Large Reasoning Models
15 mins; March 15, 2025
[QA] Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
8 mins; March 14, 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
17 mins; March 14, 2025
[QA] Long Context Tuning for Video Generation
8 mins; March 14, 2025
Long Context Tuning for Video Generation
15 mins; March 14, 2025
[QA] Transformers without Normalization
7 mins; March 14, 2025
Transformers without Normalization
12 mins; March 14, 2025
[QA] Charting and Navigating Hugging Face's Model Atlas
7 mins; March 14, 2025
Charting and Navigating Hugging Face's Model Atlas
13 mins; March 14, 2025
[QA] I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
8 mins; March 13, 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
12 mins; March 13, 2025
[QA] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
7 mins; March 13, 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
18 mins; March 13, 2025
[QA] Gemini Embedding: Generalizable Embeddings from Gemini
8 mins; March 12, 2025
Gemini Embedding: Generalizable Embeddings from Gemini
16 mins; March 12, 2025
[QA] Inductive Moment Matching
9 mins; March 11, 2025
Inductive Moment Matching
23 mins; March 11, 2025
[QA] Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
7 mins; March 11, 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
30 mins; March 11, 2025
[QA] Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
7 mins; March 09, 2025
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
19 mins; March 09, 2025
[QA] Continual Pre-training of MoEs: How robust is your router?
8 mins; March 09, 2025
Continual Pre-training of MoEs: How robust is your router?
23 mins; March 09, 2025
[QA] HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
6 mins; March 08, 2025
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
13 mins; March 08, 2025
[QA] Boosting Blockchain Throughput: Parallel EVM Execution with Asynchronous Storage for Reddio
7 mins; March 08, 2025
Boosting Blockchain Throughput: Parallel EVM Execution with Asynchronous Storage for Reddio
26 mins; March 08, 2025