arXiv Papers
[QA] Grokking at the Edge of Numerical Stability
7 mins; January 08, 2025
Grokking at the Edge of Numerical Stability
16 mins; January 08, 2025
[QA] ComMer: a Framework for Compressing and Merging User Data for Personalization
7 mins; January 07, 2025
ComMer: a Framework for Compressing and Merging User Data for Personalization
16 mins; January 07, 2025
[QA] Entropy-Guided Attention for Private LLMs
7 mins; January 07, 2025
Entropy-Guided Attention for Private LLMs
13 mins; January 07, 2025
[QA] Easing Optimization Paths: a Circuit Perspective
7 mins; January 06, 2025
Easing Optimization Paths: a Circuit Perspective
9 mins; January 06, 2025
[QA] Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
7 mins; January 06, 2025
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
16 mins; January 06, 2025
[QA] Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
6 mins; January 05, 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
7 mins; January 05, 2025
[QA] Predicting the Performance of Black-box LLMs through Self-Queries
7 mins; January 05, 2025
Predicting the Performance of Black-box LLMs through Self-Queries
20 mins; January 05, 2025
[QA] On Unifying Video Generation and Camera Pose Estimation
8 mins; January 03, 2025
On Unifying Video Generation and Camera Pose Estimation
21 mins; January 03, 2025
[QA] An analytic theory of creativity in convolutional diffusion models
8 mins; January 03, 2025
An analytic theory of creativity in convolutional diffusion models
22 mins; January 03, 2025
[QA] Finding Missed Code Size Optimizations in Compilers using LLMs
7 mins; January 03, 2025
Finding Missed Code Size Optimizations in Compilers using LLMs
18 mins; January 03, 2025
[QA] Titans: Learning to Memorize at Test Time
7 mins; January 03, 2025
Titans: Learning to Memorize at Test Time
30 mins; January 03, 2025
[QA] Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
7 mins; December 30, 2024
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
20 mins; December 30, 2024
[QA] Functional Risk Minimization
8 mins; December 30, 2024
Functional Risk Minimization
25 mins; December 30, 2024
[QA] HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
8 mins; December 30, 2024
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
17 mins; December 30, 2024
InfAlign: Inference-aware language model alignment
27 mins; December 30, 2024
[QA] Consistency Checks for Language Model Forecasters
7 mins; December 25, 2024
Consistency Checks for Language Model Forecasters
18 mins; December 25, 2024
[QA] Deliberation in Latent Space via Differentiable Cache Augmentation
7 mins; December 24, 2024
Deliberation in Latent Space via Differentiable Cache Augmentation
20 mins; December 24, 2024
[QA] Automating the Search for Artificial Life with Foundation Models
7 mins; December 24, 2024
Automating the Search for Artificial Life with Foundation Models
20 mins; December 24, 2024
[QA] LLMs for Literature Review: Are we there yet?
6 mins; December 22, 2024
LLMs for Literature Review: Are we there yet?
5 mins; December 22, 2024
[QA] WebLLM: A High-Performance In-Browser LLM Inference Engine
6 mins; December 22, 2024
WebLLM: A High-Performance In-Browser LLM Inference Engine
7 mins; December 22, 2024
[QA] A Survey on LLM Inference-Time Self-Improvement
7 mins; December 21, 2024
A Survey on LLM Inference-Time Self-Improvement
23 mins; December 21, 2024
[QA] Tokenisation is NP-Complete
7 mins; December 21, 2024
Tokenisation is NP-Complete
17 mins; December 21, 2024
[QA] The Open-Source Advantage in Large Language Models (LLMs)
8 mins; December 21, 2024
The Open-Source Advantage in Large Language Models (LLMs)
13 mins; December 21, 2024
[QA] Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
7 mins; December 21, 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
20 mins; December 21, 2024
[QA] DriveGPT: Scaling Autoregressive Behavior Models for Driving
8 mins; December 20, 2024
DriveGPT: Scaling Autoregressive Behavior Models for Driving
19 mins; December 20, 2024
[QA] MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
8 mins; December 20, 2024
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
16 mins; December 20, 2024
[QA] Byte Latent Transformer: Patches Scale Better Than Tokens
7 mins; December 17, 2024
Byte Latent Transformer: Patches Scale Better Than Tokens
40 mins; December 17, 2024
[QA] Transformers Struggle to Learn to Search
7 mins; December 08, 2024
Transformers Struggle to Learn to Search
19 mins; December 08, 2024
[QA] Navigation World Models
7 mins; December 07, 2024
Navigation World Models
19 mins; December 07, 2024
[QA] Motion Prompting: Controlling Video Generation with Motion Trajectories
7 mins; December 07, 2024
Motion Prompting: Controlling Video Generation with Motion Trajectories
16 mins; December 07, 2024
[QA] Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
8 mins; December 06, 2024
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
21 mins; December 06, 2024
[QA] NVILA: Efficient Frontier Visual Language Models
7 mins; December 06, 2024
NVILA: Efficient Frontier Visual Language Models
23 mins; December 06, 2024
[QA] o1-Coder: an o1 Replication for Coding
9 mins; December 03, 2024
o1-Coder: an o1 Replication for Coding
19 mins; December 03, 2024
[QA] Efficient Track Anything
7 mins; December 03, 2024
Efficient Track Anything
19 mins; December 03, 2024
[QA] Reverse Thinking Makes LLMs Stronger Reasoners
8 mins; December 01, 2024
Reverse Thinking Makes LLMs Stronger Reasoners
16 mins; December 01, 2024
[QA] Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
7 mins; December 01, 2024
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
12 mins; December 01, 2024
[QA] JetFormer: an autoregressive generative model of raw images and text
7 mins; December 01, 2024
JetFormer: an autoregressive generative model of raw images and text
23 mins; December 01, 2024
[QA] CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
7 mins; November 29, 2024
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
15 mins; November 29, 2024
Attamba: Attending To Multi-Token States
15 mins; November 28, 2024
[QA] Star Attention: Efficient LLM Inference over Long Sequences
7 mins; November 28, 2024
Star Attention: Efficient LLM Inference over Long Sequences
15 mins; November 28, 2024
[QA] ROICtrl: Boosting Instance Control for Visual Generation
8 mins; November 28, 2024
ROICtrl: Boosting Instance Control for Visual Generation
20 mins; November 28, 2024
[QA] Solaris: A Foundation Model of the Sun
7 mins; November 26, 2024
Solaris: A Foundation Model of the Sun
16 mins; November 26, 2024
[QA] Even Sparser Graph Transformers
7 mins; November 26, 2024
Even Sparser Graph Transformers
21 mins; November 26, 2024
[QA] Understanding LLM Embeddings for Regression
6 mins; November 24, 2024
Understanding LLM Embeddings for Regression
12 mins; November 24, 2024
[QA] Loss-to-Loss Prediction: Scaling Laws for All Datasets
7 mins; November 24, 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
21 mins; November 24, 2024
[QA] When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
7 mins; November 22, 2024
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
21 mins; November 22, 2024
[QA] SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
7 mins; November 22, 2024
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
18 mins; November 22, 2024
[QA] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
7 mins; November 22, 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
13 mins; November 22, 2024
[QA] Hymba: A Hybrid-head Architecture for Small Language Models
8 mins; November 22, 2024
Hymba: A Hybrid-head Architecture for Small Language Models
21 mins; November 22, 2024
[QA] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
7 mins; November 21, 2024
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
14 mins; November 21, 2024
[QA] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
7 mins; November 20, 2024
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
23 mins; November 20, 2024