Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we also offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full-paper reads, giving you academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers
[QA] Introducing Milabench: Benchmarking Accelerators for AI
7 mins; November 20, 2024
Introducing Milabench: Benchmarking Accelerators for AI
26 mins; November 20, 2024
[QA] Does Prompt Formatting Have Any Impact on LLM Performance?
6 mins; November 18, 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
9 mins; November 18, 2024
[QA] Steering Language Model Refusal with Sparse Autoencoders
7 mins; November 18, 2024
Steering Language Model Refusal with Sparse Autoencoders
24 mins; November 18, 2024
[QA] LLaVA-o1: Let Vision Language Models Reason Step-by-Step
7 mins; November 17, 2024
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
17 mins; November 17, 2024
[QA] Refusal in LLMs is an Affine Function
7 mins; November 15, 2024
Refusal in LLMs is an Affine Function
8 mins; November 15, 2024
[QA] Cut Your Losses in Large-Vocabulary Language Models
8 mins; November 15, 2024
Cut Your Losses in Large-Vocabulary Language Models
17 mins; November 15, 2024
[QA] Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
6 mins; November 12, 2024
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
19 mins; November 12, 2024
[QA] Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
7 mins; November 12, 2024
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
16 mins; November 12, 2024
[QA] Aioli: A unified optimization framework for language model data mixing
6 mins; November 11, 2024
Aioli: A unified optimization framework for language model data mixing
29 mins; November 11, 2024
[QA] Balancing Pipeline Parallelism with Vocabulary Parallelism
8 mins; November 11, 2024
Balancing Pipeline Parallelism with Vocabulary Parallelism
20 mins; November 11, 2024
[QA] Can Transformers Smell Like Humans?
7 mins; November 10, 2024
Can Transformers Smell Like Humans?
18 mins; November 10, 2024
[QA] Mixtures of In-Context Learners
7 mins; November 10, 2024
Mixtures of In-Context Learners
15 mins; November 10, 2024
[QA] How Far Is Video Generation from World Model: A Physical Law Perspective
8 mins; November 09, 2024
How Far Is Video Generation from World Model: A Physical Law Perspective
27 mins; November 09, 2024
[QA] ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate
7 mins; November 09, 2024
ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate
15 mins; November 09, 2024
[QA] Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
7 mins; November 08, 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
14 mins; November 08, 2024
[QA] Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
7 mins; November 08, 2024
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
41 mins; November 08, 2024
[QA] Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
10 mins; November 06, 2024
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
15 mins; November 06, 2024
[QA] How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
7 mins; November 06, 2024
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
22 mins; November 06, 2024
[QA] Discovering Data Structures: Nearest Neighbor Search and Beyond
7 mins; November 06, 2024
Discovering Data Structures: Nearest Neighbor Search and Beyond
28 mins; November 06, 2024
[QA] BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
7 mins; November 06, 2024
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
15 mins; November 06, 2024
[QA] Adapting Language Models via Token Translation
8 mins; November 03, 2024
Adapting Language Models via Token Translation
9 mins; November 03, 2024
[QA] Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
8 mins; November 03, 2024
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
26 mins; November 03, 2024
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
7 mins; November 02, 2024
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
19 mins; November 02, 2024
[QA] $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
7 mins; November 02, 2024
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
16 mins; November 02, 2024
[QA] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
7 mins; November 01, 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
15 mins; November 01, 2024
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
7 mins; October 31, 2024
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
19 mins; October 31, 2024
[QA] Where Do Large Learning Rates Lead Us?
8 mins; October 30, 2024
Where Do Large Learning Rates Lead Us?
28 mins; October 30, 2024
[QA] Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
7 mins; October 30, 2024
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
13 mins; October 30, 2024
[QA] LoRA vs Full Fine-tuning: An Illusion of Equivalence
7 mins; October 28, 2024
LoRA vs Full Fine-tuning: An Illusion of Equivalence
13 mins; October 28, 2024
[QA] Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
6 mins; October 27, 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
8 mins; October 27, 2024
[QA] Computational Bottlenecks of Training Small-scale Large Language Models
8 mins; October 27, 2024
Computational Bottlenecks of Training Small-scale Large Language Models
9 mins; October 27, 2024
[QA] Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
9 mins; October 25, 2024
Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
19 mins; October 25, 2024
[QA] A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
8 mins; October 25, 2024
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
18 mins; October 25, 2024
[QA] LEGO: Language Model Building Blocks
7 mins; October 24, 2024
LEGO: Language Model Building Blocks
16 mins; October 24, 2024
[QA] Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
8 mins; October 24, 2024
Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
19 mins; October 24, 2024
[QA] Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
8 mins; October 23, 2024
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
20 mins; October 23, 2024
[QA] ALTA: Compiler-Based Analysis of Transformers
7 mins; October 23, 2024
ALTA: Compiler-Based Analysis of Transformers
22 mins; October 23, 2024
[QA] UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
8 mins; October 22, 2024
UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
16 mins; October 22, 2024
[QA] Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
7 mins; October 22, 2024
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
18 mins; October 22, 2024
[QA] Generative Reward Models
7 mins; October 21, 2024
Generative Reward Models
12 mins; October 21, 2024
[QA] Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks
8 mins; October 20, 2024
Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks
10 mins; October 20, 2024
[QA] Decomposing The Dark Matter of Sparse Autoencoders
7 mins; October 20, 2024
Decomposing The Dark Matter of Sparse Autoencoders
15 mins; October 20, 2024
[QA] A Hitchhiker's Guide to Scaling Law Estimation
10 mins; October 19, 2024
A Hitchhiker's Guide to Scaling Law Estimation
17 mins; October 19, 2024
[QA] Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
7 mins; October 19, 2024
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
10 mins; October 19, 2024
[QA] Looking Inward: Language Models Can Learn About Themselves by Introspection
7 mins; October 18, 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
26 mins; October 18, 2024
[QA] Thinking LLMs: General Instruction Following with Thought Generation
7 mins; October 18, 2024
Thinking LLMs: General Instruction Following with Thought Generation
15 mins; October 18, 2024
[QA] Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
7 mins; October 18, 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
17 mins; October 18, 2024
[QA] Movie Gen: A Cast of Media Foundation Models
8 mins; October 18, 2024
Movie Gen: A Cast of Media Foundation Models
1 hour 53 mins; October 18, 2024
[QA] One Step Diffusion via Shortcut Models
8 mins; October 16, 2024
One Step Diffusion via Shortcut Models
17 mins; October 16, 2024
[QA] Inference Scaling for Long-Context Retrieval Augmented Generation
7 mins; October 16, 2024
Inference Scaling for Long-Context Retrieval Augmented Generation