Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we also offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full-paper reads, giving you academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers
[QA] Introducing Milabench: Benchmarking Accelerators for AI
7 mins; November 20, 2024
Introducing Milabench: Benchmarking Accelerators for AI
26 mins; November 20, 2024
[QA] Does Prompt Formatting Have Any Impact on LLM Performance?
6 mins; November 18, 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
9 mins; November 18, 2024
[QA] Steering Language Model Refusal with Sparse Autoencoders
7 mins; November 18, 2024
Steering Language Model Refusal with Sparse Autoencoders
24 mins; November 18, 2024
[QA] LLaVA-o1: Let Vision Language Models Reason Step-by-Step
7 mins; November 17, 2024
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
17 mins; November 17, 2024
[QA] Refusal in LLMs is an Affine Function
7 mins; November 15, 2024
Refusal in LLMs is an Affine Function
8 mins; November 15, 2024
[QA] Cut Your Losses in Large-Vocabulary Language Models
8 mins; November 15, 2024
Cut Your Losses in Large-Vocabulary Language Models
17 mins; November 15, 2024
[QA] Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
6 mins; November 12, 2024
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
19 mins; November 12, 2024
[QA] Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
7 mins; November 12, 2024
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
16 mins; November 12, 2024
[QA] Aioli: A unified optimization framework for language model data mixing
6 mins; November 11, 2024
Aioli: A unified optimization framework for language model data mixing
29 mins; November 11, 2024
[QA] Balancing Pipeline Parallelism with Vocabulary Parallelism
8 mins; November 11, 2024
Balancing Pipeline Parallelism with Vocabulary Parallelism
20 mins; November 11, 2024
[QA] Can Transformers Smell Like Humans?
7 mins; November 10, 2024
Can Transformers Smell Like Humans?
18 mins; November 10, 2024
[QA] Mixtures of In-Context Learners
7 mins; November 10, 2024
Mixtures of In-Context Learners
15 mins; November 10, 2024
[QA] How Far Is Video Generation from World Model: A Physical Law Perspective
8 mins; November 09, 2024
How Far Is Video Generation from World Model: A Physical Law Perspective
27 mins; November 09, 2024
[QA] ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate
7 mins; November 09, 2024
ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate
15 mins; November 09, 2024
[QA] Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
7 mins; November 08, 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
14 mins; November 08, 2024
[QA] Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
7 mins; November 08, 2024
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
41 mins; November 08, 2024
[QA] Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
10 mins; November 06, 2024
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
15 mins; November 06, 2024
[QA] How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
7 mins; November 06, 2024
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
22 mins; November 06, 2024
[QA] Discovering Data Structures: Nearest Neighbor Search and Beyond
7 mins; November 06, 2024
Discovering Data Structures: Nearest Neighbor Search and Beyond
28 mins; November 06, 2024
[QA] BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
7 mins; November 06, 2024
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
15 mins; November 06, 2024
[QA] Adapting Language Models via Token Translation
8 mins; November 03, 2024
Adapting Language Models via Token Translation
9 mins; November 03, 2024
[QA] Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
8 mins; November 03, 2024
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
26 mins; November 03, 2024
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
7 mins; November 02, 2024
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
19 mins; November 02, 2024
[QA] $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
7 mins; November 02, 2024
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
16 mins; November 02, 2024
[QA] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
7 mins; November 01, 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
15 mins; November 01, 2024
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
7 mins; October 31, 2024
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
19 mins; October 31, 2024
[QA] Where Do Large Learning Rates Lead Us?
8 mins; October 30, 2024
Where Do Large Learning Rates Lead Us?
28 mins; October 30, 2024
[QA] Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
7 mins; October 30, 2024
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
13 mins; October 30, 2024
[QA] LoRA vs Full Fine-tuning: An Illusion of Equivalence
7 mins; October 28, 2024
LoRA vs Full Fine-tuning: An Illusion of Equivalence
13 mins; October 28, 2024
[QA] Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
6 mins; October 27, 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
8 mins; October 27, 2024
[QA] Computational Bottlenecks of Training Small-scale Large Language Models
8 mins; October 27, 2024
Computational Bottlenecks of Training Small-scale Large Language Models
9 mins; October 27, 2024
[QA] Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
9 mins; October 25, 2024
Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
19 mins; October 25, 2024
[QA] A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
8 mins; October 25, 2024
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
18 mins; October 25, 2024
[QA] LEGO: Language Model Building Blocks
7 mins; October 24, 2024
LEGO: Language Model Building Blocks
16 mins; October 24, 2024
[QA] Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
8 mins; October 24, 2024
Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
19 mins; October 24, 2024
[QA] Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
8 mins; October 23, 2024
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
20 mins; October 23, 2024
[QA] ALTA: Compiler-Based Analysis of Transformers
7 mins; October 23, 2024
ALTA: Compiler-Based Analysis of Transformers
22 mins; October 23, 2024
[QA] UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
8 mins; October 22, 2024
UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
16 mins; October 22, 2024
[QA] Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
7 mins; October 22, 2024
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
18 mins; October 22, 2024
[QA] Generative Reward Models
7 mins; October 21, 2024
Generative Reward Models
12 mins; October 21, 2024
[QA] Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks
8 mins; October 20, 2024
Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks
10 mins; October 20, 2024
[QA] Decomposing The Dark Matter of Sparse Autoencoders
7 mins; October 20, 2024
Decomposing The Dark Matter of Sparse Autoencoders
15 mins; October 20, 2024
[QA] A Hitchhiker's Guide to Scaling Law Estimation
10 mins; October 19, 2024
A Hitchhiker's Guide to Scaling Law Estimation
17 mins; October 19, 2024
[QA] Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
7 mins; October 19, 2024
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
10 mins; October 19, 2024
[QA] Looking Inward: Language Models Can Learn About Themselves by Introspection
7 mins; October 18, 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
26 mins; October 18, 2024
[QA] Thinking LLMs: General Instruction Following with Thought Generation
7 mins; October 18, 2024
Thinking LLMs: General Instruction Following with Thought Generation
15 mins; October 18, 2024
[QA] Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
7 mins; October 18, 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
17 mins; October 18, 2024
[QA] Movie Gen: A Cast of Media Foundation Models
8 mins; October 18, 2024
Movie Gen: A Cast of Media Foundation Models
1 hour 53 mins; October 18, 2024
[QA] One Step Diffusion via Shortcut Models
8 mins; October 16, 2024
One Step Diffusion via Shortcut Models
17 mins; October 16, 2024
[QA] Inference Scaling for Long-Context Retrieval Augmented Generation
7 mins; October 16, 2024
Inference Scaling for Long-Context Retrieval Augmented Generation