Podcast: airhacks.fm podcast with adam bien
Episode: Accelerating LLMs with TornadoVM: From GPU Kernels to Model Inference
Description: An airhacks.fm conversation with Juan Fumero (@snatverk) about:
TornadoVM as a Java parallel framework for accelerating data-parallel workloads on GPUs and other hardware,
first GPU experiences with ELSA Winner and Voodoo cards,
explanation of TornadoVM as a plugin to existing JDKs that uses Graal as a library,
TornadoVM's programming model with @Parallel and @Reduce annotations for parallelizable code,
introduction of the kernel API for lower-level GPU programming,
TornadoVM's ability to dynamically reconfigure and select the best hardware for workloads,
implementation of LLM inference acceleration with TornadoVM,
challenges in accelerating Llama models on GPUs,
introduction of tensor types in TornadoVM to...