Accelerating LLMs with TornadoVM: From GPU Kernels to Model Inference


A conversation with Juan Fumero about TornadoVM, parallelization, SIMD, and Llama
1 hour 11 minutes
Java, Serverless, Clouds, Architecture and Web conversations with Adam Bien

Description

8 months ago
An airhacks.fm conversation with Juan Fumero (@snatverk) about:
TornadoVM as a Java parallel framework for accelerating data
parallelization on GPUs and other hardware, first GPU experiences
with ELSA Winner and Voodoo cards, explanation of TornadoVM as a
plugin to existing JDKs that uses Graal as a library, TornadoVM's
programming model with @Parallel and @Reduce annotations for
parallelizable code, introduction of the Kernel API for lower-level GPU
programming, TornadoVM's ability to dynamically reconfigure and
select the best hardware for workloads, implementation of LLM
inference acceleration with TornadoVM, challenges in accelerating
Llama models on GPUs, introduction of tensor types in TornadoVM to
support FP8 and FP16 operations, shared buffer capabilities for GPU
memory management, comparison of Java Vector API performance versus
GPU acceleration, discussion of model quantization as a potential
use case for TornadoVM, exploration of Deep Java Library (DJL) and
its ND array implementation, potential standardization of tensor
types in Java, integration possibilities with Project Babylon and
its Code Reflection capabilities, TornadoVM's execution plans and
task graphs for defining accelerated workloads, ability to run on
multiple GPUs with different backends simultaneously, potential
enterprise applications for LLMs in Java including model
distillation for domain-specific models, discussion of Foreign
Function & Memory API integration in TornadoVM, performance
comparison between different GPU backends like OpenCL and CUDA,
collaboration with Intel Level Zero oneAPI and integrated graphics
support, future plans for RISC-V support in TornadoVM
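
The loop shape that the @Parallel and @Reduce annotations mentioned above are aimed at can be sketched in plain Java. This is only an illustration under stated assumptions, not code from the episode or from TornadoVM: the class name DataParallelSketch and its methods are hypothetical, and no TornadoVM types are used so that the example runs on a stock JDK. The comments mark where, to the best of my understanding, the annotations would be placed. Matrix-vector multiplication is used because it is a typical building block of LLM inference.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration: plain Java only, no TornadoVM dependency.
public class DataParallelSketch {

    // Matrix-vector multiply over a row-major matrix.
    // Each iteration of the outer loop writes a different out[i] and reads
    // only shared inputs, so the iterations are independent. This is the kind
    // of loop index a data-parallel framework can map to GPU threads
    // (assumption: in TornadoVM the index i would carry @Parallel).
    public static void matVec(float[] matrix, float[] vector, float[] out, int rows, int cols) {
        for (int i = 0; i < rows; i++) {
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += matrix[i * cols + j] * vector[j];
            }
            out[i] = sum;
        }
    }

    // The same computation spread over CPU threads with a parallel stream,
    // shown only to make the independence of the rows concrete.
    public static void matVecOnCpuThreads(float[] matrix, float[] vector, float[] out, int rows, int cols) {
        IntStream.range(0, rows).parallel().forEach(i -> {
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += matrix[i * cols + j] * vector[j];
            }
            out[i] = sum;
        });
    }

    // A reduction: every iteration updates the same accumulator, so the loop
    // cannot simply be split across threads without combining partial results
    // (assumption: this is the pattern the @Reduce annotation addresses).
    public static float sum(float[] values) {
        float total = 0f;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        float[] matrix = {1f, 2f, 3f, 4f}; // 2 x 2, row-major
        float[] vector = {1f, 1f};
        float[] out = new float[2];

        matVec(matrix, vector, out, 2, 2);
        System.out.println("sequential: " + Arrays.toString(out));

        matVecOnCpuThreads(matrix, vector, out, 2, 2);
        System.out.println("cpu threads: " + Arrays.toString(out));

        System.out.println("sum: " + sum(matrix));
    }
}
```

The sequential and threaded versions produce the same result because no row depends on another. The reduction is the harder case, which is presumably why it gets its own annotation.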

Juan Fumero on Twitter: @snatverk

