
TornadoVM
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
Stars: 1217

TornadoVM is a plug-in to OpenJDK and GraalVM that allows programmers to automatically run Java programs on heterogeneous hardware. TornadoVM targets OpenCL, PTX and SPIR-V compatible devices which include multi-core CPUs, dedicated GPUs (Intel, NVIDIA, AMD), integrated GPUs (Intel HD Graphics and ARM Mali), and FPGAs (Intel and Xilinx).
README:
TornadoVM is a plug-in to OpenJDK and GraalVM that allows programmers to automatically run Java programs on heterogeneous hardware. TornadoVM targets OpenCL, PTX and SPIR-V compatible devices which include multi-core CPUs, dedicated GPUs (Intel, NVIDIA, AMD), integrated GPUs (Intel HD Graphics and ARM Mali), and FPGAs (Intel and Xilinx).
TornadoVM has three backends that generate OpenCL C, NVIDIA CUDA PTX assembly, and SPIR-V binary. Developers can choose which backends to install and run.
Website: tornadovm.org
Documentation: https://tornadovm.readthedocs.io/en/latest/
For a quick introduction please read the following FAQ.
Latest Release: TornadoVM 1.0.10 - 31/01/2025 : See CHANGELOG.
In Linux and macOS, TornadoVM can be installed automatically with the installation script. For example:
$ ./bin/tornadovm-installer
usage: tornadovm-installer [-h] [--version] [--jdk JDK] [--backend BACKEND] [--listJDKs] [--javaHome JAVAHOME]
TornadoVM Installer Tool. It will install all software dependencies except the GPU/FPGA drivers
optional arguments:
-h, --help show this help message and exit
--version Print version of TornadoVM
--jdk JDK Select one of the supported JDKs. Use --listJDKs option to see all supported ones.
--backend BACKEND Select the backend to install: { opencl, ptx, spirv }
--listJDKs List all JDK supported versions
--javaHome JAVAHOME Use a JDK from a user directory
NOTE Select the desired backend:
-
opencl
: Enables the OpenCL backend (requires OpenCL drivers) -
ptx
: Enables the PTX backend (requires NVIDIA CUDA drivers) -
spirv
: Enables the SPIRV backend (requires Intel Level Zero drivers)
Example of installation:
# Install the OpenCL backend with OpenJDK 21
$ ./bin/tornadovm-installer --jdk jdk21 --backend opencl
# It is also possible to combine different backends:
$ ./bin/tornadovm-installer --jdk jdk21 --backend opencl,spirv,ptx
Alternatively, TornadoVM can be installed either manually from source or by using Docker.
If you are planning to use Docker with TornadoVM on GPUs, you can also follow these guidelines.
You can also run TornadoVM on Amazon AWS CPUs, GPUs, and FPGAs following the instructions here.
TornadoVM is currently being used to accelerate machine learning and deep learning applications, computer vision, physics simulations, financial applications, computational photography, and signal processing.
Featured use-cases:
- kfusion-tornadovm: Java application for accelerating a computer-vision application using the Tornado-APIs to run on discrete and integrated GPUs.
- Java Ray-Tracer: Java application accelerated with TornadoVM for real-time ray-tracing.
We also have a set of examples that includes NBody, DFT, KMeans computation and matrix computations.
Additional Information
- General Documentation
- Benchmarks
- How TornadoVM executes reductions
- Execution Flags
- FPGA execution
- Profiler Usage
TornadoVM exposes to the programmer task-level, data-level and pipeline-level parallelism via a light Application Programming Interface (API). In addition, TornadoVM uses single-source property, in which the code to be accelerated and the host code live in the same Java program.
Compute-kernels in TornadoVM can be programmed using two different approaches (APIs):
Compute kernels are written in a sequential form (tasks programmed for a single thread execution). To express
parallelism, TornadoVM exposes two annotations that can be used in loops and parameters: a) @Parallel
for annotating
parallel loops; and b) @Reduce
for annotating parameters used in reductions.
The following code snippet shows a full example to accelerate Matrix-Multiplication using TornadoVM and the loop-parallel API:
public class Compute {
private static void mxmLoop(Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
for (@Parallel int i = 0; i < size; i++) {
for (@Parallel int j = 0; j < size; j++) {
float sum = 0.0f;
for (int k = 0; k < size; k++) {
sum += A.get(i, k) * B.get(k, j);
}
C.set(i, j, sum);
}
}
}
public void run(Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
// Create a task-graph with multiple tasks. Each task points to an exising Java method
// that can be accelerated on a GPU/FPGA
TaskGraph taskGraph = new TaskGraph("myCompute")
.transferToDevice(DataTransferMode.FIRST_EXECUTION, A, B) // Transfer data from host to device only in the first execution
.task("mxm", Compute::mxmLoop, A, B, C, size) // Each task points to an existing Java method
.transferToHost(DataTransferMode.EVERY_EXECUTION, C); // Transfer data from device to host
// Create an immutable task-graph
ImmutableTaskGraph immutableTaskGraph = taskGraph.snaphot();
// Create an execution plan from an immutable task-graph
try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
// Run the execution plan on the default device
TorandoExecutionResult executionResult = executionPlan.execute();
} catch (TornadoExecutionPlanException e) {
// handle exception
// ...
}
}
}
Another way to express compute-kernels in TornadoVM is via the Kernel API.
To do so, TornadoVM exposes the KernelContext
data structure, in which the application can directly access the thread-id, allocate
memory in local memory (shared memory on NVIDIA devices), and insert barriers.
This model is similar to programming compute-kernels in SYCL, oneAPI, OpenCL and CUDA.
Therefore, this API is more suitable for GPU/FPGA expert programmers that want more control or want to port existing
CUDA/OpenCL compute kernels into TornadoVM.
The following code-snippet shows the Matrix Multiplication example using the kernel-parallel API:
public class Compute {
private static void mxmKernel(KernelContext context, Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
int idx = context.globalIdx
int jdx = context.globalIdy;
float sum = 0;
for (int k = 0; k < size; k++) {
sum += A.get(idx, k) * B.get(k, jdx);
}
C.set(idx, jdx, sum);
}
public void run(Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
// When using the kernel-parallel API, we need to create a Grid and a Worker
WorkerGrid workerGrid = new WorkerGrid2D(size, size); // Create a 2D Worker
GridScheduler gridScheduler = new GridScheduler("myCompute.mxm", workerGrid); // Attach the worker to the Grid
KernelContext context = new KernelContext(); // Create a context
workerGrid.setLocalWork(16, 16, 1); // Set the local-group size
TaskGraph taskGraph = new TaskGraph("myCompute")
.transferToDevice(DataTransferMode.FIRST_EXECUTION, A, B) // Transfer data from host to device only in the first execution
.task("mxm", Compute::mxmKernel, context, A, B, C, size) // Each task points to an existing Java method
.transferToHost(DataTransferMode.EVERY_EXECUTION, C); // Transfer data from device to host
// Create an immutable task-graph
ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
// Create an execution plan from an immutable task-graph
try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
// Run the execution plan on the default device
// Execute the execution plan
TorandoExecutionResult executionResult = executionPlan
.withGridScheduler(gridScheduler)
.execute();
} catch (TornadoExecutionPlanException e) {
// handle exception
// ...
}
}
}
Additionally, the two modes of expressing parallelism (kernel and loop parallelization) can be combined in the same task graph object.
Dynamic reconfiguration is the ability of TornadoVM to perform live task migration between devices, which means that TornadoVM decides where to execute the code to increase performance (if possible). In other words, TornadoVM switches devices if it can detect that a specific device can yield better performance (compared to another).
With the task-migration, the TornadoVM's approach is to only switch device if it detects an application can be executed faster than the CPU execution using the code compiled by C2 or Graal-JIT, otherwise it will stay on the CPU. So TornadoVM can be seen as a complement to C2 and Graal JIT compilers. This is because there is no single hardware to best execute all workloads efficiently. GPUs are very good at exploiting SIMD applications, and FPGAs are very good at exploiting pipeline applications. If your applications follow those models, TornadoVM will likely select heterogeneous hardware. Otherwise, it will stay on the CPU using the default compilers (C2 or Graal).
To use the dynamic reconfiguration, you can execute using TornadoVM policies. For example:
// TornadoVM will execute the code in the best accelerator.
executionPlan.withDynamicReconfiguration(Policy.PERFORMANCE, DRMode.PARALLEL)
.execute();
Further details and instructions on how to enable this feature can be found here.
- Dynamic reconfiguration: https://dl.acm.org/doi/10.1145/3313808.3313819
To use TornadoVM, you need two components:
a) The TornadoVM jar
file with the API. The API is licensed as GPLV2 with Classpath Exception.
b) The core libraries of TornadoVM along with the dynamic library for the driver code (.so
files for OpenCL, PTX
and/or SPIRV/Level Zero).
You can import the TornadoVM API by setting this the following dependency in the Maven pom.xml
file:
<repositories>
<repository>
<id>universityOfManchester-graal</id>
<url>https://raw.githubusercontent.com/beehive-lab/tornado/maven-tornadovm</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>tornado</groupId>
<artifactId>tornado-api</artifactId>
<version>1.0.10</version>
</dependency>
<dependency>
<groupId>tornado</groupId>
<artifactId>tornado-matrices</artifactId>
<version>1.0.10</version>
</dependency>
</dependencies>
To run TornadoVM, you need to either install the TornadoVM extension for GraalVM/OpenJDK, or run with our Docker images.
Here you can find videos, presentations, tech-articles and artefacts describing TornadoVM, and how to use it.
If you are using TornadoVM >= 0.2 (which includes the Dynamic Reconfiguration, the initial FPGA support and CPU/GPU reductions), please use the following citation:
@inproceedings{Fumero:DARHH:VEE:2019,
author = {Fumero, Juan and Papadimitriou, Michail and Zakkak, Foivos S. and Xekalaki, Maria and Clarkson, James and Kotselidis, Christos},
title = {{Dynamic Application Reconfiguration on Heterogeneous Hardware.}},
booktitle = {Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments},
series = {VEE '19},
year = {2019},
doi = {10.1145/3313808.3313819},
publisher = {Association for Computing Machinery}
}
If you are using Tornado 0.1 (Initial release), please use the following citation in your work.
@inproceedings{Clarkson:2018:EHH:3237009.3237016,
author = {Clarkson, James and Fumero, Juan and Papadimitriou, Michail and Zakkak, Foivos S. and Xekalaki, Maria and Kotselidis, Christos and Luj\'{a}n, Mikel},
title = {{Exploiting High-performance Heterogeneous Hardware for Java Programs Using Graal}},
booktitle = {Proceedings of the 15th International Conference on Managed Languages \& Runtimes},
series = {ManLang '18},
year = {2018},
isbn = {978-1-4503-6424-9},
location = {Linz, Austria},
pages = {4:1--4:13},
articleno = {4},
numpages = {13},
url = {http://doi.acm.org/10.1145/3237009.3237016},
doi = {10.1145/3237009.3237016},
acmid = {3237016},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Java, graal, heterogeneous hardware, openCL, virtual machine},
}
Selected publications can be found here.
This work is partially funded by Intel corporation. In addition, it has been supported by the following EU & UKRI grants (most recent first):
- EU Horizon Europe & UKRI AERO 101092850.
- EU Horizon Europe & UKRI INCODE 101093069.
- EU Horizon Europe & UKRI ENCRYPT 101070670.
- EU Horizon Europe & UKRI TANGO 101070052.
- EU Horizon 2020 ELEGANT 957286.
- EU Horizon 2020 E2Data 780245.
- EU Horizon 2020 ACTiCLOUD 732366.
Furthermore, TornadoVM has been supported by the following EPSRC grants:
We welcome collaborations! Please see how to contribute to the project in the CONTRIBUTING page.
Additionally, you can open new proposals on the GitHub discussions page.
Alternatively, you can share a Google document with us.
For Academic & Industry collaborations, please contact here.
Visit our website to meet the team.
To use TornadoVM, you can link the TornadoVM API to your application which is under Apache 2.
Each Java TornadoVM module is licensed as follows:
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for TornadoVM
Similar Open Source Tools

TornadoVM
TornadoVM is a plug-in to OpenJDK and GraalVM that allows programmers to automatically run Java programs on heterogeneous hardware. TornadoVM targets OpenCL, PTX and SPIR-V compatible devices which include multi-core CPUs, dedicated GPUs (Intel, NVIDIA, AMD), integrated GPUs (Intel HD Graphics and ARM Mali), and FPGAs (Intel and Xilinx).

sophia
Sophia is an open-source TypeScript platform designed for autonomous AI agents and LLM based workflows. It aims to automate processes, review code, assist with refactorings, and support various integrations. The platform offers features like advanced autonomous agents, reasoning/planning inspired by Google's Self-Discover paper, memory and function call history, adaptive iterative planning, and more. Sophia supports multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It provides a flexible platform for the TypeScript community to expand and support various use cases and integrations.

typedai
TypedAI is a TypeScript-first AI platform designed for developers to create and run autonomous AI agents, LLM based workflows, and chatbots. It offers advanced autonomous agents, software developer agents, pull request code review agent, AI chat interface, Slack chatbot, and supports various LLM services. The platform features configurable Human-in-the-loop settings, functional callable tools/integrations, CLI and Web UI interface, and can be run locally or deployed on the cloud with multi-user/SSO support. It leverages the Python AI ecosystem through executing Python scripts/packages and provides flexible run/deploy options like single user mode, Firestore & Cloud Run deployment, and multi-user SSO enterprise deployment. TypedAI also includes UI examples, code examples, and automated LLM function schemas for seamless development and execution of AI workflows.

Eco2AI
Eco2AI is a python library for CO2 emission tracking that monitors energy consumption of CPU & GPU devices and estimates equivalent carbon emissions based on regional emission coefficients. Users can easily integrate Eco2AI into their Python scripts by adding a few lines of code. The library records emissions data and device information in a local file, providing detailed session logs with project names, experiment descriptions, start times, durations, power consumption, CO2 emissions, CPU and GPU names, operating systems, and countries.

nous
Nous is an open-source TypeScript platform for autonomous AI agents and LLM based workflows. It aims to automate processes, support requests, review code, assist with refactorings, and more. The platform supports various integrations, multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It offers advanced features like reasoning/planning, memory and function call history, hierarchical task decomposition, and control-loop function calling options. Nous is designed to be a flexible platform for the TypeScript community to expand and support different use cases and integrations.

scikit-llm
Scikit-LLM is a tool that seamlessly integrates powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. It allows users to leverage large language models for various text analysis applications within the familiar scikit-learn framework. The tool simplifies the process of incorporating advanced language processing capabilities into machine learning pipelines, enabling users to benefit from the latest advancements in natural language processing.

swiftide
Swiftide is a fast, streaming indexing and query library tailored for Retrieval Augmented Generation (RAG) in AI applications. It is built in Rust, utilizing parallel, asynchronous streams for blazingly fast performance. With Swiftide, users can easily build AI applications from idea to production in just a few lines of code. The tool addresses frustrations around performance, stability, and ease of use encountered while working with Python-based tooling. It offers features like fast streaming indexing pipeline, experimental query pipeline, integrations with various platforms, loaders, transformers, chunkers, embedders, and more. Swiftide aims to provide a platform for data indexing and querying to advance the development of automated Large Language Model (LLM) applications.

beyondllm
Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems. It simplifies the process with automated integration, customizable evaluation metrics, and support for various Large Language Models (LLMs) tailored to specific needs. The aim is to reduce LLM hallucination risks and enhance reliability.

docling
Docling is a tool that bundles PDF document conversion to JSON and Markdown in an easy, self-contained package. It can convert any PDF document to JSON or Markdown format, understand detailed page layout, reading order, recover table structures, extract metadata such as title, authors, references, and language, and optionally apply OCR for scanned PDFs. The tool is designed to be stable, lightning fast, and suitable for macOS and Linux environments.

flashinfer
FlashInfer is a library for Language Languages Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, PageAttention and LoRA. FlashInfer focus on LLM serving and inference, and delivers state-the-art performance across diverse scenarios.

AIOS
AIOS, a Large Language Model (LLM) Agent operating system, embeds large language model into Operating Systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switch across agents, enable concurrent execution of agents, provide tool service for agents, maintain access control for agents, and provide a rich set of toolkits for LLM Agent developers.

inferable
Inferable is an open source platform that helps users build reliable LLM-powered agentic automations at scale. It offers a managed agent runtime, durable tool calling, zero network configuration, multiple language support, and is fully open source under the MIT license. Users can define functions, register them with Inferable, and create runs that utilize these functions to automate tasks. The platform supports Node.js/TypeScript, Go, .NET, and React, and provides SDKs, core services, and bootstrap templates for various languages.

HAMi
HAMi is a Heterogeneous AI Computing Virtualization Middleware designed to manage Heterogeneous AI Computing Devices in a Kubernetes cluster. It allows for device sharing, device memory control, device type specification, and device UUID specification. The tool is easy to use and does not require modifying task YAML files. It includes features like hard limits on device memory, partial device allocation, streaming multiprocessor limits, and core usage specification. HAMi consists of components like a mutating webhook, scheduler extender, device plugins, and in-container virtualization techniques. It is suitable for scenarios requiring device sharing, specific device memory allocation, GPU balancing, low utilization optimization, and scenarios needing multiple small GPUs. The tool requires prerequisites like NVIDIA drivers, CUDA version, nvidia-docker, Kubernetes version, glibc version, and helm. Users can install, upgrade, and uninstall HAMi, submit tasks, and monitor cluster information. The tool's roadmap includes supporting additional AI computing devices, video codec processing, and Multi-Instance GPUs (MIG).

MineStudio
MineStudio is a simple and efficient Minecraft development kit for AI research. It contains tools and APIs for developing Minecraft AI agents, including a customizable simulator, trajectory data structure, policy models, offline and online training pipelines, inference framework, and benchmarking automation. The repository is under development and welcomes contributions and suggestions.

FaceAiSharp
FaceAiSharp is a .NET library designed for face-related computer vision tasks. It offers functionalities such as face detection, face recognition, facial landmarks detection, and eye state detection. The library utilizes pretrained ONNX models for accurate and efficient results, enabling users to integrate these capabilities into their .NET applications easily. With a focus on simplicity and performance, FaceAiSharp provides a local processing solution without relying on cloud services, supporting image-based face processing using ImageSharp. It is cross-platform compatible, supporting Windows, Linux, Android, and more.

mcpdotnet
mcpdotnet is a .NET implementation of the Model Context Protocol (MCP), facilitating connections and interactions between .NET applications and MCP clients and servers. It aims to provide a clean, specification-compliant implementation with support for various MCP capabilities and transport types. The library includes features such as async/await pattern, logging support, and compatibility with .NET 8.0 and later. Users can create clients to use tools from configured servers and also create servers to register tools and interact with clients. The project roadmap includes expanding documentation, increasing test coverage, adding samples, performance optimization, SSE server support, and authentication.
For similar tasks

TornadoVM
TornadoVM is a plug-in to OpenJDK and GraalVM that allows programmers to automatically run Java programs on heterogeneous hardware. TornadoVM targets OpenCL, PTX and SPIR-V compatible devices which include multi-core CPUs, dedicated GPUs (Intel, NVIDIA, AMD), integrated GPUs (Intel HD Graphics and ARM Mali), and FPGAs (Intel and Xilinx).
For similar jobs

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It provides a common API to deliver inference solutions on various platforms, including CPU, GPU, NPU, and heterogeneous devices. OpenVINO™ supports pre-trained models from Open Model Zoo and popular frameworks like TensorFlow, PyTorch, and ONNX. Key components of OpenVINO™ include the OpenVINO™ Runtime, plugins for different hardware devices, frontends for reading models from native framework formats, and the OpenVINO Model Converter (OVC) for adjusting models for optimal execution on target devices.

peft
PEFT (Parameter-Efficient Fine-Tuning) is a collection of state-of-the-art methods that enable efficient adaptation of large pretrained models to various downstream applications. By only fine-tuning a small number of extra model parameters instead of all the model's parameters, PEFT significantly decreases the computational and storage costs while achieving performance comparable to fully fine-tuned models.

jetson-generative-ai-playground
This repo hosts tutorial documentation for running generative AI models on NVIDIA Jetson devices. The documentation is auto-generated and hosted on GitHub Pages using their CI/CD feature to automatically generate/update the HTML documentation site upon new commits.

emgucv
Emgu CV is a cross-platform .Net wrapper for the OpenCV image-processing library. It allows OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio, Unity, and "dotnet" command, and it can run on Windows, Mac OS, Linux, iOS, and Android.

MMStar
MMStar is an elite vision-indispensable multi-modal benchmark comprising 1,500 challenge samples meticulously selected by humans. It addresses two key issues in current LLM evaluation: the unnecessary use of visual content in many samples and the existence of unintentional data leakage in LLM and LVLM training. MMStar evaluates 6 core capabilities across 18 detailed axes, ensuring a balanced distribution of samples across all dimensions.

VLMEvalKit
VLMEvalKit is an open-source evaluation toolkit of large vision-language models (LVLMs). It enables one-command evaluation of LVLMs on various benchmarks, without the heavy workload of data preparation under multiple repositories. In VLMEvalKit, we adopt generation-based evaluation for all LVLMs, and provide the evaluation results obtained with both exact matching and LLM-based answer extraction.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.