
mlir-aie
An MLIR-based toolchain for AMD AI Engine-enabled devices.
Stars: 483

This repository contains an MLIR-based toolchain for AI Engine-enabled devices, such as AMD Ryzen™ AI and Versal™. This repository can be used to generate low-level configurations for the AI Engine portion of these devices. AI Engines are organized as a spatial array of tiles, where each tile contains AI Engine cores and/or memories. The spatial array is connected by stream switches that can be configured to route data between AI Engine tiles, with data movement scheduled by their programmable Data Movement Accelerators (DMAs). This repository contains MLIR representations, with multiple levels of abstraction, to target AI Engine devices. This enables compilers and developers to program AI Engine cores, as well as describe data movements and array connectivity. A Python API is made available as a convenient interface for generating MLIR design descriptions. Backend code generation is also included, targeting the aie-rt library. This toolchain uses the AI Engine compiler tool, which is part of the AMD Vitis™ software installation: these tools require a free license for use from the Product Licensing Site.
README:
This project emphasizes fast, open-source toolchains for NPU devices, including LLVM-based code generation. IRON contains a close-to-metal toolkit that empowers performance engineers to create fast and efficient designs for Ryzen™ AI NPUs powered by AI Engines. It provides Python APIs that enable developers to harness the unique architectural capabilities of AMD's NPUs. However, this project is not intended to represent an end-to-end compilation flow for all application designs; it is designed to complement, not replace, mainstream NPU inference tooling such as the AMD Ryzen™ AI Software Platform. Targeting researchers and enthusiasts, IRON is designed to unlock the full potential of NPUs for a wide range of workloads, from machine learning to digital signal processing and beyond. This repository includes programming guides and examples demonstrating the APIs. Additionally, the Peano component extends the LLVM framework by adding support for the AI Engine processor as a target architecture, enabling integration with popular compiler frontends such as clang. Developers can leverage the AIE API header library to implement efficient vectorized AIE core code in C++ that can be compiled by Peano.
The IRON Python API for Ryzen™ AI NPUs is described in the following paper:
E. Hunhoff, J. Melber, K. Denolf, A. Bisca, S. Bayliss, S. Neuendorffer, J. Fifield, J. Lo, P. Vasireddy, P. James-Roxby, E. Keller. "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". In 33rd IEEE International Symposium On Field-Programmable Custom Computing Machines, May 2025.
These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones Ubuntu 24.04 or Ubuntu 24.10 install.
Be sure you have the latest BIOS on your laptop or mini-PC that enables the NPU. See here.
If starting from Ubuntu 24.04, you may need to update the Linux kernel to 6.11+ by installing the Hardware Enablement (HWE) stack:
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot
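After rebooting, it is worth confirming the new kernel is active; this quick check is our addition rather than part of the upstream instructions:
uname -r    # expect a 6.11+ kernel version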
Turn off Secure Boot (allows unsigned drivers to be installed):
BIOS → Security → Secure boot → Disable
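Optionally, confirm Secure Boot is disabled after rebooting; this check is our suggestion (mokutil ships with stock Ubuntu) rather than part of the upstream steps:
mokutil --sb-state    # expect: SecureBoot disabled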
- Execute the scripted build process:
This script will install package dependencies, build the xdna-driver and xrt packages, and install them. These steps require sudo access.
bash ./utils/build_drivers.sh
- Reboot as directed after the script exits:
sudo reboot
- Check that the NPU is working by confirming the device appears with xrt-smi:
source /opt/xilinx/xrt/setup.sh
xrt-smi examine
At the bottom of the output you should see:
Devices present
BDF             :  Name
------------------------------------
[0000:66:00.1]  :  NPU Strix
- Install the following packages needed for MLIR-AIE:
# Python versions 3.10, 3.12 and 3.13 are currently supported by our wheels
sudo apt install \
  build-essential clang clang-14 lld lld-14 cmake ninja-build python3-venv python3-pip
- (Optional) Install OpenCV, which is needed for the vision programming examples:
sudo apt install libopencv-dev python3-opencv
- Clone the mlir-aie repository:
git clone https://github.com/Xilinx/mlir-aie.git
cd mlir-aie
- Set up a virtual environment:
python3 -m venv ironenv
source ironenv/bin/activate
python3 -m pip install --upgrade pip
- Install the IRON library, the mlir-aie and llvm-aie compilers from wheels, and their dependencies:
For release v1.0:
# Install IRON library and mlir-aie from a wheel
python3 -m pip install mlir_aie -f https://github.com/Xilinx/mlir-aie/releases/expanded_assets/v1.0
# Install Peano from an llvm-aie wheel
python3 -m pip install https://github.com/Xilinx/llvm-aie/releases/download/nightly/llvm_aie-19.0.0.2025041501+b2a279c1-py3-none-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
# Install basic Python requirements (still needed for release v1.0, but no longer needed for the latest wheels)
python3 -m pip install -r python/requirements.txt
# Install MLIR Python Extras
HOST_MLIR_PYTHON_PACKAGE_PREFIX=aie python3 -m pip install -r python/requirements_extras.txt
For daily latest:
# Install IRON library and mlir-aie from a wheel
python3 -m pip install mlir_aie -f https://github.com/Xilinx/mlir-aie/releases/expanded_assets/latest-wheels-2
# Install Peano from an llvm-aie wheel
python3 -m pip install llvm-aie -f https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly
# Install MLIR Python Extras
HOST_MLIR_PYTHON_PACKAGE_PREFIX=aie python3 -m pip install -r python/requirements_extras.txt
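As a quick sanity check (our suggestion; the package names match the install commands above), confirm both wheels landed in the ironenv virtual environment:
python3 -m pip show mlir_aie llvm-aie    # each should report a Location inside ironenv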
- (Optional) Install Python packages required for development and testing:
# Install Python requirements for development and testing
python3 -m pip install -r python/requirements_dev.txt
# This installs the pre-commit hooks defined in .pre-commit-config.yaml
pre-commit install
- Set up the environment (a quick sanity check follows below):
source utils/env_setup.sh
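To verify the environment took effect, a small check may help; it assumes env_setup.sh places the mlir-aie tools on PATH and the aie Python bindings on PYTHONPATH, which may vary by release:
which aie-opt                                  # should resolve inside the mlir_aie install
python3 -c 'import aie; print(aie.__file__)'   # IRON Python bindings should import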
- (Optional) Install ML Python packages for the ML programming examples:
# Install Torch for ML examples
python3 -m pip install -r python/requirements_ml.txt
- (Optional) Install Jupyter Notebook Python packages:
# Install Jupyter Notebook
python3 -m pip install -r python/requirements_notebook.txt
# This creates an ipykernel (for use in notebooks) using the ironenv venv
python3 -m ipykernel install --user --name ironenv
# Only for Release v1.0 and non wheel-based installs:
# The install location is generally captured in $PYTHONPATH by the env_setup.sh script.
# However, Jupyter notebooks don't always get access to PYTHONPATH (e.g., if they are run with
# vscode), so we save ${MLIR_AIE_INSTALL_DIR}/python in a .pth file in the site-packages dir of
# the ironenv venv; this allows the iron ipykernel to find the install dir whether or not
# PYTHONPATH is available.
MLIR_AIE_INSTALL=$(pip show mlir_aie | grep ^Location: | awk '{print $2}')/mlir_aie
venv_site_packages=$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')
echo ${MLIR_AIE_INSTALL}/python > $venv_site_packages/mlir-aie.pth
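You can then launch Jupyter from the activated venv and select the ironenv kernel in the notebook UI; the launch command below is the standard one and assumes requirements_notebook.txt installs the notebook package:
python3 -m jupyter notebook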
For your design of interest, for instance from programming_examples, two steps are needed: (i) build the AIE design, then (ii) build the host code.
- Go to the design of interest and run:
make
- Build the host code and execute the design (an end-to-end sketch follows after this step):
make run
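Putting the two steps together, an end-to-end session might look like the following sketch; the example path is illustrative (substitute your design of interest from programming_examples), and it assumes the ironenv venv and environment script from the steps above:
source ironenv/bin/activate
source utils/env_setup.sh
cd programming_examples/basic/vector_scalar_add   # illustrative example path
make        # (i) build the AIE design
make run    # (ii) build the host code and execute on the NPU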
- Continue to the IRON AIE Application Programming Guide
- Additional MLIR-AIE documentation is available on the website
- AIE API header library documentation for single-core AIE programming in C++ is available here
- If you are a university researcher or student and are interested in trying these tools on our Ryzen™ AI AUP Cloud systems, please contact the AMD University Program
You may skip the Vitis™ installation step if you intend to only target AMD XDNA™/AIE-ML (AIE2) and AMD XDNA™ 2 (AIE2P) using our open-source single-core compiler Peano. Compiling with xchesscc is not supported without installing AMD Vitis™ AIE Essentials.
- Install Vitis™ AIE Essentials from Ryzen AI Software 1.3 Early Access. We will assume you use the installation directory /tools/ryzen_ai-1.3.0/vitis_aie_essentials. This is an early-access lounge; you must register and be granted access at this time.
- Download the VAIML Installer for Linux-based compilation: ryzen_ai-1.3.0ea1.tgz
- Extract the required tools:
tar -xzvf ryzen_ai-1.3.0ea1.tgz
cd ryzen_ai-1.3.0
mkdir vitis_aie_essentials
mv vitis_aie_essentials*.whl vitis_aie_essentials
cd vitis_aie_essentials
unzip vitis_aie_essentials*.whl
- Set up an AI Engine license:
- Get a local license for the AI Engine tools from https://www.xilinx.com/getlicense.
- Copy your license file (Xilinx.lic) to your preferred location, e.g. /opt/Xilinx.lic.
- Set up your environment using the following script for the Vitis™ AIE tools:
#!/bin/bash
#################################################################################
# Setup Vitis AIE Essentials
#################################################################################
export AIETOOLS_ROOT=/tools/ryzen_ai-1.3.0/vitis_aie_essentials
export PATH=$PATH:${AIETOOLS_ROOT}/bin
export LM_LICENSE_FILE=/opt/Xilinx.lic
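Save the script above to a file (the name below is illustrative) and source it in each shell that needs the AIE tools; xchesscc should then resolve inside ${AIETOOLS_ROOT}/bin:
source ~/setup_vitis_aie_essentials.sh
which xchesscc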
Be sure you have the latest BIOS for your laptop or mini-PC; this ensures the NPU (sometimes referred to as the IPU) is enabled in the system. You may need to manually enable the NPU:
Advanced → CPU Configuration → IPU
NOTE: Some manufacturers only provide Windows executables to update the BIOS; please do this before installing Ubuntu.
IRON AIE Application Programming Guide
Building mlir-aie tools from source
MLIR Dialect and Compiler Documentation
Interested in contributing to MLIR-AIE? Information for developers
Copyright © 2019-2024 Advanced Micro Devices, Inc.
Similar Open Source Tools


kvcached
kvcached is a new KV cache management system that supports on-demand KV cache allocation. It implements the concept of GPU virtual memory, allowing applications to reserve virtual address space without immediately committing physical memory. Physical memory is then automatically allocated and mapped as needed at runtime. This capability allows multiple LLMs to run concurrently on a single GPU or a group of GPUs (TP) and flexibly share the GPU memory, significantly improving GPU utilization and reducing memory fragmentation. kvcached is compatible with popular LLM serving engines, including SGLang and vLLM.

JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome). It is designed to provide high performance and scalability for large language models, enabling efficient inference on cloud-based TPUs. JetStream leverages XLA to optimize the execution of LLM models, resulting in faster and more efficient inference. Additionally, JetStream supports quantization techniques to further enhance performance and reduce memory consumption. By utilizing JetStream, developers can deploy and run LLM models on TPUs with ease, achieving optimal performance and cost-effectiveness.

JetStream
JetStream is a throughput and memory optimized engine for Large Language Model (LLM) inference on XLA devices, specifically TPUs. It provides reference engine implementations for Jax and Pytorch models, along with documentation for online inference, serving Gemma using TPUs on GKE, benchmarking, observability, profiling, and standalone local setup. Users can easily set up a local server, run tests, and test core modules. JetStream aims to enhance the performance of LLM inference on XLA devices.

mcpm.sh
MCPM is an open source CLI tool for managing MCP servers, providing a simplified global configuration approach to install servers once, organize them with profiles, and integrate them into any MCP client. Features include server discovery, direct execution, sharing capabilities, and client integration tools. It eliminates the complexity of v1's target-based system in favor of a clean global workspace model. The tool is designed to be AI agent friendly with comprehensive automation support and a rich CLI interface.

aiaio
aiaio (AI-AI-O) is a lightweight, privacy-focused web UI for interacting with AI models. It supports both local and remote LLM deployments through OpenAI-compatible APIs. The tool provides features such as dark/light mode support, local SQLite database for conversation storage, file upload and processing, configurable model parameters through UI, privacy-focused design, responsive design for mobile/desktop, syntax highlighting for code blocks, real-time conversation updates, automatic conversation summarization, customizable system prompts, WebSocket support for real-time updates, Docker support for deployment, multiple API endpoint support, and multiple system prompt support. Users can configure model parameters and API settings through the UI, handle file uploads, manage conversations, and use keyboard shortcuts for efficient interaction. The tool uses SQLite for storage with tables for conversations, messages, attachments, and settings. Contributions to the project are welcome under the Apache License 2.0.

flock
Flock is a workflow-based low-code platform that enables rapid development of chatbots, RAG applications, and coordination of multi-agent teams. It offers a flexible, low-code solution for orchestrating collaborative agents, supporting various node types for specific tasks, such as input processing, text generation, knowledge retrieval, tool execution, intent recognition, answer generation, and more. Flock integrates LangChain and LangGraph to provide offline operation capabilities and supports future nodes like Conditional Branch, File Upload, and Parameter Extraction for creating complex workflows. Inspired by StreetLamb, Lobe-chat, Dify, and fastgpt projects, Flock introduces new features and directions while leveraging open-source models and multi-tenancy support.

chunkr
Chunkr is an open-source document intelligence API that provides a production-ready service for document layout analysis, OCR, and semantic chunking. It allows users to convert PDFs, PPTs, Word docs, and images into RAG/LLM-ready chunks. The API offers features such as layout analysis, OCR with bounding boxes, structured HTML and markdown output, and VLM processing controls. Users can interact with Chunkr through a Python SDK, enabling them to upload documents, process them, and export results in various formats. The tool also supports self-hosted deployment options using Docker Compose or Kubernetes, with configurations for different AI models like OpenAI, Google AI Studio, and OpenRouter. Chunkr is dual-licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) and a commercial license, providing flexibility for different usage scenarios.

NextChat
NextChat is a well-designed cross-platform ChatGPT web UI tool that supports Claude, GPT4, and Gemini Pro. It offers a compact client for Linux, Windows, and MacOS, with features like self-deployed LLMs compatibility, privacy-first data storage, markdown support, responsive design, and fast loading speed. Users can create, share, and debug chat tools with prompt templates, access various prompts, compress chat history, and use multiple languages. The tool also supports enterprise-level privatization and customization deployment, with features like brand customization, resource integration, permission control, knowledge integration, security auditing, private deployment, and continuous updates.

ai-terminal
AI Terminal is a Tauri + Angular terminal application with integrated AI capabilities, offering natural language command interpretation, an integrated AI assistant, command history and auto-completion, and cross-platform support (macOS, Windows, Linux). The modern UI is built with Tauri and Angular. The tool requires Node.js 18+, Rust and Cargo, and Ollama for AI features. Users can build a universal binary for macOS, install the tool using Homebrew, and use Ollama to download specific models. Contributions are welcome under the MIT License.

pianotrans
ByteDance's Piano Transcription is a PyTorch implementation for transcribing piano recordings into MIDI files with pedals. This repository provides a simple GUI and packaging for Windows and Nix on Linux/macOS. It supports using GPU for inference and includes CLI usage. Users can upgrade the tool and report issues to the upstream project. The tool focuses on providing MIDI files, and any other improvements to transcription results should be directed to the original project.

KsanaLLM
KsanaLLM is a high-performance engine for LLM inference and serving. It utilizes optimized CUDA kernels for high performance, efficient memory management, and detailed optimization for dynamic batching. The tool offers flexibility with seamless integration with popular Hugging Face models, support for multiple weight formats, and high-throughput serving with various decoding algorithms. It enables multi-GPU tensor parallelism, streaming outputs, and an OpenAI-compatible API server. KsanaLLM supports NVIDIA GPUs and Huawei Ascend NPU, and seamlessly integrates with verified Hugging Face models like LLaMA, Baichuan, and Qwen. Users can create a docker container, clone the source code, compile for Nvidia or Huawei Ascend NPU, run the tool, and distribute it as a wheel package. Optional features include a model weight map JSON file for models with different weight names.

Shellsage
Shell Sage is an intelligent terminal companion and AI-powered terminal assistant that enhances the terminal experience with features like local and cloud AI support, context-aware error diagnosis, natural language to command translation, and safe command execution workflows. It offers interactive workflows, supports various API providers, and allows for custom model selection. Users can configure the tool for local or API mode, select specific models, and switch between modes easily. Currently in alpha development, Shell Sage has known limitations like limited Windows support and occasional false positives in error detection. The roadmap includes improvements like better context awareness, Windows PowerShell integration, Tmux integration, and CI/CD error pattern database.

fastapi-langgraph-agent-production-ready-template
A production-ready FastAPI template for building AI agent applications with LangGraph integration. This template provides a robust foundation for building scalable, secure, and maintainable AI agent services. It includes features like FastAPI for high-performance async API endpoints, LangGraph integration, structured logging, rate limiting, PostgreSQL for data persistence, Docker support, security measures like JWT-based authentication and input sanitization, developer-friendly features like environment-specific configuration and type hints, a model evaluation framework with automated metric-based evaluation and detailed JSON reports, and a configuration system with environment-specific settings.

OpenSpec
OpenSpec is a tool for spec-driven development, aligning humans and AI coding assistants to agree on what to build before any code is written. It adds a lightweight specification workflow that ensures deterministic, reviewable outputs without the need for API keys. With OpenSpec, stakeholders can draft change proposals, review and align with AI assistants, implement tasks based on agreed specs, and archive completed changes for merging back into the source-of-truth specs. It works seamlessly with existing AI tools, offering shared visibility into proposed, active, or archived work.

aituber-server
AITuberKit server-side is a tool that allows users to receive messages via WebSocket and obtain responses from Open Interpreter. Users can also send files to the server for storage and issue commands to Open Interpreter. The tool is designed for WebSocket operation and provides a default connection URL of `ws://127.0.0.1:8000/ws`. It supports debugging in VSCode with DEBUG_MODE=1. The tool is licensed under KillianLucas/open-interpreter and includes a guide on how to use Open Interpreter.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.