FastFlowLM

Run LLMs on AMD Ryzen™ AI NPUs. Just like Ollama - but purpose-built and deeply optimized for AMD NPUs.


FastFlowLM is a lightweight runtime for running large language models directly on AMD Ryzen™ AI NPUs. It provides a command-line interface for interactive use and a local REST API server for application integration, with NPU-optimized kernels for models such as LLaMA 3.1/3.2, Qwen3, Gemma3, and DeepSeek-R1. Optimized for speed and power efficiency, it runs fully locally with no GPU required.

README:

FastFlowLM Logo

Run large language models on AMD Ryzen™ AI NPUs — in minutes.

FastFlowLM (FLM) is a lightweight runtime for deploying LLMs like Gemma3 (vision supported), Qwen3, DeepSeek-R1, MedGemma, and LLaMA 3.1/3.2 directly on AMD's NPU — no GPU needed. It is faster and over 11x more power efficient than iGPU or hybrid (iGPU+NPU) solutions.

FLM supports full context lengths — up to 256k tokens with Qwen3-4B-Instruct/Thinking-2507.

Just like Ollama — but purpose-built and deeply optimized for the Ryzen™ AI NPU

FastFlowLM supports all Ryzen™ AI series chips with XDNA2 NPUs (Strix Point, Strix Halo, and Krackan Point).

🔽 Download: flm-setup.exe
📦 Supported Models: docs.fastflowlm.com/models/
📖 Documentation: docs.fastflowlm.com
💬 Discord Server: discord.gg/z24t23HsHF
📺 YouTube Demos: youtube.com/@FastFlowLM-YT/playlists
🧪 Test Drive (Remote Machine): open-webui.testdrive-fastflowlm.com


📺 Demo Videos

From the new Gemma3:4b vision model (the first NPU-only VLM) to think/no_think Qwen3, plus head-to-head comparisons with Ollama, LM Studio, Lemonade, and more — it's all up on YouTube!


🧪 Test Drive (Remote Demo)

🚀 Don’t have a Ryzen™ AI PC? Instantly try FastFlowLM on a live AMD Ryzen™ AI 5 340 NPU with 32 GB memory (spec) — no setup needed.

✨ Now with Gemma3:4b (the first NPU-only VLM!) supported here.

🌐 Launch Now: https://open-webui.testdrive-fastflowlm.com/
🔐 Login: [email protected]
🔑 Password: 0000

📺 Watch this short video to see how to try the remote demo in just a few clicks.

  • Alternatively, sign up with your own credentials instead of using the shared guest account.
  • ⚠️ Some universities or companies may block access to the test drive site. If it doesn't load over Wi-Fi, try switching to a cellular network.
  • Real-time demo powered by FastFlowLM + Open WebUI — no downloads, no installs.
  • Try optimized LLM models: gemma3:4b, qwen3:4b, etc. — all accelerated on the NPU.

⚠️ Please note:

  • FastFlowLM is designed for single-user local use. This remote demo machine may experience short wait times when multiple users access it concurrently — please be patient.
  • Switching models may take some time while the new model is loaded into memory.
  • Large prompts (30k+ tokens) and VLM (gemma3:4b) may take longer — but it works! 🙂

⚡ Quick Start

A packaged Windows installer is available here: flm-setup.exe. For more details, see the release notes.

📺 Watch the quick start video

⚠️ Ensure the NPU driver is version 32.0.203.258 or later (check via Task Manager → Performance → NPU, or via Device Manager) — Driver Download.

After installation, open PowerShell. To run a model in the terminal (CLI / Interactive Mode):

flm run llama3.2:1b

Pulling (downloading) the optimized model kernels requires internet access to HuggingFace. Models are downloaded to: C:\Users\<USER>\Documents\flm\models\. ⚠️ If HuggingFace is not directly accessible in your region, you can download the model manually and place it in this directory.

To start the local REST API server (Server Mode):

flm serve llama3.2:1b

The model tag (e.g., llama3.2:1b) sets the initial model and is optional; if a request names a different model, FastFlowLM switches to it automatically. The local server listens on port 11434 by default.
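Once the server is up, any HTTP client can talk to it. FLM describes itself as Ollama-like and uses Ollama's default port, so the sketch below assumes an Ollama-compatible /api/generate endpoint (an assumption; check the FLM documentation for the exact API):

import requests

# Minimal sketch: assumes an Ollama-compatible endpoint on FLM's default port.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",  # the server switches models automatically if needed
        "prompt": "Explain what an NPU is in one sentence.",
        "stream": False,         # ask for a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])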

By default, FLM runs in performance NPU power mode. You can switch to other NPU power modes (powersaver, balanced, or turbo) using the --pmode flag:

CLI mode:

flm run gemma3:4b --pmode balanced

Server mode:

flm serve gemma3:4b --pmode balanced

⚠️ Note: Using powersaver or balanced lowers NPU clock speeds and causes a significant drop in generation speed. For more details about NPU power modes, refer to the AMD XRT SMI Documentation.

For detailed instructions, click Documentation.


🖼️ Vision Support for Gemma3:4b (VLM)

FastFlowLM now supports vision-language inference with Gemma3:4b. ⚡ Quick start:

After installation, open PowerShell and run:

flm run gemma3:4b

In the CLI, attach an image:

/input "path/to/image.png" What's in this image?

Both .png and .jpg formats are supported.
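Image inputs should also be reachable over the REST API in server mode. Assuming FLM follows Ollama's convention of passing base64-encoded images alongside the prompt (an assumption not confirmed by this README), a minimal sketch:

import base64
import requests

# Hypothetical vision request: assumes an Ollama-style "images" field.
with open("path/to/image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:4b",
        "prompt": "What's in this image?",
        "images": [image_b64],  # base64-encoded image bytes
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])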


📚 Supported Models by FastFlowLM (FLM)

Check the full list here: 👉 https://docs.fastflowlm.com/models/


🧠 Local AI on Your NPU

FastFlowLM makes it easy to run cutting-edge LLMs (and now VLMs too) locally with:

  • ⚡ Fast and low power
  • 🧰 Simple CLI and API
  • 🔐 Fully private and offline

No model rewrites, no tuning — it just works.


✅ Features

  • Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
  • Developer-first flow — like Ollama, but optimized for NPU
  • Support for long context windows — up to 128k tokens (e.g., LLaMA 3.1/3.2, Gemma3:4B)
  • No low-level tuning required — you focus on your app, we handle the rest

⚡ Performance

📊 View the detailed results here: Benchmark results


🛠️ Instructions

See the documentation and examples for details. Like Ollama, you can:

  • Load and run models locally via CLI (Interactive Mode)
  • Integrate FLM into your app through a simple REST API exposed by a local server (Server Mode; see the streaming sketch below)

Compatible with tools like Microsoft AI Toolkit, Open WebUI, and more.
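For app integration, streaming the response token by token keeps UIs responsive. Assuming FLM mirrors Ollama's /api/chat streaming format of newline-delimited JSON (an assumption, based on the shared default port and Open WebUI compatibility), a minimal sketch:

import json
import requests

# Minimal streaming sketch: assumes Ollama-style NDJSON chat responses.
with requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:4b",
        "messages": [{"role": "user", "content": "Write a haiku about NPUs."}],
        "stream": True,
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
    print()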


📄 License

  • All orchestration code and CLI tools are open-source under the MIT License.
  • NPU-accelerated kernels are proprietary binaries, free for non-commercial use only — see LICENSE_BINARY.txt and TERMS.md for details.
  • Non-commercial users must acknowledge FastFlowLM by adding this line to their README or project page:
    Powered by [FastFlowLM](https://github.com/FastFlowLM/FastFlowLM)
    

For commercial use or licensing inquiries, email us: [email protected]


💬 Have feedback or issues, or want early access to new releases? Open an issue or join our Discord community.


🙏 Acknowledgements
