FastFlowLM

Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for AMD NPUs.


FastFlowLM (FLM) is a runtime for efficient local language model inference on AMD Ryzen™ AI NPUs. It provides an out-of-box, NPU-first alternative to GPU inference, with a CLI and a local server exposing REST and OpenAI-compatible APIs. FLM is optimized for speed and power efficiency, supports context lengths up to 256k tokens, and requires no model rewrites or low-level tuning, making it suitable for both experimentation and production use.

README:


⚡ FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs

Run large language models — now with Vision support — on AMD Ryzen™ AI NPUs in minutes.
No GPU required. Faster and over 10× more power-efficient. Context lengths up to 256k tokens.

📦 The only out-of-box, NPU-first runtime built exclusively for Ryzen™ AI.
🤝 Think Ollama — but deeply optimized for NPUs.
From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.

FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Kraken).


🔗 Quick Links

🔽 Download | 📊 Benchmarks | 📦 Model List

📖 Docs | 📺 Demos | 🧪 Test Drive | 💬 Discord


🚀 Quick Start

A packaged FLM Windows installer is available here: flm-setup.exe. For more details, see the release notes.

📺 Watch the quick start video

⚠️ Ensure the NPU driver is version 32.0.203.258 or later (check under Task Manager → Performance → NPU, or in Device Manager) — Driver Download.

After installation, open PowerShell (Win + X → I). To run a model in the terminal (CLI Mode):

flm run llama3.2:1b

Notes:

  • Internet access to HuggingFace is required to download the optimized model kernels.
  • By default, models are stored in: C:\Users\<USER>\Documents\flm\models\
  • During installation, you can select a different base folder (e.g., if you choose C:\Users\<USER>\flm, models will be saved under C:\Users\<USER>\flm\models\).
  • ⚠️ If HuggingFace is not accessible in your region, manually download the model (check this issue) and place it in the chosen directory.

🎉🚀 FastFlowLM (FLM) is ready — your NPU is unlocked and you can start chatting with models right away!

Open Task Manager (Ctrl + Shift + Esc). Go to the Performance tab → click NPU to monitor usage.

⚡ Quick Tips:

  • Use /verbose during a session to turn on performance reporting (toggle off with /verbose again).
  • Type /bye to exit a conversation.
  • Run flm list in PowerShell to show all available models.

To start the local server (Server Mode):

flm serve llama3.2:1b

The model tag (e.g., llama3.2:1b) is optional and only sets the initial model. If a client requests a different model, FastFlowLM automatically switches to it. The local server listens on port 11434 by default.
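With the server running, any HTTP client can talk to it on the default port. As a minimal sketch using only the Python standard library — assuming an Ollama-style /api/generate endpoint, which is an assumption based on the Ollama comparison above (confirm the exact route in the FastFlowLM docs):

```python
import json
from urllib import request, error

# Assumed endpoint: Ollama-style /api/generate on the default port 11434.
URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.2:1b",   # FLM switches models automatically if needed
    "prompt": "Why is the sky blue?",
    "stream": False,          # ask for a single JSON response
}

def generate(url: str = URL, body: dict = payload) -> str:
    """POST the request and return the response text (or an error note)."""
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=60) as resp:
            return json.loads(resp.read())["response"]
    except error.URLError as exc:
        return f"server not reachable: {exc.reason}"

if __name__ == "__main__":
    print(generate())
```

If the server is not running, the sketch prints a "server not reachable" note instead of raising, so it is safe to try before starting `flm serve`.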

FastFlowLM Docs


🧠 Local AI on NPU

FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:

  • ⚡ Fast and low power
  • 🧰 Simple CLI and APIs (REST and OpenAI-compatible)
  • 🔐 Fully private and offline

No model rewrites, no tuning — it just works.
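Because the server advertises an OpenAI-compatible API, existing OpenAI-style clients can usually be pointed at it by overriding the base URL. A hedged standard-library sketch — the /v1/chat/completions route and response shape below follow the standard OpenAI API, not FLM-specific documentation, so verify the paths in the FLM docs:

```python
import json
from urllib import request, error

# Assumption: FLM's OpenAI-compatible endpoint lives under /v1 on the
# default port 11434.
BASE_URL = "http://localhost:11434/v1"

chat_request = {
    "model": "llama3.2:1b",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what an NPU is in one sentence."},
    ],
}

def chat(body: dict = chat_request) -> str:
    """Send a chat completion request; return the reply or an error note."""
    req = request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=60) as resp:
            data = json.loads(resp.read())
            # Standard OpenAI response shape: choices[0].message.content
            return data["choices"][0]["message"]["content"]
    except error.URLError as exc:
        return f"server not reachable: {exc.reason}"

if __name__ == "__main__":
    print(chat())
```

The same base-URL override works with most OpenAI SDKs, which keeps application code unchanged when switching between a cloud backend and the local NPU server.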


✅ Highlights

  • Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
  • Developer-first flow — like Ollama, but optimized for NPU
  • Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
  • No low-level tuning required — you focus on your app; we handle the rest

📄 License

  • All orchestration code and CLI tools are open-source under the MIT License.
  • NPU-accelerated kernels are proprietary binaries, free for non-commercial use only — see LICENSE_BINARY.txt and TERMS.md for details.
  • Non-commercial users: Please acknowledge FastFlowLM in your README/project page:
    Powered by [FastFlowLM](https://github.com/FastFlowLM/FastFlowLM)
    

For commercial use or licensing inquiries, email us: [email protected]


💬 Have feedback, found an issue, or want early access to new releases? Open an issue or join our Discord community.


🙏 Acknowledgements
