Awesome-local-LLM

A curated list of awesome platforms, tools, practices, and resources that help run LLMs locally

Stars: 259


Awesome-local-LLM is a curated list of platforms, tools, practices, and resources that help run Large Language Models (LLMs) locally. It includes sections on inference platforms, engines, user interfaces, specific models for general purpose, coding, vision, audio, and miscellaneous tasks. The repository also covers tools for coding agents, agent frameworks, retrieval-augmented generation, computer use, browser automation, memory management, testing, evaluation, research, training, and fine-tuning. Additionally, there are tutorials on models, prompt engineering, context engineering, inference, agents, retrieval-augmented generation, and miscellaneous topics, along with a section on communities for LLM enthusiasts.

README:

Inference platforms

  • LM Studio - discover, download and run local LLMs
  • jan - an open source alternative to ChatGPT that runs 100% offline on your computer
  • ChatBox - user-friendly desktop client app for AI models/LLMs
  • LocalAI - the free, open-source alternative to OpenAI, Claude and others
  • lemonade - a local LLM server with GPU and NPU Acceleration

Inference engines

  • ollama - get up and running with LLMs
  • llama.cpp - LLM inference in C/C++
  • ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
  • koboldcpp - run GGUF models easily with a KoboldAI UI
  • vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
  • Nano-vLLM - a lightweight vLLM implementation built from scratch
  • vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
  • mlx-lm - generate text and fine-tune large language models on Apple silicon with MLX
  • FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
  • exo - run your own AI cluster at home with everyday devices
  • gpustack - simple, scalable AI model deployment on GPU clusters
  • sglang - a fast serving framework for large language models and vision language models
  • distributed-llama - connect home devices into a powerful cluster to accelerate LLM inference

User Interfaces

  • Open WebUI - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
  • Lobe Chat - an open-source, modern design AI chat framework
  • Text generation web UI - LLM UI with advanced features, easy setup, and multiple backend support
  • SillyTavern - LLM Frontend for Power Users
  • Page Assist - Use your locally running AI models to assist you in your web browsing

Large Language Models

Explorers, Benchmarks, Leaderboards

Model providers

  • Qwen - powered by Alibaba Cloud
  • Mistral AI - a pioneering French artificial intelligence startup
  • Tencent - a Chinese multinational technology conglomerate and holding company
  • Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
  • bartowski - providing GGUF versions of popular LLMs
  • Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
  • Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets

Specific models

General purpose

  • Qwen3 - a collection of the latest generation Qwen LLMs
  • Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
  • gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
  • Mistral-Small-3.2-24B-Instruct-2506 - a versatile model designed to handle a wide range of generative AI tasks, including instruction following, conversational assistance, image understanding, and function calling
  • Magistral-Small-2507 - a model built on Mistral Small 3.1 (2503) with added reasoning capabilities
  • GLM-4.5 - a collection of hybrid reasoning models designed for intelligent agents
  • Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
  • Phi-4-mini-instruct - a lightweight open model built upon synthetic data and filtered publicly available websites
  • NVIDIA Nemotron - a collection of open, production-ready enterprise models trained from scratch by NVIDIA
  • Llama Nemotron - a collection of open, production-ready enterprise models from NVIDIA
  • OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
  • Granite 3.3 - a collection of LLMs from IBM, fine-tuned for improved reasoning and instruction-following capabilities
  • EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
  • ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
  • Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features

Coding

  • Qwen3-Coder - a collection of Qwen's most agentic code models to date
  • Devstral-Small-2507 - an agentic LLM for software engineering tasks fine-tuned from Mistral-Small-3.1
  • Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
  • OlympicCoder-32B - a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics
  • NextCoder - a family of code-editing LLMs developed using the Qwen2.5-Coder Instruct variants as base

Vision

  • Qwen-Image - an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing
  • Qwen-Image-Edit - the image editing version of Qwen-Image extending the base model's unique text rendering capabilities to image editing tasks, enabling precise text editing
  • GLM-4.5V - a VLM based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air
  • FastVLM - a collection of VLMs with efficient vision encoding from Apple
  • MiniCPM-V-4_5 - a GPT-4o-level MLLM for single-image, multi-image, and high-FPS video understanding on your phone
  • LFM2-VL - a collection of vision-language models designed for on-device deployment
  • ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale

Audio

  • Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
  • chatterbox - first production-grade open-source TTS model
  • canary-1b-v2 - a multitask speech transcription and translation model from NVIDIA
  • parakeet-tdt-0.6b-v3 - a multilingual speech-to-text model from NVIDIA
  • Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis

Miscellaneous

  • Jan-v1-4B - the first release in the Jan Family, designed for agentic reasoning and problem-solving within the Jan App
  • Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
  • Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research without the performance degradation typically associated with context extension methods
  • Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences
  • HunyuanWorld-1 - an open-source 3D world generation model
  • Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments

Tools

Coding Agents

  • zed - a next-generation code editor designed for high-performance collaboration with humans and AI
  • OpenHands - a platform for software development agents powered by AI
  • cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
  • aider - AI pair programming in your terminal
  • tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
  • continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
  • void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
  • Roo-Code - a whole dev team of AI agents in your code editor
  • goose - an open-source, extensible AI agent that goes beyond code suggestions
  • opencode - an AI coding agent built for the terminal
  • crush - the glamourous AI coding agent for your favourite terminal
  • kilocode - open source AI coding assistant for planning, building, and fixing code
  • ProxyAI - the leading open-source AI copilot for JetBrains

Agent Frameworks

  • AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
  • langchain - build context-aware reasoning applications
  • langflow - a powerful tool for building and deploying AI-powered agents and workflows
  • autogen - a programming framework for agentic AI
  • anything-llm - the all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more
  • llama_index - the leading framework for building LLM-powered agents over your data
  • Flowise - build AI agents, visually
  • crewAI - a framework for orchestrating role-playing, autonomous AI agents
  • agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
  • SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
  • camel - the first and the best multi-agent framework
  • openai-agents-python - a lightweight, powerful framework for multi-agent workflows
  • txtai - all-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
  • archgw - a high-performance proxy server that handles the low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to LLMs
  • ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
  • ragbits - building blocks for rapid development of GenAI applications

Retrieval-Augmented Generation

  • graphrag - a modular graph-based RAG system
  • haystack - AI orchestration framework to build customizable, production-ready LLM applications, best suited for building RAG, question answering, semantic search or conversational agent chatbots
  • LightRAG - simple and fast RAG
  • graphiti - build real-time knowledge graphs for AI Agents
  • vanna - an open-source Python RAG framework for SQL generation and related functionality

Computer Use

  • open-interpreter - a natural language interface for computers
  • OmniParser - a simple screen-parsing tool for pure vision-based GUI agents
  • self-operating-computer - a framework to enable multimodal models to operate a computer
  • cua - the Docker Container for Computer-Use AI Agents
  • Agent-S - an open agentic framework that uses computers like a human

Browser Automation

  • puppeteer - a JavaScript API for Chrome and Firefox
  • playwright - a framework for Web Testing and Automation
  • Playwright MCP server - an MCP server that provides browser automation capabilities using Playwright
  • browser-use - make websites accessible for AI agents
  • firecrawl - turn entire websites into LLM-ready markdown or structured data
  • stagehand - the AI Browser Automation Framework

Memory Management

  • mem0 - universal memory layer for AI Agents
  • letta - the stateful agents framework with memory, reasoning, and context management
  • cognee - memory for AI Agents in 5 lines of code
  • LMCache - supercharge your LLM with the fastest KV Cache Layer

Testing, Evaluation, and Observability

  • langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
  • opik - debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards
  • openllmetry - open-source observability for your LLM application, based on OpenTelemetry
  • giskard - open-source evaluation and testing for AI and LLM systems
  • agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place

Research

  • Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
  • gpt-researcher - an LLM-based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
  • local-deep-researcher - fully local web research and report writing assistant
  • SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
  • local-deep-research - an AI-powered research assistant for deep, iterative research
  • maestro - an AI-powered research application designed to streamline complex research tasks
  • open-notebook - an open-source implementation of Notebook LM with more flexibility and features

Training and Fine-tuning

  • OpenRLHF - an easy-to-use, high-performance open-source RLHF framework built on Ray, vLLM, ZeRO-3 and HuggingFace Transformers, designed to make RLHF training simple and accessible
  • Kiln - the easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets
  • augmentoolkit - train an open-source LLM on new facts

Miscellaneous

  • context7 - up-to-date code documentation for LLMs and AI code editors
  • cai - Cybersecurity AI (CAI), the framework for AI Security
  • speakr - a personal, self-hosted web application designed for transcribing audio recordings
  • presenton - an open-source AI presentation generator and API
  • OmniGen2 - an exploration of advanced multimodal generation
  • 4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
  • Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure
  • mobile-use - a powerful, open-source AI agent that controls your Android or iOS device using natural language
  • gabber - build AI applications that can see, hear, and speak using your screens, microphones, and cameras as inputs
  • promptcat - a zero-dependency prompt manager/catalog/library in a single HTML file

Hardware

Tutorials

Models

Prompt Engineering

Context Engineering

  • Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
  • Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems

Inference

  • vLLM Production Stack - vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Agents

Retrieval-Augmented Generation

  • RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
  • Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm
  • LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python

Miscellaneous

Communities

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.
