LlamaBarn

A cosy home for your LLMs.
LlamaBarn is a macOS menu bar app for running local LLMs. Users install models from a built-in catalog and connect any OpenAI-compatible app, such as chat UIs, editors, CLI tools, and scripts; models load when requested and unload when idle. All processing happens locally on the user's device, the app has a small footprint, and no configuration is required. It offers a smart model catalog, self-contained storage for models and configuration, and is built on llama.cpp from the GGML org.

README:

LlamaBarn

LlamaBarn is a macOS menu bar app for running local LLMs.

Watch a 2-minute intro 📽️


Install

Install with brew install --cask llamabarn or download from Releases.
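
If you installed via Homebrew, you can confirm the cask is present with a standard Homebrew command (nothing LlamaBarn-specific):

# confirm the cask installed
brew list --cask | grep llamabarn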

How it works

LlamaBarn runs a local server at http://localhost:2276/v1.

  • Install models — from the built-in catalog
  • Connect any app — chat UIs, editors, CLI tools, scripts
  • Models load when requested — and unload when idle
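
The load-on-demand behavior is easy to observe from a terminal: the first request to a model pays the load cost, and later requests reuse the already-loaded model. A minimal sketch, assuming Gemma 3 4B is installed (any installed model ID works):

# first request: the model loads, so expect extra latency
time curl -s http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-4b", "messages": [{"role": "user", "content": "Hi"}]}' > /dev/null

# second request: the model is already loaded, so this returns faster
time curl -s http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-4b", "messages": [{"role": "user", "content": "Hi"}]}' > /dev/null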

Features

  • 100% local — Models run on your device; no data leaves your Mac
  • Small footprint — 12 MB native macOS app
  • Zero configuration — models are auto-configured with optimal settings for your Mac
  • Smart model catalog — shows what fits your Mac, with quantized fallbacks for what doesn't
  • Self-contained — all models and config stored in ~/.llamabarn (configurable)
  • Built on llama.cpp — from the GGML org, developed alongside llama.cpp
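
Because everything is stored under a single directory, inspecting or backing up your setup is straightforward:

# all models and configuration live here by default
ls -lh ~/.llamabarn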

Works with

LlamaBarn works with any OpenAI-compatible client.

  • Chat UIs — Chatbox, Open WebUI, BoltAI (instructions)
  • Editors — VS Code, Zed, Xcode (instructions)
  • Editor extensions — Cline, Continue
  • CLI tools — OpenCode (instructions), Claude Code (instructions)
  • Custom scripts — curl, AI SDK, etc.

You can also use the built-in WebUI at http://localhost:2276 while LlamaBarn is running.
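
For clients built on the official OpenAI SDKs, pointing them at LlamaBarn usually comes down to two environment variables; other clients expose the same base URL and API key fields in their settings, with names that vary by tool. The placeholder key assumes the local server doesn't require authentication:

# the official OpenAI SDKs read these environment variables
export OPENAI_BASE_URL="http://localhost:2276/v1"
export OPENAI_API_KEY="llamabarn"   # placeholder; no real key needed locally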

API examples

# list installed models
curl http://localhost:2276/v1/models
# chat with Gemma 3 4B (assuming it's installed)
curl http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-4b", "messages": [{"role": "user", "content": "Hello"}]}'

Replace gemma-3-4b with any model ID from http://localhost:2276/v1/models.
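
Since the endpoint follows the OpenAI chat completions schema, you can pull just the reply text out of the JSON with jq, and the standard stream parameter gives token-by-token output (both sketches assume the usual OpenAI response shape):

# extract only the assistant's reply
curl -s http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-4b", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq -r '.choices[0].message.content'

# stream tokens as they are generated (server-sent events)
curl -N http://localhost:2276/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-4b", "stream": true, "messages": [{"role": "user", "content": "Hello"}]}'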

See the llama-server docs for the complete API reference.

Experimental settings

Expose to network — By default, the server is only accessible from your Mac (localhost). This option allows connections from other devices on your local network. Only enable this if you understand the security risks.

# bind to all interfaces (0.0.0.0)
defaults write app.llamabarn.LlamaBarn exposeToNetwork -bool YES

# or bind to a specific IP (e.g., for Tailscale)
defaults write app.llamabarn.LlamaBarn exposeToNetwork -string "100.x.x.x"

# disable (default)
defaults delete app.llamabarn.LlamaBarn exposeToNetwork
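
To check the current value, or to verify reachability after enabling the option (assuming the setting is read at launch, restart LlamaBarn after changing it; <your-mac-ip> is a placeholder for your Mac's address on the network):

# read the current setting (errors if it has never been set)
defaults read app.llamabarn.LlamaBarn exposeToNetwork

# from another device on the network, after restarting LlamaBarn
curl http://<your-mac-ip>:2276/v1/models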

Roadmap

  • [ ] Support for adding models outside the built-in catalog
  • [ ] Support for loading multiple models at the same time
  • [ ] Support for multiple configurations per model (e.g., multiple context lengths)
