llamafarm

Deploy any AI model, agents, database, RAG, and pipeline locally in minutes
LlamaFarm is a comprehensive AI framework that empowers users to build powerful AI applications locally, with full control over costs and deployment options. It provides modular components for RAG systems, vector databases, model management, prompt engineering, and fine-tuning. Users can create differentiated AI products without needing extensive ML expertise, using simple CLI commands and YAML configs. The framework supports local-first development, production-ready components, strategy-based configuration, and deployment anywhere from laptops to the cloud.

README:

🦙 LlamaFarm - Build Powerful AI Locally, Deploy Anywhere


The Complete AI Development Framework - From Local Prototypes to Production Systems

License: MIT · Python 3.8+ · Go 1.19+ · PRs Welcome · Discord

🚀 Quick Start · 📚 Documentation · 🏗️ Architecture · 🤝 Contributing


🚀 What is LlamaFarm?

🚧 Building in the Open: We're actively developing LlamaFarm and not everything is working yet. Join us as we build the future of local-first AI development! Check our roadmap to see what's coming and how you can contribute.

Why LlamaFarm?

The AI revolution should be accessible to everyone, not just ML experts and big tech companies. You shouldn't need a PhD to build powerful AI applications - just a CLI, your config files, and your data. Too many teams are stuck between expensive cloud APIs that lock you in and complex open-source tools that take months of ML expertise to productionize. LlamaFarm changes this: full control and production-ready AI with simple commands and YAML configs. If you can write config files and run CLI commands, you can build sophisticated AI systems - build locally with your data, keep complete control over costs, and deploy anywhere from your laptop to the cloud, all with the same straightforward interface.

LlamaFarm is a comprehensive, modular framework for building AI projects that run locally, support collaboration, and deploy anywhere. We provide battle-tested components for RAG systems, vector databases, model management, prompt engineering, and (soon) fine-tuning - all designed to work seamlessly together or independently.

We're not local-only zealots - use cloud APIs where they make sense for your needs; LlamaFarm helps with that too. But we believe the real value in the AI economy comes from building something uniquely yours, not just wrapping another UI around GPT-5. True innovation happens when you can train on your proprietary data, fine-tune for your specific use cases, and maintain full control over your AI stack. LlamaFarm gives you the tools to create differentiated AI products that your competitors can't simply copy by calling the same API.

LlamaFarm gives you complete control over your AI stack. Unlike cloud-only solutions, we provide:

  • 🏠 Local-First Development - Build and test entirely on your machine
  • 🔧 Production-Ready Components - Battle-tested modules that scale from laptop to cluster
  • 🎯 Strategy-Based Configuration - Smart defaults with extensive customization
  • 🚀 Deploy Anywhere - Same code runs locally, on-premise, or in any cloud

🎭 Perfect For

  • Developers who want to build AI applications without vendor lock-in
  • Teams needing cost control and data privacy
  • Enterprises requiring scalable, secure AI infrastructure
  • Researchers experimenting with cutting-edge techniques

🏗️ Core Components

LlamaFarm is built as a modular system where each component can be used independently or orchestrated together for powerful AI applications.

⚙️ System Components

🚀 Runtime

The execution environment that orchestrates all components and manages the application lifecycle.

  • Process Management: Handles component initialization and shutdown
  • API/Access Layer: Send queries to /chat and documents to /data and get full results back (see the client sketch after this list)
  • Resource Allocation: Manages memory, CPU, and GPU resources efficiently
  • Service Discovery: Automatically finds and connects components
  • Health Monitoring: Tracks component status and performance metrics
  • Error Recovery: Automatic restart and fallback mechanisms
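
The full runtime is still on the roadmap (see below), so the API surface may change. As a rough illustration of the API/Access layer, a client for the /chat and /data endpoints might look like the following - the base URL, payload fields, and response shape are assumptions, not a documented contract:

# Hypothetical client for the runtime's /chat and /data endpoints.
# Base URL, payload fields, and response shape are assumptions.
import requests

BASE_URL = "http://localhost:8000"  # assumed default runtime address

def upload_document(path):
    """Send a local file to the (assumed) /data ingestion endpoint."""
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE_URL}/data", files={"file": f})
    resp.raise_for_status()
    return resp.json()

def chat(message, strategy="research"):
    """Send a query to the (assumed) /chat endpoint and return the result."""
    resp = requests.post(f"{BASE_URL}/chat",
                         json={"message": message, "strategy": strategy})
    resp.raise_for_status()
    return resp.json()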

📦 Deployer

Zero-configuration deployment system that works from local development to production clusters.

  • Environment Detection: Automatically adapts to local, Docker, or cloud environments (a detection sketch follows this list)
  • Configuration Management: Handles environment variables and secrets securely
  • Scaling: Horizontal and vertical scaling based on load
  • Load Balancing: Distributes requests across multiple instances
  • Rolling Updates: Zero-downtime deployments with automatic rollback
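
The deployer is likewise still in development, but environment detection of this kind usually probes well-known markers. A minimal sketch - the markers checked here are general container conventions, not LlamaFarm specifics:

# Environment-detection sketch: probe common markers to decide whether
# we are running locally, inside Docker, or on Kubernetes. These checks
# are general conventions, not LlamaFarm-specific behavior.
import os

def detect_environment():
    if os.environ.get("KUBERNETES_SERVICE_HOST"):
        return "kubernetes"   # injected into every Kubernetes pod
    if os.path.exists("/.dockerenv"):
        return "docker"       # created by the Docker runtime in containers
    return "local"

print(detect_environment())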

🧠 AI Components

🔍 Data Pipeline (RAG)

Complete document processing and retrieval system for building knowledge-augmented applications.

  • Document Ingestion: Parse 15+ formats (PDF, Word, Excel, HTML, Markdown, etc.)
  • Smart Extraction: Extract entities, keywords, statistics without LLMs
  • Vector Storage: Integration with 8+ vector databases (Chroma, Pinecone, FAISS, etc.)
  • Hybrid Search: Combine semantic, keyword, and metadata-based retrieval (a scoring sketch follows this list)
  • Chunking Strategies: Adaptive chunking based on document type and use case
  • Incremental Updates: Efficiently update knowledge base without full reprocessing
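
Hybrid retrieval is configured later in this README with weights like {dense: 0.7, sparse: 0.3}; the core idea is a weighted blend of semantic and keyword scores. A minimal sketch of that scoring step - the score sources and min-max normalization are illustrative assumptions, not LlamaFarm internals:

# Weighted hybrid scoring sketch: blend dense (semantic) and sparse
# (keyword) relevance scores per document.

def normalize(scores):
    """Min-max normalize so both score sets share a 0..1 scale."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(dense, sparse, w_dense=0.7, w_sparse=0.3):
    """Rank documents by a weighted blend of dense and sparse scores."""
    dense, sparse = normalize(dense), normalize(sparse)
    docs = dense.keys() | sparse.keys()
    blended = {d: w_dense * dense.get(d, 0.0) + w_sparse * sparse.get(d, 0.0)
               for d in docs}
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

# doc2 is strong on both signals, so it tops the blended ranking
print(hybrid_rank({"doc1": 0.9, "doc2": 0.8, "doc3": 0.1},
                  {"doc1": 1.0, "doc2": 12.0, "doc3": 5.0}))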

🤖 Models

Unified interface for all LLM operations with enterprise-grade features.

  • Multi-Provider Support: 25+ providers (OpenAI, Anthropic, Google, Ollama, etc.)
  • Automatic Failover: Seamless fallback between providers when errors occur (the pattern is sketched after this list)
  • Fine-Tuning Pipeline: Train custom models on your data (Coming Q2 2025)
  • Cost Optimization: Route queries to cheapest capable model
  • Load Balancing: Distribute across multiple API keys and endpoints
  • Response Caching: Intelligent caching to reduce API costs
  • Model Configuration: Per-model temperature, token limits, and parameters
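
Automatic failover (and cheapest-capable routing) boils down to trying an ordered provider list and falling through on errors. A minimal sketch of the pattern - the provider callables below are stubs, not the actual models API:

# Failover sketch: try providers in priority (e.g. cheapest-first) order
# and fall through on errors. The callables are stubs for illustration.
def chat_with_failover(prompt, providers):
    """Try providers in priority order; first success wins."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:        # rate limit, outage, auth error, ...
            errors.append((name, exc))  # record the failure, try the next
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_cloud(prompt):
    raise TimeoutError("simulated provider outage")

def local_model(prompt):
    return f"[local model answer to] {prompt}"

# Ordered preferred/cheapest-first; each entry is (name, callable)
providers = [("cloud-primary", flaky_cloud), ("local-fallback", local_model)]
print(chat_with_failover("Explain quantum entanglement", providers))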

📝 Prompts

Enterprise prompt management system with version control and A/B testing.

  • Template Library: 20+ pre-built templates for common use cases
  • Dynamic Variables: Jinja2 templating with type validation (roadmap; rendering is sketched after this list)
  • Strategy Selection: Automatically choose best template based on context
  • Version Control: Track prompt changes and performance over time (roadmap)
  • A/B Testing: Compare prompt variations with built-in analytics (roadmap)
  • Chain-of-Thought: Built-in support for reasoning chains
  • Multi-Agent: Coordinate multiple specialized prompts (roadmap)
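
Dynamic variables and several other items above are still on the roadmap, but the underlying Jinja2 mechanics are standard. A minimal rendering sketch - the template text and variable names are invented for illustration, not shipped templates:

# Jinja2 rendering sketch: how a prompt template with dynamic variables
# gets filled in. Template text and variables are illustrative only.
from jinja2 import Template  # pip install jinja2

template = Template(
    "You are a {{ role }} assistant.\n"
    "Answer the question using the context below.\n\n"
    "Context:\n{{ context }}\n\nQuestion: {{ question }}"
)

prompt = template.render(
    role="research",
    context="(retrieved document chunks go here)",
    question="What are the key findings about climate change?",
)
print(prompt)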

🔄 How Components Work Together

  1. User Request → Runtime receives and validates the request
  2. Context Retrieval → Data Pipeline searches relevant documents
  3. Prompt Selection → Prompts system chooses optimal template
  4. Model Execution → Models component handles LLM interaction with automatic failover
  5. Response Delivery → Runtime returns formatted response to user

Each component is independent but designed to work seamlessly together through standardized interfaces.
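
In code, that flow reduces to validate, retrieve, render, generate, and return. A minimal sketch of the five steps - every function here is a trivial stand-in to show the shape of the flow, not the actual LlamaFarm API:

# Request-flow sketch mirroring the five steps above. All functions are
# stand-ins for the components they represent.
def validate(query):
    assert query.strip(), "empty query"

def search_documents(query):
    return ["(retrieved chunk 1)", "(retrieved chunk 2)"]

def select_template(query):
    return "Context: {ctx}\nQuestion: {q}\nAnswer:"

def generate(template, docs, query):
    prompt = template.format(ctx=" ".join(docs), q=query)
    return f"[model output for prompt of {len(prompt)} chars]"

def handle_request(query):
    validate(query)                           # 1. Runtime validates
    docs = search_documents(query)            # 2. Data Pipeline retrieves
    template = select_template(query)         # 3. Prompts selects template
    answer = generate(template, docs, query)  # 4. Models generates (failover inside)
    return {"response": answer}               # 5. Runtime returns the result

print(handle_request("What are the implications?"))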


🚀 Quick Start

Installation

# Clone and set up manually
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm

Full Dev Mode:

npm install -g nx  
nx init --useDotNxInstallation --interactive=false
nx start server

🎯 Getting Started

💡 Important: All our demos use the REAL CLI and REAL configuration system - what you see in the demos is exactly how you'll use LlamaFarm in production!

For the best experience getting started with LlamaFarm, we recommend exploring our component documentation and running the interactive demos:

📚 RAG System (Document Processing & Retrieval)

  • Read the RAG Documentation - Complete guide to document ingestion, embedding, and retrieval
  • Run the Interactive Demos:
    cd rag
    uv sync
    
    # Interactive setup wizard - guides you through configuration
    uv run python setup_demo.py
    
    # Or try specific demos with the real CLI:
    uv run python cli.py demo research_papers    # Academic paper analysis
    uv run python cli.py demo customer_support   # Support ticket processing
    uv run python cli.py demo code_analysis      # Source code understanding
    
    # Use your own documents:
    uv run python cli.py ingest ./your-docs/ --strategy research
    uv run python cli.py search "your query here" --top-k 5

🤖 Models (LLM Management & Optimization)

  • Read the Models Documentation - Multi-provider support, fallback strategies, and cost optimization
  • Run the Interactive Demos:
    cd models
    uv sync
    
    # Try our showcase demos:
    uv run python demos/demo1_cloud_fallback.py  # Automatic provider fallback
    uv run python demos/demo2_multi_model.py     # Smart model routing
    uv run python demos/demo3_training.py        # Fine-tuning pipeline (preview)
    
    # Or use the real CLI directly:
    uv run python cli.py chat --strategy balanced "Explain quantum computing"
    uv run python cli.py chat --primary gpt-4 --fallback claude-3 "Write a haiku"
    
    # Test with your own config:
    uv run python cli.py setup your-strategy.yaml --verify
    uv run python cli.py demo your-strategy

📝 Prompts (Coming Soon)

The prompts system is under active development. For now, explore the template system:

cd prompts
uv sync
uv run python -m prompts.cli template list  # View available templates
uv run python -m prompts.cli execute "Your task" --template research

🎮 Try It Live

RAG Pipeline Example

# Ingest documents with smart extraction
uv run python rag/cli.py ingest samples/ \
  --extractors keywords entities statistics \
  --strategy research

# Search with advanced retrieval
uv run python rag/cli.py search \
  "What are the key findings about climate change?" \
  --top-k 5 --rerank

Multi-Model Chat Example

# Chat with automatic fallback
uv run python models/cli.py chat \
  --primary gpt-4 \
  --fallback claude-3 \
  --local-fallback llama3.2 \
  "Explain quantum entanglement"

Smart Prompt Example

# Use domain-specific templates
uv run python prompts/cli.py execute \
  "Analyze this medical report for anomalies" \
  --strategy medical \
  --template diagnostic_analysis

🎯 Configuration System

LlamaFarm uses a strategy-based configuration system that adapts to your use case:

Strategy Configuration Example

# config/strategies.yaml
strategies:
  research:
    rag:
      embedder: "sentence-transformers"
      chunk_size: 512
      overlap: 50
      retrievers:
        - type: "hybrid"
          weights: {dense: 0.7, sparse: 0.3}
    models:
      primary: "gpt-4"
      fallback: "claude-3-opus"
      temperature: 0.3
    prompts:
      template: "academic_research"
      style: "formal"
      citations: true
  
  customer_support:
    rag:
      embedder: "openai"
      chunk_size: 256
      retrievers:
        - type: "similarity"
          top_k: 3
    models:
      primary: "gpt-3.5-turbo"
      temperature: 0.7
    prompts:
      template: "conversational"
      style: "friendly"
      include_context: true

Using Strategies

# Apply strategy across all components
export LLAMAFARM_STRATEGY=research

# Or specify per command
uv run python rag/cli.py ingest docs/ --strategy research
uv run python models/cli.py chat --strategy customer_support "Help me with my order"
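
Under the hood, resolving a strategy amounts to "explicit flag, else LLAMAFARM_STRATEGY, else a default", then pulling the named block from the YAML file. A sketch of that lookup - the file path, default name, and precedence are assumptions based on the examples above:

# Strategy-resolution sketch: an explicit name (e.g. a --strategy flag)
# wins, then the LLAMAFARM_STRATEGY env var, then a default. Path and
# precedence are assumptions based on the examples above.
import os
import yaml  # pip install pyyaml

def load_strategy(name=None, path="config/strategies.yaml"):
    name = name or os.environ.get("LLAMAFARM_STRATEGY", "research")
    with open(path) as f:
        strategies = yaml.safe_load(f)["strategies"]
    return strategies[name]  # e.g. {"rag": {...}, "models": {...}, ...}

cfg = load_strategy()
print(cfg["models"]["primary"])  # "gpt-4" for the research strategy above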

📚 Documentation

📖 Comprehensive Guides

Component | Description | Documentation
--- | --- | ---
RAG System | Document processing, embedding, retrieval | 📚 RAG Guide
Models | LLM providers, management, optimization | 🤖 Models Guide
Prompts | Templates, strategies, evaluation | 📝 Prompts Guide
CLI | Command-line tools and utilities | ⚡ CLI Reference
API | REST API services | 🔌 API Docs

🎓 Tutorials

🔧 Examples

Check out our examples/ directory for complete working applications:

  • 📚 Knowledge Base Assistant
  • 💬 Customer Support Bot
  • 📊 Document Analysis Pipeline
  • 🔍 Semantic Search Engine
  • 🤖 Multi-Agent System

🚢 Deployment Options

Local Development

# Run with hot-reload
uv run python main.py --dev

# Or use Docker
docker-compose up -d

Production Deployment

# docker-compose.prod.yml
version: '3.8'
services:
  llamafarm:
    image: llamafarm/llamafarm:latest
    environment:
      - STRATEGY=production
      - WORKERS=4
    volumes:
      - ./config:/app/config
      - ./data:/app/data
    ports:
      - "8000:8000"
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G

Cloud Deployment

  • AWS: ECS, Lambda, SageMaker
  • GCP: Cloud Run, Vertex AI
  • Azure: Container Instances, ML Studio
  • Self-Hosted: Kubernetes, Docker Swarm

See deployment guide for detailed instructions.


🛠️ Advanced Features

🔄 Pipeline Composition

from llamafarm import Pipeline, RAG, Models, Prompts

# Create a complete AI pipeline (the chained calls are wrapped in
# parentheses so the fluent style is valid Python)
pipeline = (
    Pipeline(strategy="research")
    .add(RAG.ingest("documents/"))
    .add(Prompts.select_template())
    .add(Models.generate())
    .add(RAG.store_results())
)

# Execute with monitoring
results = pipeline.run(
    query="What are the implications?",
    monitor=True,
    cache=True
)

🎯 Custom Strategies

from llamafarm.strategies import Strategy

class MedicalStrategy(Strategy):
    """Custom strategy for medical document analysis"""
    
    def configure_rag(self):
        return {
            "extractors": ["medical_entities", "dosages", "symptoms"],
            "embedder": "biobert",
            "chunk_size": 256
        }
    
    def configure_models(self):
        return {
            "primary": "med-palm-2",
            "temperature": 0.1,
            "require_citations": True
        }

📊 Monitoring & Analytics

from llamafarm.monitoring import Monitor

monitor = Monitor()
monitor.track_usage()
monitor.analyze_costs()
monitor.export_metrics("prometheus")

🌍 Community & Ecosystem

🤝 Contributing

We welcome contributions! See our Contributing Guide for:

  • 🐛 Reporting bugs
  • 💡 Suggesting features
  • 🔧 Submitting PRs
  • 📚 Improving docs

🏆 Contributors

Bobby Radford
Bobby Radford

💻
Matt Hamann
Matt Hamann

💻
Rob Thelen
Rob Thelen

💻
rachradulo
rachradulo

💻
Racheal Ochalek
Racheal Ochalek

💻
Davon Davis
Davon Davis

💻
github-actions[bot]
github-actions[bot]

💻

🔗 Integration Partners

  • Vector DBs: ChromaDB, Pinecone, Weaviate, Qdrant, FAISS
  • LLM Providers: OpenAI, Anthropic, Google, Cohere, Together, Groq
  • Deployment: Docker, Kubernetes, AWS, GCP, Azure
  • Monitoring: Prometheus, Grafana, DataDog, New Relic

🚦 Roadmap

✅ Released

  • RAG System with 10+ parsers and 5+ extractors
  • 25+ LLM provider integrations
  • 20+ prompt templates with strategies
  • CLI tools for all components
  • Docker deployment support

🚀 Coming Soon

  • Full Runtime System - Complete orchestration layer for managing all components with health monitoring, resource allocation, and automatic recovery
  • Production Deployer - Zero-configuration deployment from local development to cloud with automatic scaling and load balancing
  • Fine-tuning Pipeline - Train custom models on your data with integrated evaluation and deployment
  • Web UI Dashboard - Visual interface for monitoring, configuration, and management
  • Enhanced CLI - Unified command interface across all components

🚧 In Progress

  • Fine-tuning pipeline (Looking for contributors with ML experience)
  • Advanced caching system (Redis/Memcached integration - 40% complete)
  • GraphRAG implementation (Design phase - Join discussion)
  • Multi-modal support (Vision models integration - Early prototype)
  • Agent orchestration (LangGraph integration planned)

📅 Planned (late-2025)

  • AutoML for strategy optimization (Q4 2025 - Seeking ML engineers)
  • Distributed training (Q4 2025 - Partnership opportunities welcome)
  • Edge deployment (Q4 2025 - IoT and mobile focus)
  • Mobile SDKs (iOS/Android - Looking for mobile developers)
  • Web UI dashboard (Q4 2025 - React/Vue developers needed)

🤝 Want to Contribute?

We're actively looking for contributors in these areas:

  • 🧠 Machine Learning: Fine-tuning, distributed training
  • 📱 Mobile Development: iOS/Android SDKs
  • 🎨 Frontend: Web UI dashboard
  • 🔍 Search: GraphRAG and advanced retrieval
  • 📚 Documentation: Tutorials and examples

📄 License

LlamaFarm is MIT licensed. See LICENSE for details.


🙏 Acknowledgments

LlamaFarm stands on the shoulders of giants:

  • 🦜 LangChain - LLM orchestration inspiration
  • 🤗 Transformers - Model implementations
  • 🎯 ChromaDB - Vector database excellence
  • 🚀 uv - Lightning-fast package management

See CREDITS.md for complete acknowledgments.


🦙 Ready to Build Production AI?

Join thousands of developers building with LlamaFarm

⭐ Star on GitHub · 💬 Join Discord · 📚 Read Docs


Build locally. Deploy anywhere. Own your AI.
