
llamafarm
Deploy any AI model, agents, database, RAG, and pipeline locally in minutes
Stars: 110

LlamaFarm is a comprehensive AI framework that empowers users to build powerful AI applications locally, with full control over costs and deployment options. It provides modular components for RAG systems, vector databases, model management, prompt engineering, and fine-tuning. Users can create differentiated AI products without needing extensive ML expertise, using simple CLI commands and YAML configs. The framework supports local-first development, production-ready components, strategy-based configuration, and deployment anywhere from laptops to the cloud.
README:

The Complete AI Development Framework - From Local Prototypes to Production Systems
🚀 Quick Start • 📚 Documentation • 🏗️ Architecture • 🤝 Contributing
🚧 Building in the Open: We're actively developing LlamaFarm and not everything is working yet. Join us as we build the future of local-first AI development! Check our roadmap to see what's coming and how you can contribute.
The AI revolution should be accessible to everyone, not just ML experts and big tech companies. We believe you shouldn't need a PhD to build powerful AI applications - just a CLI, your config files, and your data. Too many teams are stuck between expensive cloud APIs that lock you in and complex open-source tools that require months of ML expertise to productionize.

LlamaFarm changes this: full control and production-ready AI with simple commands and YAML configs. No machine learning degree required - if you can write config files and run CLI commands, you can build sophisticated AI systems. Build locally with your data, maintain complete control over costs, and deploy anywhere from your laptop to the cloud - all with the same straightforward interface.
LlamaFarm is a comprehensive, modular framework for building AI projects that run locally, collaborate, and deploy anywhere. We provide battle-tested components for RAG systems, vector databases, model management, prompt engineering, and soon fine-tuning - all designed to work seamlessly together or independently.
We're not local-only zealots - use cloud APIs where they make sense for your needs; LlamaFarm helps with that! But we believe the real value in the AI economy comes from building something uniquely yours, not just wrapping another UI around GPT-5. True innovation happens when you can train on your proprietary data, fine-tune for your specific use cases, and maintain full control over your AI stack. LlamaFarm gives you the tools to create differentiated AI products that your competitors can't simply copy by calling the same API.
LlamaFarm is a comprehensive, modular AI framework that gives you complete control over your AI stack. Unlike cloud-only solutions, we provide:
- 🏠 Local-First Development - Build and test entirely on your machine
- 🔧 Production-Ready Components - Battle-tested modules that scale from laptop to cluster
- 🎯 Strategy-Based Configuration - Smart defaults with infinite customization
- 🚀 Deploy Anywhere - Same code runs locally, on-premise, or in any cloud
LlamaFarm is built for:
- Developers who want to build AI applications without vendor lock-in
- Teams needing cost control and data privacy
- Enterprises requiring scalable, secure AI infrastructure
- Researchers experimenting with cutting-edge techniques
LlamaFarm is built as a modular system where each component can be used independently or orchestrated together for powerful AI applications.
Runtime: The execution environment that orchestrates all components and manages the application lifecycle.
- Process Management: Handles component initialization and shutdown
- API/Access Layer: Send queries to /chat and data to /data and get full results with ease (see the example after this list)
- Resource Allocation: Manages memory, CPU, and GPU resources efficiently
- Service Discovery: Automatically finds and connects components
- Health Monitoring: Tracks component status and performance metrics
- Error Recovery: Automatic restart and fallback mechanisms
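For instance, hitting the access layer might look like this - a minimal sketch assuming the runtime serves HTTP on localhost:8000 (the port the production compose file below maps); the actual request and response schema may differ:

import requests

# Hypothetical chat request against the runtime's /chat endpoint.
resp = requests.post(
    "http://localhost:8000/chat",
    json={"query": "Summarize the latest ingested documents"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())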
Deployer: Zero-configuration deployment system that works from local development to production clusters.
- Environment Detection: Automatically adapts to local, Docker, or cloud environments (sketched after this list)
- Configuration Management: Handles environment variables and secrets securely
- Scaling: Horizontal and vertical scaling based on load
- Load Balancing: Distributes requests across multiple instances
- Rolling Updates: Zero-downtime deployments with automatic rollback
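A toy illustration of the environment-detection idea - these are common heuristics, not LlamaFarm's actual logic:

import os
from pathlib import Path

def detect_environment() -> str:
    """Guess where we're running so deployment defaults can adapt."""
    if os.environ.get("KUBERNETES_SERVICE_HOST"):  # set inside k8s pods
        return "cloud"
    if Path("/.dockerenv").exists():               # present in Docker containers
        return "docker"
    return "local"

print(detect_environment())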
Data Pipeline (RAG): Complete document processing and retrieval system for building knowledge-augmented applications.
- Document Ingestion: Parse 15+ formats (PDF, Word, Excel, HTML, Markdown, etc.)
- Smart Extraction: Extract entities, keywords, statistics without LLMs
- Vector Storage: Integration with 8+ vector databases (Chroma, Pinecone, FAISS, etc.)
- Hybrid Search: Combine semantic, keyword, and metadata-based retrieval (see the scoring sketch after this list)
- Chunking Strategies: Adaptive chunking based on document type and use case
- Incremental Updates: Efficiently update knowledge base without full reprocessing
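To make hybrid search concrete, here is one common way dense and sparse scores are blended, using the dense/sparse weights from the strategy config shown later; the function and variable names are illustrative, not LlamaFarm's API:

def hybrid_score(dense, sparse, w_dense=0.7, w_sparse=0.3):
    """Blend semantic (dense) and keyword (sparse) relevance per document."""
    docs = set(dense) | set(sparse)
    return {
        doc: w_dense * dense.get(doc, 0.0) + w_sparse * sparse.get(doc, 0.0)
        for doc in docs
    }

scores = hybrid_score(
    {"doc1": 0.92, "doc2": 0.55},  # e.g. cosine similarity from the embedder
    {"doc2": 0.80, "doc3": 0.40},  # e.g. BM25 keyword scores
)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # doc1 leads on semantics; doc2 is boosted by keyword overlap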
Models: Unified interface for all LLM operations with enterprise-grade features.
- Multi-Provider Support: 25+ providers (OpenAI, Anthropic, Google, Ollama, etc.)
- Automatic Failover: Seamless fallback between providers when errors occur (see the sketch after this list)
- Fine-Tuning Pipeline: Train custom models on your data (Coming Q2 2025)
- Cost Optimization: Route queries to cheapest capable model
- Load Balancing: Distribute across multiple API keys and endpoints
- Response Caching: Intelligent caching to reduce API costs
- Model Configuration: Per-model temperature, token limits, and parameters
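The failover behavior can be pictured as a simple try-in-order loop - a minimal sketch, with hypothetical callables standing in for real provider clients:

def chat_with_failover(prompt, providers):
    """providers: ordered list of (name, callable) pairs; first success wins."""
    errors = {}
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, outage, auth failure, ...
            errors[name] = exc
    raise RuntimeError(f"All providers failed: {errors}")

# Usage mirrors the CLI's --primary/--fallback flags (ask_openai and
# ask_anthropic are hypothetical client functions):
# chat_with_failover("Write a haiku", [("gpt-4", ask_openai), ("claude-3", ask_anthropic)])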
Prompts: Enterprise prompt management system with version control and A/B testing.
- Template Library: 20+ pre-built templates for common use cases
- Dynamic Variables: Jinja2 templating with type validation (roadmap; see the sketch after this list)
- Strategy Selection: Automatically choose best template based on context
- Version Control: Track prompt changes and performance over time (roadmap)
- A/B Testing: Compare prompt variations with built-in analytics (roadmap)
- Chain-of-Thought: Built-in support for reasoning chains
- Multi-Agent: Coordinate multiple specialized prompts (roadmap)
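Since the roadmap names Jinja2 for dynamic variables, here is a plain-Jinja2 sketch of the idea; the template text and variables are invented for illustration:

from jinja2 import Environment, StrictUndefined

env = Environment(undefined=StrictUndefined)  # fail loudly on missing variables
template = env.from_string(
    "You are a {{ role }}. Use the context to answer.\n"
    "Context:\n{{ context }}\n\nQuestion: {{ question }}"
)
print(template.render(
    role="research assistant",
    context="...retrieved chunks...",
    question="What are the key findings?",
))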
A typical request flows through the components like this:
- User Request → Runtime receives and validates the request
- Context Retrieval → Data Pipeline searches relevant documents
- Prompt Selection → Prompts system chooses optimal template
- Model Execution → Models component handles LLM interaction with automatic failover
- Response Delivery → Runtime returns formatted response to user
Each component is independent but designed to work seamlessly together through standardized interfaces.
# Or clone and set up manually
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm
Full Dev Mode:
npm install -g nx
nx init --useDotNxInstallation --interactive=false
nx start server
💡 Important: All our demos use the REAL CLI and REAL configuration system - what you see in the demos is exactly how you'll use LlamaFarm in production!
For the best experience getting started with LlamaFarm, we recommend exploring our component documentation and running the interactive demos:
- Read the RAG Documentation - Complete guide to document ingestion, embedding, and retrieval
- Run the Interactive Demos:

cd rag
uv sync

# Interactive setup wizard - guides you through configuration
uv run python setup_demo.py

# Or try specific demos with the real CLI:
uv run python cli.py demo research_papers   # Academic paper analysis
uv run python cli.py demo customer_support  # Support ticket processing
uv run python cli.py demo code_analysis     # Source code understanding

# Use your own documents:
uv run python cli.py ingest ./your-docs/ --strategy research
uv run python cli.py search "your query here" --top-k 5
- Read the Models Documentation - Multi-provider support, fallback strategies, and cost optimization
- Run the Interactive Demos:

cd models
uv sync

# Try our showcase demos:
uv run python demos/demo1_cloud_fallback.py  # Automatic provider fallback
uv run python demos/demo2_multi_model.py     # Smart model routing
uv run python demos/demo3_training.py        # Fine-tuning pipeline (preview)

# Or use the real CLI directly:
uv run python cli.py chat --strategy balanced "Explain quantum computing"
uv run python cli.py chat --primary gpt-4 --fallback claude-3 "Write a haiku"

# Test with your own config:
uv run python cli.py setup your-strategy.yaml --verify
uv run python cli.py demo your-strategy
The prompts system is under active development. For now, explore the template system:
cd prompts
uv sync
uv run python -m prompts.cli template list # View available templates
uv run python -m prompts.cli execute "Your task" --template research
# Ingest documents with smart extraction
uv run python rag/cli.py ingest samples/ \
--extractors keywords entities statistics \
--strategy research
# Search with advanced retrieval
uv run python rag/cli.py search \
"What are the key findings about climate change?" \
--top-k 5 --rerank
# Chat with automatic fallback
uv run python models/cli.py chat \
--primary gpt-4 \
--fallback claude-3 \
--local-fallback llama3.2 \
"Explain quantum entanglement"
# Use domain-specific templates
uv run python prompts/cli.py execute \
"Analyze this medical report for anomalies" \
--strategy medical \
--template diagnostic_analysis
LlamaFarm uses a strategy-based configuration system that adapts to your use case:
# config/strategies.yaml
strategies:
  research:
    rag:
      embedder: "sentence-transformers"
      chunk_size: 512
      overlap: 50
      retrievers:
        - type: "hybrid"
          weights: {dense: 0.7, sparse: 0.3}
    models:
      primary: "gpt-4"
      fallback: "claude-3-opus"
      temperature: 0.3
    prompts:
      template: "academic_research"
      style: "formal"
      citations: true

  customer_support:
    rag:
      embedder: "openai"
      chunk_size: 256
      retrievers:
        - type: "similarity"
          top_k: 3
    models:
      primary: "gpt-3.5-turbo"
      temperature: 0.7
    prompts:
      template: "conversational"
      style: "friendly"
      include_context: true
# Apply strategy across all components
export LLAMAFARM_STRATEGY=research
# Or specify per command
uv run python rag/cli.py ingest docs/ --strategy research
uv run python models/cli.py chat --strategy customer_support "Help me with my order"
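Under the hood, resolving a strategy can be as simple as reading the YAML and honoring the environment variable - a hypothetical sketch using PyYAML, not necessarily LlamaFarm's actual loader:

import os
import yaml

# Load all strategies, then pick the one named by LLAMAFARM_STRATEGY.
with open("config/strategies.yaml") as fh:
    strategies = yaml.safe_load(fh)["strategies"]

name = os.environ.get("LLAMAFARM_STRATEGY", "research")
strategy = strategies[name]
print(strategy["models"]["primary"])  # "gpt-4" for the research strategy
print(strategy["rag"]["chunk_size"])  # 512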
| Component | Description | Documentation |
|---|---|---|
| RAG System | Document processing, embedding, retrieval | 📚 RAG Guide |
| Models | LLM providers, management, optimization | 🤖 Models Guide |
| Prompts | Templates, strategies, evaluation | 📝 Prompts Guide |
| CLI | Command-line tools and utilities | ⚡ CLI Reference |
| API | REST API services | 🔌 API Docs |
- Building Your First RAG Application
- Setting Up Local Models with Ollama
- Advanced Prompt Engineering
- Deploying to Production
- Cost Optimization Strategies
Check out our examples/ directory for complete working applications:
- 📚 Knowledge Base Assistant
- 💬 Customer Support Bot
- 📊 Document Analysis Pipeline
- 🔍 Semantic Search Engine
- 🤖 Multi-Agent System
# Run with hot-reload
uv run python main.py --dev
# Or use Docker
docker-compose up -d
# docker-compose.prod.yml
version: '3.8'
services:
  llamafarm:
    image: llamafarm/llamafarm:latest
    environment:
      - STRATEGY=production
      - WORKERS=4
    volumes:
      - ./config:/app/config
      - ./data:/app/data
    ports:
      - "8000:8000"
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G
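Bringing that stack up is standard Docker Compose usage (nothing LlamaFarm-specific):

docker compose -f docker-compose.prod.yml up -d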
- AWS: ECS, Lambda, SageMaker
- GCP: Cloud Run, Vertex AI
- Azure: Container Instances, ML Studio
- Self-Hosted: Kubernetes, Docker Swarm
See deployment guide for detailed instructions.
from llamafarm import Pipeline, RAG, Models, Prompts

# Create a complete AI pipeline (parentheses keep the chained calls valid Python)
pipeline = (
    Pipeline(strategy="research")
    .add(RAG.ingest("documents/"))
    .add(Prompts.select_template())
    .add(Models.generate())
    .add(RAG.store_results())
)

# Execute with monitoring
results = pipeline.run(
    query="What are the implications?",
    monitor=True,
    cache=True,
)
from llamafarm.strategies import Strategy

class MedicalStrategy(Strategy):
    """Custom strategy for medical document analysis."""

    def configure_rag(self):
        return {
            "extractors": ["medical_entities", "dosages", "symptoms"],
            "embedder": "biobert",
            "chunk_size": 256,
        }

    def configure_models(self):
        return {
            "primary": "med-palm-2",
            "temperature": 0.1,
            "require_citations": True,
        }
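Presumably a custom strategy plugs in wherever a strategy name does - hypothetical usage, mirroring the strategy="research" form above rather than a documented API:

# Hypothetical: pass the custom strategy object instead of a strategy name.
pipeline = Pipeline(strategy=MedicalStrategy())
results = pipeline.run(query="Summarize the patient's medication history")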
from llamafarm.monitoring import Monitor
monitor = Monitor()
monitor.track_usage()
monitor.analyze_costs()
monitor.export_metrics("prometheus")
We welcome contributions! See our Contributing Guide for:
- 🐛 Reporting bugs
- 💡 Suggesting features
- 🔧 Submitting PRs
- 📚 Improving docs
- Bobby Radford 💻
- Matt Hamann 💻
- Rob Thelen 💻
- rachradulo 💻
- Racheal Ochalek 💻
- Davon Davis 💻
- github-actions[bot] 💻
- Vector DBs: ChromaDB, Pinecone, Weaviate, Qdrant, FAISS
- LLM Providers: OpenAI, Anthropic, Google, Cohere, Together, Groq
- Deployment: Docker, Kubernetes, AWS, GCP, Azure
- Monitoring: Prometheus, Grafana, DataDog, New Relic
Available today:
- RAG System with 10+ parsers and 5+ extractors
- 25+ LLM provider integrations
- 20+ prompt templates with strategies
- CLI tools for all components
- Docker deployment support
Coming next:
- Full Runtime System - Complete orchestration layer for managing all components with health monitoring, resource allocation, and automatic recovery
- Production Deployer - Zero-configuration deployment from local development to cloud with automatic scaling and load balancing
- Fine-tuning Pipeline - Train custom models on your data with integrated evaluation and deployment
- Web UI Dashboard - Visual interface for monitoring, configuration, and management
- Enhanced CLI - Unified command interface across all components
In progress and planned:
- Fine-tuning pipeline (Looking for contributors with ML experience)
- Advanced caching system (Redis/Memcached integration - 40% complete)
- GraphRAG implementation (Design phase - Join discussion)
- Multi-modal support (Vision models integration - Early prototype)
- Agent orchestration (LangGraph integration planned)
- AutoML for strategy optimization (Q4 2025 - Seeking ML engineers)
- Distributed training (Q4 2025 - Partnership opportunities welcome)
- Edge deployment (Q4 2025 - IoT and mobile focus)
- Mobile SDKs (iOS/Android - Looking for mobile developers)
- Web UI dashboard (Q4 2025 - React/Vue developers needed)
We're actively looking for contributors in these areas:
- 🧠 Machine Learning: Fine-tuning, distributed training
- 📱 Mobile Development: iOS/Android SDKs
- 🎨 Frontend: Web UI dashboard
- 🔍 Search: GraphRAG and advanced retrieval
- 📚 Documentation: Tutorials and examples
LlamaFarm is MIT licensed. See LICENSE for details.
LlamaFarm stands on the shoulders of giants:
- 🦜 LangChain - LLM orchestration inspiration
- 🤗 Transformers - Model implementations
- 🎯 ChromaDB - Vector database excellence
- 🚀 uv - Lightning-fast package management
See CREDITS.md for complete acknowledgments.
Join thousands of developers building with LlamaFarm
⭐ Star on GitHub • 💬 Join Discord • 📚 Read Docs
Build locally. Deploy anywhere. Own your AI.
Similar Open Source Tools

bifrost
Bifrost is a high-performance AI gateway that unifies access to multiple providers through a single OpenAI-compatible API. It offers features like automatic failover, load balancing, semantic caching, and enterprise-grade functionalities. Users can deploy Bifrost in seconds with zero configuration, benefiting from its core infrastructure, advanced features, enterprise and security capabilities, and developer experience. The repository structure is modular, allowing for maximum flexibility. Bifrost is designed for quick setup, easy configuration, and seamless integration with various AI models and tools.

lyraios
LYRAIOS (LLM-based Your Reliable AI Operating System) is an advanced AI assistant platform built with FastAPI and Streamlit, designed to serve as an operating system for AI applications. It offers core features such as AI process management, memory system, and I/O system. The platform includes built-in tools like Calculator, Web Search, Financial Analysis, File Management, and Research Tools. It also provides specialized assistant teams for Python and research tasks. LYRAIOS is built on a technical architecture comprising FastAPI backend, Streamlit frontend, Vector Database, PostgreSQL storage, and Docker support. It offers features like knowledge management, process control, and security & access control. The roadmap includes enhancements in core platform, AI process management, memory system, tools & integrations, security & access control, open protocol architecture, multi-agent collaboration, and cross-platform support.

opcode
opcode is a powerful desktop application built with Tauri 2 that serves as a command center for interacting with Claude Code. It offers a visual GUI for managing Claude Code sessions, creating custom agents, tracking usage, and more. Users can navigate projects, create specialized AI agents, monitor usage analytics, manage MCP servers, create session checkpoints, edit CLAUDE.md files, and more. The tool bridges the gap between command-line tools and visual experiences, making AI-assisted development more intuitive and productive.

gemini-cli
Gemini CLI is an open-source AI agent that provides lightweight access to Gemini, offering powerful capabilities like code understanding, generation, automation, integration, and advanced features. It is designed for developers who prefer working in the command line and offers extensibility through MCP support. The tool integrates directly into GitHub workflows and offers various authentication options for individual developers, enterprise teams, and production workloads. With features like code querying, editing, app generation, debugging, and GitHub integration, Gemini CLI aims to streamline development workflows and enhance productivity.

agentneo
AgentNeo is a Python package that provides functionalities for project, trace, dataset, experiment management. It allows users to authenticate, create projects, trace agents and LangGraph graphs, manage datasets, and run experiments with metrics. The tool aims to streamline AI project management and analysis by offering a comprehensive set of features.

paelladoc
PAELLADOC is an intelligent documentation system that uses AI to analyze code repositories and generate comprehensive technical documentation. It offers a modular architecture with MECE principles, interactive documentation process, key features like Orchestrator and Commands, and a focus on context for successful AI programming. The tool aims to streamline documentation creation, code generation, and product management tasks for software development teams, providing a definitive standard for AI-assisted development documentation.

AgentNeo
AgentNeo is an advanced, open-source Agentic AI Application Observability, Monitoring, and Evaluation Framework designed to provide deep insights into AI agents, Large Language Model (LLM) calls, and tool interactions. It offers robust logging, visualization, and evaluation capabilities to help debug and optimize AI applications with ease. With features like tracing LLM calls, monitoring agents and tools, tracking interactions, detailed metrics collection, flexible data storage, simple instrumentation, interactive dashboard, project management, execution graph visualization, and evaluation tools, AgentNeo empowers users to build efficient, cost-effective, and high-quality AI-driven solutions.

presenton
Presenton is an open-source AI presentation generator and API that allows users to create professional presentations locally on their devices. It offers complete control over the presentation workflow, including custom templates, AI template generation, flexible generation options, and export capabilities. Users can use their own API keys for various models, integrate with Ollama for local model running, and connect to OpenAI-compatible endpoints. The tool supports multiple providers for text and image generation, runs locally without cloud dependencies, and can be deployed as a Docker container with GPU support.

ComfyUI-Ollama-Describer
ComfyUI-Ollama-Describer is an extension for ComfyUI that enables the use of LLM models provided by Ollama, such as Gemma, Llava (multimodal), Llama2, Llama3, or Mistral. It requires the Ollama library for interacting with large-scale language models, supporting GPUs using CUDA and AMD GPUs on Windows, Linux, and Mac. The extension allows users to run Ollama through Docker and utilize NVIDIA GPUs for faster processing. It provides nodes for image description, text description, image captioning, and text transformation, with various customizable parameters for model selection, API communication, response generation, and model memory management.

J.A.R.V.I.S.2.0
J.A.R.V.I.S. 2.0 is an AI-powered assistant designed for voice commands, capable of tasks like providing weather reports, summarizing news, sending emails, and more. It features voice activation, speech recognition, AI responses, and handles multiple tasks including email sending, weather reports, news reading, image generation, database functions, phone call automation, AI-based task execution, website & application automation, and knowledge-based interactions. The assistant also includes timeout handling, automatic input processing, and the ability to call multiple functions simultaneously. It requires Python 3.9 or later and specific API keys for weather, news, email, and AI access. The tool integrates Gemini AI for function execution and Ollama as a fallback mechanism. It utilizes a RAG-based knowledge system and ADB integration for phone automation. Future enhancements include deeper mobile integration, advanced AI-driven automation, improved NLP-based command execution, and multi-modal interactions.

DeepSeekAI
DeepSeekAI is a browser extension plugin that allows users to interact with AI by selecting text on web pages and invoking the DeepSeek large model to provide AI responses. The extension enhances browsing experience by enabling users to get summaries or answers for selected text directly on the webpage. It features context text selection, API key integration, draggable and resizable window, AI streaming replies, Markdown rendering, one-click copy, re-answer option, code copy functionality, language switching, and multi-turn dialogue support. Users can install the extension from Chrome Web Store or Edge Add-ons, or manually clone the repository, install dependencies, and build the extension. Configuration involves entering the DeepSeek API key in the extension popup window to start using the AI-driven responses.

Hacx-GPT
Hacx GPT is a cutting-edge AI tool developed by BlackTechX, inspired by WormGPT, designed to push the boundaries of natural language processing. It is an advanced broken AI model that facilitates seamless and powerful interactions, allowing users to ask questions and perform various tasks. The tool has been rigorously tested on platforms like Kali Linux, Termux, and Ubuntu, offering powerful AI conversations and the ability to do anything the user wants. Users can easily install and run Hacx GPT on their preferred platform to explore its vast capabilities.

swift-ocr-llm-powered-pdf-to-markdown
Swift OCR is a powerful tool for extracting text from PDF files using OpenAI's GPT-4 Turbo with Vision model. It offers flexible input options, advanced OCR processing, performance optimizations, structured output, robust error handling, and scalable architecture. The tool ensures accurate text extraction, resilience against failures, and efficient handling of multiple requests.

R2R
R2R (RAG to Riches) is a fast and efficient framework for serving high-quality Retrieval-Augmented Generation (RAG) to end users. The framework is designed with customizable pipelines and a feature-rich FastAPI implementation, enabling developers to quickly deploy and scale RAG-based applications. R2R was conceived to bridge the gap between local LLM experimentation and scalable production solutions. R2R is to LangChain/LlamaIndex what NextJS is to React. A JavaScript client for R2R deployments can be found here. Key features: instantly launch production-ready RAG pipelines with streaming capabilities; tailor your pipeline with intuitive configuration files; enhance your pipeline with custom code integrations; scale your pipeline effortlessly in the cloud using SciPhi; and benefit from a framework developed by the open-source community, designed to simplify RAG deployment.

Archon
Archon is an AI meta-agent designed to autonomously build, refine, and optimize other AI agents. It serves as a practical tool for developers and an educational framework showcasing the evolution of agentic systems. Through iterative development, Archon demonstrates the power of planning, feedback loops, and domain-specific knowledge in creating robust AI agents.
For similar tasks

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

AI-in-a-Box
AI-in-a-Box is a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction, while maintaining the highest standards of quality and efficiency. It provides essential guidance on the responsible use of AI and LLM technologies, specific security guidance for Generative AI (GenAI) applications, and best practices for scaling OpenAI applications within Azure. The available accelerators include: Azure ML Operationalization in-a-box, Edge AI in-a-box, Doc Intelligence in-a-box, Image and Video Analysis in-a-box, Cognitive Services Landing Zone in-a-box, Semantic Kernel Bot in-a-box, NLP to SQL in-a-box, Assistants API in-a-box, and Assistants API Bot in-a-box.

spring-ai
The Spring AI project provides a Spring-friendly API and abstractions for developing AI applications. It offers a portable client API for interacting with generative AI models, enabling developers to easily swap out implementations and access various models like OpenAI, Azure OpenAI, and HuggingFace. Spring AI also supports prompt engineering, providing classes and interfaces for creating and parsing prompts, as well as incorporating proprietary data into generative AI without retraining the model. This is achieved through Retrieval Augmented Generation (RAG), which involves extracting, transforming, and loading data into a vector database for use by AI models. Spring AI's VectorStore abstraction allows for seamless transitions between different vector database implementations.

ragstack-ai
RAGStack is an out-of-the-box solution simplifying Retrieval Augmented Generation (RAG) in GenAI apps. RAGStack includes the best open-source for implementing RAG, giving developers a comprehensive Gen AI Stack leveraging LangChain, CassIO, and more. RAGStack leverages the LangChain ecosystem and is fully compatible with LangSmith for monitoring your AI deployments.

breadboard
Breadboard is a library for prototyping generative AI applications. It is inspired by the hardware maker community and their boundless creativity. Breadboard makes it easy to wire prototypes and share, remix, reuse, and compose them. The library emphasizes ease and flexibility of wiring, as well as modularity and composability.

cloudflare-ai-web
Cloudflare-ai-web is a lightweight and easy-to-use tool that allows you to quickly deploy a multi-modal AI platform using Cloudflare Workers AI. It supports serverless deployment, password protection, and local storage of chat logs. With a size of only ~638 kB gzip, it is a great option for building AI-powered applications without the need for a dedicated server.

app-builder
AppBuilder SDK is a one-stop development tool for AI native applications, providing basic cloud resources, AI capability engine, Qianfan large model, and related capability components to improve the development efficiency of AI native applications.

cookbook
This repository contains community-driven practical examples of building AI applications and solving various tasks with AI using open-source tools and models. Everyone is welcome to contribute, and we value everybody's contribution! There are several ways you can contribute to the Open-Source AI Cookbook: Submit an idea for a desired example/guide via GitHub Issues. Contribute a new notebook with a practical example. Improve existing examples by fixing issues/typos. Before contributing, check currently open issues and pull requests to avoid working on something that someone else is already working on.
For similar jobs

promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".

leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

AI-YinMei
AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.