rlama

A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems for all your document needs.

Stars: 905

RLAMA is a powerful AI-driven question-answering tool that seamlessly integrates with local Ollama models. It enables users to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to their documentation needs. RLAMA follows a clean architecture pattern with clear separation of concerns, focusing on lightweight and portable RAG capabilities with minimal dependencies. The tool processes documents, generates embeddings, stores RAG systems locally, and provides contextually-informed responses to user queries. Supported document formats include text, code, and various document types, with troubleshooting steps available for common issues like Ollama accessibility, text extraction problems, and relevance of answers.

README:

RLAMA - User Guide

RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.

Vision & Roadmap
Installation
Available Commands
Uninstallation
Supported Document Formats
Troubleshooting
Using OpenAI Models

Vision & Roadmap

RLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. Here's our strategic roadmap:

Completed Features ✅

✅ Basic RAG System Creation: CLI tool for creating and managing RAG systems
✅ Document Processing: Support for multiple document formats (.txt, .md, .pdf, etc.)
✅ Document Chunking: Advanced semantic chunking with multiple strategies (fixed, semantic, hierarchical, hybrid)
✅ Vector Storage: Local storage of document embeddings
✅ Context Retrieval: Basic semantic search with configurable context size
✅ Ollama Integration: Seamless connection to Ollama models
✅ Cross-Platform Support: Works on Linux, macOS, and Windows
✅ Easy Installation: One-line installation script
✅ API Server: HTTP endpoints for integrating RAG capabilities in other applications
✅ Web Crawling: Create RAGs directly from websites
✅ Guided RAG Setup Wizard: Interactive interface for easy RAG creation
✅ Hugging Face Integration: Access to 45,000+ GGUF models from Hugging Face Hub

Small LLM Optimization (Q2 2025)

[ ] Prompt Compression: Smart context summarization for limited context windows
✅ Adaptive Chunking: Dynamic content segmentation based on semantic boundaries and document structure
✅ Minimal Context Retrieval: Intelligent filtering to eliminate redundant content
[ ] Parameter Optimization: Fine-tuned settings for different model sizes

Advanced Embedding Pipeline (Q2-Q3 2025)

[ ] Multi-Model Embedding Support: Integration with various embedding models
[ ] Hybrid Retrieval Techniques: Combining sparse and dense retrievers for better accuracy
[ ] Embedding Evaluation Tools: Built-in metrics to measure retrieval quality
[ ] Automated Embedding Cache: Smart caching to reduce computation for similar queries

User Experience Enhancements (Q3 2025)

[ ] Lightweight Web Interface: Simple browser-based UI for the existing CLI backend
[ ] Knowledge Graph Visualization: Interactive exploration of document connections
[ ] Domain-Specific Templates: Pre-configured settings for different domains

Enterprise Features (Q4 2025)

[ ] Multi-User Access Control: Role-based permissions for team environments
[ ] Integration with Enterprise Systems: Connectors for SharePoint, Confluence, Google Workspace
[ ] Knowledge Quality Monitoring: Detection of outdated or contradictory information
[ ] System Integration API: Webhooks and APIs for embedding RLAMA in existing workflows
[ ] AI Agent Creation Framework: Simplified system for building custom AI agents with RAG capabilities

Next-Gen Retrieval Innovations (Q1 2026)

[ ] Multi-Step Retrieval: Using the LLM to refine search queries for complex questions
[ ] Cross-Modal Retrieval: Support for image content understanding and retrieval
[ ] Feedback-Based Optimization: Learning from user interactions to improve retrieval
[ ] Knowledge Graphs & Symbolic Reasoning: Combining vector search with structured knowledge

RLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.

Installation

Prerequisites

Ollama installed and running

Installation from terminal

curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh

Tech Stack

RLAMA is built with:

Core Language: Go (chosen for performance, cross-platform compatibility, and single binary distribution)
CLI Framework: Cobra (for command-line interface structure)
LLM Integration: Ollama API (for embeddings and completions)
Storage: Local filesystem-based storage (JSON files for simplicity and portability)
Vector Search: Custom implementation of cosine similarity for embedding retrieval
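
The cosine-similarity step named above can be sketched in a few lines. This is an illustrative Python version, not RLAMA's Go implementation; the function name and the handling of zero vectors are assumptions made for the sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, which is why the measure works for ranking embeddings regardless of their magnitude.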

Architecture

RLAMA follows a clean architecture pattern with clear separation of concerns:

rlama/
├── cmd/                  # CLI commands (using Cobra)
│   ├── root.go           # Base command
│   ├── rag.go            # Create RAG systems
│   ├── run.go            # Query RAG systems
│   └── ...
├── internal/
│   ├── client/           # External API clients
│   │   └── ollama_client.go # Ollama API integration
│   ├── domain/           # Core domain models
│   │   ├── rag.go        # RAG system entity
│   │   └── document.go   # Document entity
│   ├── repository/       # Data persistence
│   │   └── rag_repository.go # Handles saving/loading RAGs
│   └── service/          # Business logic
│       ├── rag_service.go      # RAG operations
│       ├── document_loader.go  # Document processing
│       └── embedding_service.go # Vector embeddings
└── pkg/                  # Shared utilities
    └── vector/           # Vector operations

Data Flow

Document Processing: Documents are loaded from the file system, parsed based on their type, and converted to plain text.
Embedding Generation: Document text is sent to Ollama to generate vector embeddings.
Storage: The RAG system (documents + embeddings) is stored in the user's home directory (~/.rlama).
Query Process: When a user asks a question, it's converted to an embedding, compared against stored document embeddings, and relevant content is retrieved.
Response Generation: Retrieved content and the question are sent to Ollama to generate a contextually-informed response.
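
The query and response steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the embeddings are toy 2-dimensional vectors standing in for what Ollama would return, and `retrieve`, `build_prompt`, and the prompt template are hypothetical names, not RLAMA's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def retrieve(query_vec, chunks, context_size):
    """Return the text of the `context_size` chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [c["text"] for c in ranked[:context_size]]

def build_prompt(question, context_chunks):
    """Combine retrieved text and the question (hypothetical template)."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy store: in a real system each embedding would come from the embedding model.
store = [
    {"text": "Run the install script.", "embedding": [0.9, 0.1]},
    {"text": "The roadmap lists planned features.", "embedding": [0.1, 0.9]},
    {"text": "Ollama must be running.", "embedding": [0.8, 0.3]},
]
top = retrieve([1.0, 0.0], store, context_size=2)
prompt = build_prompt("How do I install the project?", top)
```

The resulting `prompt` string is what would be sent to the LLM in the response-generation step.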

Visual Representation

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Documents  │────>│  Document   │────>│  Embedding  │
│  (Input)    │     │  Processing │     │  Generation │
└─────────────┘     └─────────────┘     └─────────────┘
                                              │
                                              ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────>│  Vector     │<────│ Vector Store│
│  Response   │     │  Search     │     │ (RAG System)│
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                   │
       │                   ▼
┌─────────────┐     ┌─────────────┐
│   Ollama    │<────│   Context   │
│    LLM      │     │  Building   │
└─────────────┘     └─────────────┘

RLAMA is designed to be lightweight and portable, focusing on providing RAG capabilities with minimal dependencies. The entire system runs locally, with the only external dependency being Ollama for LLM capabilities.

Available Commands

You can get help on all commands by using:

rlama --help

Global Flags

These flags can be used with any command:

--host string   Ollama host (default: localhost)
--port string   Ollama port (default: 11434)

Custom Data Directory

RLAMA stores data in ~/.rlama by default. To use a different location:

Command-line flag (highest priority):

# Use with any command
rlama --data-dir /path/to/custom/directory run my-rag

Environment variable:

# Set the environment variable
export RLAMA_DATA_DIR=/path/to/custom/directory
rlama run my-rag

The precedence order is: command-line flag > environment variable > default location.
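
The precedence rule can be expressed as a small resolver. This is a Python sketch of the rule as documented, not RLAMA's code; `resolve_data_dir` is a hypothetical name, and treating an empty value as "not set" is an assumption.

```python
import os

DEFAULT_DATA_DIR = os.path.join("~", ".rlama")

def resolve_data_dir(flag_value=None, env=None):
    """Pick the data directory: --data-dir, then RLAMA_DATA_DIR, then the default."""
    env = os.environ if env is None else env
    if flag_value:
        return flag_value  # command-line flag wins
    if env.get("RLAMA_DATA_DIR"):
        return env["RLAMA_DATA_DIR"]  # environment variable comes second
    return os.path.expanduser(DEFAULT_DATA_DIR)  # fall back to ~/.rlama
```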

rag - Create a RAG system

Creates a new RAG system by indexing all documents in the specified folder.

rlama rag [model] [rag-name] [folder-path]

Parameters:

model: Name of the Ollama model to use (e.g., llama3, mistral, gemma) or a Hugging Face model using the format hf.co/username/repository[:quantization].
rag-name: Unique name to identify your RAG system.
folder-path: Path to the folder containing your documents.

Example:

# Using a standard Ollama model
rlama rag llama3 documentation ./docs

# Using a Hugging Face model
rlama rag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF my-rag ./docs

# Using a Hugging Face model with specific quantization
rlama rag hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q5_K_M my-rag ./docs
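
To make the hf.co/username/repository[:quantization] format concrete, here is a Python sketch that splits a model argument into its parts. `parse_model_ref` and the returned dictionary keys are hypothetical and only illustrate the format; RLAMA may parse the argument differently.

```python
def parse_model_ref(model):
    """Split a model argument into source, name, and optional quantization."""
    prefix = "hf.co/"
    if not model.startswith(prefix):
        # Anything else is treated as an Ollama model name, tag included.
        return {"source": "ollama", "name": model, "quantization": None}
    path, _, quant = model[len(prefix):].partition(":")
    user, sep, repo = path.partition("/")
    if not (user and sep and repo):
        raise ValueError(f"expected hf.co/username/repository, got {model!r}")
    return {"source": "huggingface", "name": path, "quantization": quant or None}
```

For example, `parse_model_ref("hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q5_K_M")` yields the repository path plus the quantization `Q5_K_M`, while `parse_model_ref("llama3")` is passed through as an Ollama model.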

crawl-rag - Create a RAG system from a website

Creates a new RAG system by crawling a website and indexing its content.

rlama crawl-rag [model] [rag-name] [website-url]

Parameters:

model: Name of the Ollama model to use (e.g., llama3, mistral, gemma).
rag-name: Unique name to identify your RAG system.
website-url: URL of the website to crawl and index.

Options:

--max-depth: Maximum crawl depth (default: 2)
--concurrency: Number of concurrent crawlers (default: 5)
--exclude-path: Paths to exclude from crawling (comma-separated)
--chunk-size: Character count per chunk (default: 1000)
--chunk-overlap: Overlap between chunks in characters (default: 200)
--chunking-strategy: Chunking strategy to use (options: "fixed", "semantic", "hybrid", "hierarchical", default: "hybrid")

Chunking Strategies

RLAMA offers multiple advanced chunking strategies to optimize document retrieval:

Fixed: Traditional chunking with fixed size and overlap, respecting sentence boundaries when possible.
Semantic: Intelligently splits documents based on semantic boundaries like headings, paragraphs, and natural topic shifts.
Hybrid: Automatically selects the best strategy based on document type and content (markdown, HTML, code, or plain text).
Hierarchical: For very long documents, creates a two-level chunking structure with major sections and sub-chunks.

The system automatically adapts to different document types:

Markdown documents: Split by headers and sections
HTML documents: Split by semantic HTML elements
Code documents: Split by functions, classes, and logical blocks
Plain text: Split by paragraphs with contextual overlap
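
As a rough illustration of the simplest case, the fixed strategy can be thought of as a sliding window over the text. The Python sketch below uses the documented defaults (1000 characters per chunk, 200 characters of overlap) but deliberately ignores sentence boundaries, so it is a simplification rather than RLAMA's actual chunker; `chunk_fixed` is a hypothetical name.

```python
def chunk_fixed(text, chunk_size=1000, overlap=200):
    """Split text into windows of `chunk_size` characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With `chunk_size=4` and `overlap=1`, the string `"abcdefghij"` becomes `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one, which is the purpose of the overlap.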

Example:

# Create a new RAG from a documentation website
rlama crawl-rag llama3 docs-rag https://docs.example.com

# Customize crawling behavior
rlama crawl-rag llama3 blog-rag https://blog.example.com --max-depth=3 --exclude-path=/archive,/tags

# Create a RAG with semantic chunking
rlama rag llama3 documentation ./docs --chunking-strategy=semantic

# Use hierarchical chunking for large documents
rlama rag llama3 book-rag ./books --chunking-strategy=hierarchical

wizard - Create a RAG system with interactive setup

Provides an interactive step-by-step wizard for creating a new RAG system.

rlama wizard

The wizard guides you through:

Naming your RAG
Choosing an Ollama model
Selecting document sources (local folder or website)
Configuring chunking parameters
Setting up file filtering

Example:

rlama wizard
# Follow the prompts to create your customized RAG

watch - Set up directory watching for a RAG system

Configure a RAG system to automatically watch a directory for new files and add them to the RAG.

rlama watch [rag-name] [directory-path] [interval]

Parameters:

rag-name: Name of the RAG system to watch.
directory-path: Path to the directory to watch for new files.
interval: Number of minutes between checks for new files (use 0 to check only when the RAG is used).

Example:

# Set up directory watching to check every 60 minutes
rlama watch my-docs ./watched-folder 60

# Set up directory watching to only check when the RAG is used
rlama watch my-docs ./watched-folder 0

# Customize what files to watch
rlama watch my-docs ./watched-folder 30 --exclude-dir=node_modules,tmp --process-ext=.md,.txt

watch-off - Disable directory watching for a RAG system

Disable automatic directory watching for a RAG system.

rlama watch-off [rag-name]

Parameters:

rag-name: Name of the RAG system for which to disable watching.

Example:

rlama watch-off my-docs

check-watched - Check a RAG's watched directory for new files

Manually check a RAG's watched directory for new files and add them to the RAG.

rlama check-watched [rag-name]

Parameters:

rag-name: Name of the RAG system to check.

Example:

rlama check-watched my-docs

web-watch - Set up website monitoring for a RAG system

Configure a RAG system to automatically monitor a website for updates and add new content to the RAG.

rlama web-watch [rag-name] [website-url] [interval]

Parameters:

rag-name: Name of the RAG system to monitor.
website-url: URL of the website to monitor.
interval: Time in minutes between checks (use 0 to check only when the RAG is used).

Example:

# Set up website monitoring to check every 60 minutes
rlama web-watch my-docs https://example.com 60

# Set up website monitoring to only check when the RAG is used
rlama web-watch my-docs https://example.com 0

# Customize what content to monitor
rlama web-watch my-docs https://example.com 30 --exclude-path=/archive,/tags

web-watch-off - Disable website monitoring for a RAG system

Disable automatic website monitoring for a RAG system.

rlama web-watch-off [rag-name]

Parameters:

rag-name: Name of the RAG system for which to disable monitoring.

Example:

rlama web-watch-off my-docs

check-web-watched - Check a RAG's monitored website for updates

Manually check a RAG's monitored website for new updates and add them to the RAG.

rlama check-web-watched [rag-name]

Parameters:

rag-name: Name of the RAG system to check.

Example:

rlama check-web-watched my-docs

run - Use a RAG system

Starts an interactive session to interact with an existing RAG system.

rlama run [rag-name]

Parameters:

rag-name: Name of the RAG system to use.
--context-size: (Optional) Number of context chunks to retrieve (default: 20)

Example:

rlama run documentation
> How do I install the project?
> What are the main features?
> exit

Context Size Tips:

Smaller values (5-15) for faster responses with key information
Medium values (20-40) for balanced performance
Larger values (50+) for complex questions needing broad context
Consider your model's context window limits

rlama run documentation --context-size=50  # Use 50 context chunks
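
To relate these values to a model's context window, a back-of-the-envelope estimate helps. The Python sketch below assumes 1000-character chunks (the documented --chunk-size default) and roughly 4 characters per token, which is a common rule of thumb rather than anything RLAMA guarantees; real token counts vary by model and language.

```python
def context_budget(context_size, chunk_size=1000, chars_per_token=4):
    """Estimate how much retrieved text a given --context-size sends to the model."""
    chars = context_size * chunk_size
    return {"chars": chars, "approx_tokens": chars // chars_per_token}
```

Under these assumptions the default of 20 chunks is about 20,000 characters (around 5,000 tokens), and 50 chunks is about 50,000 characters (around 12,500 tokens), which can exceed the window of a small model.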

api - Start API server

Starts an HTTP API server that exposes RLAMA's functionality through RESTful endpoints.

rlama api [--port PORT]

Parameters:

--port: (Optional) Port number to run the API server on (default: 11249)

Example:

rlama api --port 8080

Available Endpoints:

Query a RAG system - POST /rag
```
curl -X POST http://localhost:11249/rag \
  -H "Content-Type: application/json" \
  -d '{
    "rag_name": "documentation",
    "prompt": "How do I install the project?",
    "context_size": 20
  }'
```
Request fields:
- rag_name (required): Name of the RAG system to query
- prompt (required): Question or prompt to send to the RAG
- context_size (optional): Number of chunks to include in context
- model (optional): Override the model used by the RAG

Check server health - GET /health
```
curl http://localhost:11249/health
```

Integration Example:

// Node.js example (Node 18+ for built-in fetch; run as an ES module for top-level await)
const response = await fetch('http://localhost:11249/rag', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    rag_name: 'my-docs',
    prompt: 'Summarize the key features'
  })
});
const data = await response.json();
console.log(data.response);
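
The same call can be made from Python using only the standard library. This is a sketch built from the request fields documented above; `build_rag_request` and `query_rag` are hypothetical helper names, and the shape of the JSON returned by the server is not assumed beyond it being valid JSON.

```python
import json
import urllib.request

def build_rag_request(rag_name, prompt, context_size=None, model=None,
                      base_url="http://localhost:11249"):
    """Build a POST /rag request; optional fields are sent only when given."""
    payload = {"rag_name": rag_name, "prompt": prompt}
    if context_size is not None:
        payload["context_size"] = context_size
    if model is not None:
        payload["model"] = model
    return urllib.request.Request(
        f"{base_url}/rag",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query_rag(rag_name, prompt, **options):
    """Send the request to a running `rlama api` server and decode the JSON body."""
    with urllib.request.urlopen(build_rag_request(rag_name, prompt, **options)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the network.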

list - List RAG systems

Displays a list of all available RAG systems.

rlama list

delete - Delete a RAG system

Permanently deletes a RAG system and all its indexed documents.

rlama delete [rag-name] [--force/-f]

Parameters:

rag-name: Name of the RAG system to delete.
--force or -f: (Optional) Delete without asking for confirmation.

Example:

rlama delete old-project

Or to delete without confirmation:

rlama delete old-project --force

list-docs - List documents in a RAG

Displays all documents in a RAG system with metadata.

rlama list-docs [rag-name]

Parameters:

rag-name: Name of the RAG system

Example:

rlama list-docs documentation

list-chunks - Inspect document chunks

List and filter document chunks in a RAG system with various options:

# Basic chunk listing
rlama list-chunks [rag-name]

# With content preview (shows first 100 characters)
rlama list-chunks [rag-name] --show-content

# Filter by document name/ID substring
rlama list-chunks [rag-name] --document=readme

# Combine options
rlama list-chunks [rag-name] --document=api --show-content

Options:

--show-content: Display chunk content preview
--document: Filter by document name/ID substring

Output columns:

Chunk ID (use with view-chunk command)
Document Source
Chunk Position (e.g., "2/5" for second of five chunks)
Content Preview (if enabled)
Created Date

view-chunk - View chunk details

Display detailed information about a specific chunk.

rlama view-chunk [rag-name] [chunk-id]

Parameters:

rag-name: Name of the RAG system
chunk-id: Chunk identifier from list-chunks

Example:

rlama view-chunk documentation doc123_chunk_0

add-docs - Add documents to RAG

Add new documents to an existing RAG system.

rlama add-docs [rag-name] [folder-path] [flags]

Parameters:

rag-name: Name of the RAG system
folder-path: Path to documents folder

Example:

rlama add-docs documentation ./new-docs --exclude-ext=.tmp

crawl-add-docs - Add website content to RAG

Add content from a website to an existing RAG system.

rlama crawl-add-docs [rag-name] [website-url]

Parameters:

rag-name: Name of the RAG system
website-url: URL of the website to crawl and add to the RAG

Options:

--max-depth: Maximum crawl depth (default: 2)
--concurrency: Number of concurrent crawlers (default: 5)
--exclude-path: Paths to exclude from crawling (comma-separated)
--chunk-size: Character count per chunk (default: 1000)
--chunk-overlap: Overlap between chunks in characters (default: 200)

Example:

# Add blog content to an existing RAG
rlama crawl-add-docs my-docs https://blog.example.com

# Customize crawling behavior
rlama crawl-add-docs knowledge-base https://docs.example.com --max-depth=1 --exclude-path=/api
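
To show how --max-depth and --exclude-path interact, here is a Python sketch of a depth-limited breadth-first crawl over an in-memory link map. It is an illustration only: the start page is assumed to be depth 0, exclusions are assumed to match by path prefix, and `crawl` is a hypothetical function, so RLAMA's crawler may count depth or match paths differently.

```python
from collections import deque

def crawl(start, links, max_depth=2, exclude_paths=()):
    """Breadth-first walk of `links` (page -> linked pages), limited by depth."""
    def excluded(path):
        # A path is excluded if it equals an excluded path or sits beneath it.
        return any(path == p or path.startswith(p.rstrip("/") + "/")
                   for p in exclude_paths)

    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        path, depth = queue.popleft()
        order.append(path)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(path, []):
            if nxt not in seen and not excluded(nxt):
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

# Toy site standing in for real HTTP fetches.
site = {
    "/": ["/docs", "/api", "/blog"],
    "/docs": ["/docs/install", "/api/v1"],
    "/docs/install": ["/docs/install/linux"],
}
pages = crawl("/", site, max_depth=1, exclude_paths=["/api"])
```

Here `pages` contains the start page plus the pages one link away, minus anything under `/api`. Raising `max_depth` to 2 would also reach `/docs/install` but still skip `/api/v1`.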

update-model - Change LLM model

Update the LLM model used by a RAG system.

rlama update-model [rag-name] [new-model]

Parameters:

rag-name: Name of the RAG system
new-model: New Ollama model name

Example:

rlama update-model documentation deepseek-r1:7b-instruct

update - Update RLAMA

Checks if a new version of RLAMA is available and installs it.

rlama update [--force/-f]

Options:

--force or -f: (Optional) Update without asking for confirmation.

version - Display version

Displays the current version of RLAMA.

rlama --version

rlama -v

hf-browse - Browse GGUF models on Hugging Face

Search and browse GGUF models available on Hugging Face.

rlama hf-browse [search-term] [flags]

Parameters:

search-term: (Optional) Term to search for (e.g., "llama3", "mistral")

Flags:

--open: Open the search results in your default web browser
--quant: Specify quantization type to suggest (e.g., Q4_K_M, Q5_K_M)
--limit: Limit number of results (default: 10)

Examples:

# Search for GGUF models and show command-line help
rlama hf-browse "llama 3"

# Open browser with search results
rlama hf-browse mistral --open

# Search with specific quantization suggestion
rlama hf-browse phi --quant Q4_K_M

run-hf - Run a Hugging Face GGUF model

Run a Hugging Face GGUF model directly using Ollama. This is useful for testing models before creating a RAG system with them.

rlama run-hf [huggingface-model] [flags]

Parameters:

huggingface-model: Hugging Face model path in the format username/repository

Flags:

--quant: Quantization to use (e.g., Q4_K_M, Q5_K_M)

Examples:

# Try a model in chat mode
rlama run-hf bartowski/Llama-3.2-1B-Instruct-GGUF

# Specify quantization
rlama run-hf mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF --quant Q5_K_M

Uninstallation

To uninstall RLAMA:

Removing the binary

If you installed via go install:

rlama uninstall

Removing data

RLAMA stores its data in ~/.rlama. To remove it:

rm -rf ~/.rlama

Supported Document Formats

RLAMA supports many file formats:

Text: .txt, .md, .html, .json, .csv, .yaml, .yml, .xml, .org
Code: .go, .py, .js, .java, .c, .cpp, .cxx, .h, .rb, .php, .rs, .swift, .kt, .ts, .tsx, .f, .F, .F90, .el, .svelte
Documents: .pdf, .docx, .doc, .rtf, .odt, .pptx, .ppt, .xlsx, .xls, .epub

Installing dependencies via install_deps.sh is recommended to improve support for certain formats.

Troubleshooting

Ollama is not accessible

If you encounter connection errors to Ollama:

Check that Ollama is running.
By default, Ollama must be accessible at http://localhost:11434 or the host and port specified by the OLLAMA_HOST environment variable.

If your Ollama instance is running on a different host or port, use the --host and --port flags:

rlama --host 192.168.1.100 --port 8000 list
rlama --host my-ollama-server --port 11434 run my-rag

Check Ollama logs for potential errors.

Text extraction issues

If you encounter problems with certain formats:

Install dependencies via ./scripts/install_deps.sh.
Verify that your system has the required tools (pdftotext, tesseract, etc.).

The RAG doesn't find relevant information

If the answers are not relevant:

Check that the RAG exists with rlama list and that its documents are indexed with rlama list-docs [rag-name].
Make sure the content of the documents is properly extracted.
Try rephrasing your question more precisely.
Consider adjusting chunking parameters (chunk size, chunk overlap, or chunking strategy) during RAG creation.

Other issues

For any other issues, please open an issue on the GitHub repository providing:

The exact command used.
The complete output of the command.
Your operating system and architecture.
The RLAMA version (rlama --version).

Configuring Ollama Connection

RLAMA provides multiple ways to connect to your Ollama instance:

Command-line flags (highest priority):

rlama --host 192.168.1.100 --port 8080 run my-rag

Environment variable:

# Format: "host:port" or just "host"
export OLLAMA_HOST=remote-server:8080
rlama run my-rag

Default values (used if no other method is specified):
- Host: localhost
- Port: 11434

The precedence order is: command-line flags > environment variable > default values.
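
The resolution order can be sketched as follows. This Python example handles only the two OLLAMA_HOST formats documented above ("host:port" or "host"); URLs with a scheme and IPv6 addresses are out of scope. Resolving host and port independently is an assumption of the sketch, and `resolve_ollama` is a hypothetical name.

```python
def resolve_ollama(flag_host=None, flag_port=None, env_value=None):
    """Return (host, port): flags first, then OLLAMA_HOST, then defaults."""
    host, port = "localhost", "11434"  # documented defaults
    if env_value:
        env_host, _, env_port = env_value.partition(":")
        host = env_host or host
        port = env_port or port
    if flag_host:
        host = flag_host  # --host overrides the environment
    if flag_port:
        port = flag_port  # --port overrides the environment
    return host, port
```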

Advanced Usage

Context Size Management

# Quick answers with minimal context
rlama run my-docs --context-size=10

# Deep analysis with maximum context
rlama run my-docs --context-size=50

# Balance between speed and depth
rlama run my-docs --context-size=30

RAG Creation with Filtering

rlama rag llama3 my-project ./code \
  --exclude-dir=node_modules,dist \
  --process-ext=.go,.ts \
  --exclude-ext=.spec.ts
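
One way to reason about how these three flags combine is as a per-file predicate. The Python sketch below is an assumption about the semantics (exclusions win over inclusions, extensions match as filename suffixes so that .spec.ts can be excluded while .ts is processed); `should_index` is a hypothetical name, and RLAMA's actual matching rules may differ.

```python
from pathlib import PurePosixPath

def should_index(path, exclude_dirs=(), process_exts=(), exclude_exts=()):
    """Decide whether a file would be indexed under the given filters."""
    p = PurePosixPath(path)
    if any(part in exclude_dirs for part in p.parts[:-1]):
        return False  # file lives under an excluded directory
    name = p.name
    if any(name.endswith(ext) for ext in exclude_exts):
        return False  # excluded extension takes priority
    if process_exts and not any(name.endswith(ext) for ext in process_exts):
        return False  # an allow-list is set and this file is not on it
    return True
```

With the flags from the command above, `src/main.go` and `src/app.ts` would be indexed, while `src/app.spec.ts`, `node_modules/x/index.ts`, and `README.md` would be skipped.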

Chunk Inspection

# List chunks with content preview
rlama list-chunks my-project --show-content

# Filter chunks from specific document
rlama list-chunks my-project --document=architecture

Help System

Get full command help:

rlama --help

Command-specific help:

rlama rag --help
rlama list-chunks --help
rlama update-model --help

All commands support the global --host and --port flags for custom Ollama connections.


Hugging Face Integration

RLAMA now supports using GGUF models directly from Hugging Face through Ollama's native integration:

Browsing Hugging Face Models

# Search for GGUF models on Hugging Face
rlama hf-browse "llama 3"

# Open browser with search results
rlama hf-browse mistral --open

Testing a Model

Before creating a RAG, you can test a Hugging Face model directly:

# Try a model in chat mode
rlama run-hf bartowski/Llama-3.2-1B-Instruct-GGUF

# Specify quantization
rlama run-hf mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF --quant Q5_K_M

Creating a RAG with Hugging Face Models

Use Hugging Face models when creating RAG systems:

# Create a RAG with a Hugging Face model
rlama rag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF my-rag ./docs

# Use specific quantization
rlama rag hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q5_K_M my-rag ./docs

Using OpenAI Models

RLAMA now supports using OpenAI models for inference while keeping Ollama for embeddings:

Set your OpenAI API key:
```
export OPENAI_API_KEY="your-api-key"
```
Create a RAG system with an OpenAI model:
```
rlama rag gpt-4-turbo my-rag ./documents
```
Run your RAG as usual:
```
rlama run my-rag
```

Supported OpenAI models include:

o3-mini
gpt-4o and more...

Note: Only inference uses the OpenAI API. Document embeddings are still generated through Ollama.

Managing API Profiles

RLAMA allows you to create API profiles to manage multiple API keys for different providers:

Creating a Profile

# Create a profile for your OpenAI account
rlama profile add openai-work openai "sk-your-api-key"

# Create another profile for a different account
rlama profile add openai-personal openai "sk-your-personal-api-key"

Listing Profiles

# View all available profiles
rlama profile list

Deleting a Profile

# Delete a profile
rlama profile delete openai-old

Using Profiles with RAGs

When creating a new RAG:

# Create a RAG with a specific profile
rlama rag gpt-4 my-rag ./documents --profile openai-work

When updating an existing RAG:

# Update a RAG to use a different model and profile
rlama update-model my-rag gpt-4-turbo --profile openai-personal

Benefits of using profiles:

Manage multiple API keys for different projects
Easily switch between different accounts
Keep API keys secure (stored in ~/.rlama/profiles)
Track which profile was used last and when

For Tasks:

Click tags to check more tools for each tasks

create rag system use rag system list rag systems delete rag system update rlama

For Jobs:

data scientist knowledge engineer technical writer research scientist ai engineer

Alternative AI tools for rlama

Similar Open Source Tools

rlama

github

: 905

LEANN

LEANN is an innovative vector database that democratizes personal AI, transforming your laptop into a powerful RAG system that can index and search through millions of documents using 97% less storage than traditional solutions without accuracy loss. It achieves this through graph-based selective recomputation and high-degree preserving pruning, computing embeddings on-demand instead of storing them all. LEANN allows semantic search of file system, emails, browser history, chat history, codebase, or external knowledge bases on your laptop with zero cloud costs and complete privacy. It is a drop-in semantic search MCP service fully compatible with Claude Code, enabling intelligent retrieval without changing your workflow.

github

: 2.6k

dexto

Dexto is a lightweight runtime for creating and running AI agents that turn natural language into real-world actions. It serves as the missing intelligence layer for building AI applications, standalone chatbots, or as the reasoning engine inside larger products. Dexto features a powerful CLI and Web UI for running AI agents, supports multiple interfaces, allows hot-swapping of LLMs from various providers, connects to remote tool servers via the Model Context Protocol, is config-driven with version-controlled YAML, offers production-ready core features, extensibility for custom services, and enables multi-agent collaboration via MCP and A2A.

github

: 225

orbit

ORBIT (Open Retrieval-Based Inference Toolkit) is a middleware platform that provides a unified API for AI inference. It acts as a central gateway, allowing you to connect various local and remote AI models with your private data sources like SQL databases, vector stores, and local files. ORBIT uses a flexible adapter architecture to connect your data to AI models, creating specialized 'agents' for specific tasks. It supports scenarios like Knowledge Base Q&A and Chat with Your SQL Database, enabling users to interact with AI models seamlessly. The tool offers a RESTful API for programmatic access and includes features like authentication, API key management, system prompts, health monitoring, and file management. ORBIT is designed to streamline AI inference tasks and facilitate interactions between users and AI models.

github

: 144

TalkWithGemini

Talk With Gemini is a web application that allows users to deploy their private Gemini application for free with one click. It supports Gemini Pro and Gemini Pro Vision models. The application features talk mode for direct communication with Gemini, visual recognition for understanding picture content, full Markdown support, automatic compression of chat records, privacy and security with local data storage, well-designed UI with responsive design, fast loading speed, and multi-language support. The tool is designed to be user-friendly and versatile for various deployment options and language preferences.

github

: 616

LLMTSCS

LLMLight is a novel framework that employs Large Language Models (LLMs) as decision-making agents for Traffic Signal Control (TSC). The framework leverages the advanced generalization capabilities of LLMs to engage in a reasoning and decision-making process akin to human intuition for effective traffic control. LLMLight has been demonstrated to be remarkably effective, generalizable, and interpretable against various transportation-based and RL-based baselines on nine real-world and synthetic datasets.

github

: 173

ps-fuzz

The Prompt Fuzzer is an open-source tool that helps you assess the security of your GenAI application's system prompt against various dynamic LLM-based attacks. It provides a security evaluation based on the outcome of these attack simulations, enabling you to strengthen your system prompt as needed. The Prompt Fuzzer dynamically tailors its tests to your application's unique configuration and domain. The Fuzzer also includes a Playground chat interface, giving you the chance to iteratively improve your system prompt, hardening it against a wide spectrum of generative AI attacks.

github

: 367

auto-engineer

Auto Engineer is a tool designed to automate the Software Development Life Cycle (SDLC) by building production-grade applications with a combination of human and AI agents. It offers a plugin-based architecture that allows users to install only the necessary functionality for their projects. The tool guides users through key stages including Flow Modeling, IA Generation, Deterministic Scaffolding, AI Coding & Testing Loop, and Comprehensive Quality Checks. Auto Engineer follows a command/event-driven architecture and provides a modular plugin system for specific functionalities. It supports TypeScript with strict typing throughout and includes a built-in message bus server with a web dashboard for monitoring commands and events.

github

: 61

forge

Forge is a powerful open-source tool for building modern web applications. It provides a simple and intuitive interface for developers to quickly scaffold and deploy projects. With Forge, you can easily create custom components, manage dependencies, and streamline your development workflow. Whether you are a beginner or an experienced developer, Forge offers a flexible and efficient solution for your web development needs.

github

: 4.5k

Shellsage

Shell Sage is an AI-powered terminal assistant that offers local and cloud AI support, context-aware error diagnosis, natural-language-to-command translation, and safe command execution workflows. It provides interactive workflows, supports several API providers, and allows custom model selection. Users can configure the tool for local or API mode, select specific models, and switch between modes easily. Shell Sage is currently in alpha, with known limitations such as limited Windows support and occasional false positives in error detection. The roadmap includes better context awareness, Windows PowerShell integration, Tmux integration, and a CI/CD error pattern database.

GitHub stars: 52

zotero-mcp

Zotero MCP seamlessly connects your Zotero research library with AI assistants like ChatGPT and Claude via the Model Context Protocol. It offers AI-powered semantic search, access to library content, PDF annotation extraction, and easy updates. Users can search their library, analyze citations, and get summaries, making it ideal for research tasks. The tool supports multiple embedding models, intelligent search results, and flexible access methods for both local and remote collaboration. With advanced features like semantic search and PDF annotation extraction, Zotero MCP enhances research efficiency and organization.

GitHub stars: 513

wikipedia-mcp

The Wikipedia MCP Server is a Model Context Protocol (MCP) server that gives Large Language Models (LLMs) real-time access to Wikipedia, so AI assistants can ground their responses in accurate, up-to-date information. Its MCP tools cover searching articles and retrieving article content, summaries, specific sections, links, coordinates, related topics, and key facts. The server supports multiple languages, country/locale codes, and language variants for languages like Chinese, Serbian, Kurdish, and Norwegian, with optional caching for improved performance, and it is compatible with Google ADK agents and other AI frameworks. Users can install it with pipx, Smithery, or PyPI, in a virtual environment, or from source, and can deploy it with Docker or Kubernetes. Runtime options include transport protocol, language, country/locale, caching, access token, and more. The project also includes example prompts for querying Wikipedia, MCP resources for accessing Wikipedia through MCP endpoints, and a comprehensive test suite; its structure separates the main packages, API implementation, core functionality, and utility functions.

GitHub stars: 99

hound

Hound is a security audit automation pipeline for AI-assisted code review that mirrors how expert auditors think, learn, and collaborate. It features graph-driven analysis, sessionized audits, provider-agnostic models, belief system and hypotheses, precise code grounding, and adaptive planning. The system employs a senior/junior auditor pattern where the Scout actively navigates the codebase and annotates knowledge graphs while the Strategist handles high-level planning and vulnerability analysis. Hound is optimized for small-to-medium-sized projects like smart contract applications and is language-agnostic.

GitHub stars: 325

onefilellm

OneFileLLM is a command-line tool that streamlines the creation of information-dense prompts for large language models (LLMs). It aggregates and preprocesses data from various sources and compiles them into a single text file for quick use. The tool supports automatic source type detection, multiple file formats, web crawling, Sci-Hub integration for research paper downloads, text preprocessing, and token count reporting. Its output is encapsulated in XML tags to improve LLM understanding and processing. Private GitHub repositories can be accessed by supplying a personal access token.

GitHub stars: 1.7k

OpenSpec

OpenSpec is a tool for spec-driven development, aligning humans and AI coding assistants to agree on what to build before any code is written. It adds a lightweight specification workflow that ensures deterministic, reviewable outputs without the need for API keys. With OpenSpec, stakeholders can draft change proposals, review and align with AI assistants, implement tasks based on agreed specs, and archive completed changes for merging back into the source-of-truth specs. It works seamlessly with existing AI tools, offering shared visibility into proposed, active, or archived work.

GitHub stars: 195

openai-edge-tts

This project provides a local, OpenAI-compatible text-to-speech (TTS) API using `edge-tts`. It emulates the OpenAI TTS endpoint (`/v1/audio/speech`), enabling users to generate speech from text with various voice options and playback speeds, just like the OpenAI API. `edge-tts` uses Microsoft Edge's online text-to-speech service, making it completely free. The project supports multiple audio formats, adjustable playback speed, and voice selection options, providing a flexible and customizable TTS solution for users.

GitHub stars: 412
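
Because the project emulates the OpenAI `/v1/audio/speech` endpoint, a client call might look roughly like the sketch below. This is only an illustration under stated assumptions: the base URL `http://localhost:5050`, the placeholder API key, and the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the payload fields (`model`, `input`, `voice`, `speed`, `response_format`) follow the OpenAI speech API rather than anything confirmed for this server.

```python
import json
import urllib.request

# Assumption: the local server address; adjust host and port to your setup.
BASE_URL = "http://localhost:5050"


def build_speech_request(text, voice="alloy", speed=1.0, response_format="mp3",
                         api_key="your_api_key_here", base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    # Field names mirror the OpenAI speech API; the values are placeholders.
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize_to_file(text, path="speech.mp3", **options):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

With a server running, `synthesize_to_file("Hello from a local TTS server.", speed=1.25)` would write the returned audio to `speech.mp3`. Building the request separately from sending it keeps the payload easy to inspect before any network call is made.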

For similar tasks

rlama

GitHub stars: 905

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

GitHub stars: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

GitHub stars: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

GitHub stars: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

GitHub stars: 668

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

GitHub stars: 27.7k

BricksLLM

BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:

* Set LLM usage limits for users on different pricing tiers
* Track LLM usage on a per-user and per-organization basis
* Block or redact requests containing PII
* Improve LLM reliability with failovers, retries and caching
* Distribute API keys with rate limits and cost limits for internal development/production use cases
* Distribute API keys with rate limits and cost limits for students

GitHub stars: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

GitHub stars: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

GitHub stars: 2.2k


Similar Open Source Tools

rlama

LEANN

dexto

orbit

TalkWithGemini

LLMTSCS