
paperless-gpt
Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI
Stars: 724

paperless-gpt is a tool designed to generate accurate and meaningful document titles and tags for paperless-ngx using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI and Ollama. With paperless-gpt, you can streamline your document management by automatically suggesting appropriate titles and tags based on the content of your scanned documents. The tool offers features like multiple LLM support, customizable prompts, easy integration with paperless-ngx, user-friendly interface for reviewing and applying suggestions, dockerized deployment, automatic document processing, and an experimental OCR feature.
README:
paperless-gpt seamlessly pairs with paperless-ngx to generate AI-powered document titles and tags, saving you hours of manual sorting. While other tools may offer AI chat features, paperless-gpt stands out by supercharging OCR with LLMs-ensuring high accuracy, even with tricky scans. If you're craving next-level text extraction and effortless document organization, this is your solution.
https://github.com/user-attachments/assets/bd5d38b9-9309-40b9-93ca-918dfa4f3fd4
❤️ Support This Project
If paperless-gpt is helping you organize your documents and saving you time, please consider sponsoring its development. Your support helps ensure continued improvements and maintenance!
-
LLM-Enhanced OCR
Harness Large Language Models (OpenAI or Ollama) for better-than-traditional OCR—turn messy or low-quality scans into context-aware, high-fidelity text. -
Use specialized AI OCR services
- LLM OCR: Use OpenAI or Ollama to extract text from images.
- Google Document AI: Leverage Google's powerful Document AI for OCR tasks.
- Azure Document Intelligence: Use Microsoft's enterprise OCR solution.
-
Automatic Title, Tag & Created Date Generation
No more guesswork. Let the AI do the naming and categorizing. You can easily review suggestions and refine them if needed. -
Supports DeepSeek reasoning models in Ollama
Greatly enhance accuracy by using a reasoning model likedeepseek-r1:8b
. The perfect tradeoff between privacy and performance! Of course, if you got enough GPUs or NPUs, a bigger model will enhance the experience. -
Automatic Correspondent Generation
Automatically identify and generate correspondents from your documents, making it easier to track and organize your communications. -
Extensive Customization
- Prompt Templates: Tweak your AI prompts to reflect your domain, style, or preference.
- Tagging: Decide how documents get tagged—manually, automatically, or via OCR-based flows.
-
Simple Docker Deployment
A few environment variables, and you're off! Compose it alongside paperless-ngx with minimal fuss. -
Unified Web UI
- Manual Review: Approve or tweak AI's suggestions.
- Auto Processing: Focus only on edge cases while the rest is sorted for you.
- Key Highlights
- Getting Started
- OCR Providers
- Configuration
- OCR using AI
- Usage
- Contributing
- Support the Project
- License
- Star History
- Disclaimer
- Docker installed.
- A running instance of paperless-ngx.
- Access to an LLM provider:
-
OpenAI: An API key with models like
gpt-4o
orgpt-3.5-turbo
. -
Ollama: A running Ollama server with models like
deepseek-r1:8b
.
-
OpenAI: An API key with models like
Here's an example docker-compose.yml
to spin up paperless-gpt alongside paperless-ngx:
services:
paperless-ngx:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
# ... (your existing paperless-ngx config)
paperless-gpt:
image: icereed/paperless-gpt:latest
environment:
PAPERLESS_BASE_URL: "http://paperless-ngx:8000"
PAPERLESS_API_TOKEN: "your_paperless_api_token"
PAPERLESS_PUBLIC_URL: "http://paperless.mydomain.com" # Optional
MANUAL_TAG: "paperless-gpt" # Optional, default: paperless-gpt
AUTO_TAG: "paperless-gpt-auto" # Optional, default: paperless-gpt-auto
LLM_PROVIDER: "openai" # or 'ollama'
LLM_MODEL: "gpt-4o" # or 'deepseek-r1:8b'
# Optional, but recommended for Ollama
TOKEN_LIMIT: 1000
OPENAI_API_KEY: "your_openai_api_key"
# Optional - OPENAI_BASE_URL: 'https://litellm.yourinstallationof.it.com/v1'
LLM_LANGUAGE: "English" # Optional, default: English
# OCR Configuration - Choose one:
# Option 1: LLM-based OCR
OCR_PROVIDER: "llm" # Default OCR provider
VISION_LLM_PROVIDER: "ollama" # openai or ollama
VISION_LLM_MODEL: "minicpm-v" # minicpm-v (ollama) or gpt-4o (openai)
OLLAMA_HOST: "http://host.docker.internal:11434" # If using Ollama
# Option 2: Google Document AI
# OCR_PROVIDER: 'google_docai' # Use Google Document AI
# GOOGLE_PROJECT_ID: 'your-project' # Your GCP project ID
# GOOGLE_LOCATION: 'us' # Document AI region
# GOOGLE_PROCESSOR_ID: 'processor-id' # Your processor ID
# GOOGLE_APPLICATION_CREDENTIALS: '/app/credentials.json' # Path to service account key
# Option 3: Azure Document Intelligence
# OCR_PROVIDER: 'azure' # Use Azure Document Intelligence
# AZURE_DOCAI_ENDPOINT: 'your-endpoint' # Your Azure endpoint URL
# AZURE_DOCAI_KEY: 'your-key' # Your Azure API key
# AZURE_DOCAI_MODEL_ID: 'prebuilt-read' # Optional, defaults to prebuilt-read
# AZURE_DOCAI_TIMEOUT_SECONDS: '120' # Optional, defaults to 120 seconds
AUTO_OCR_TAG: "paperless-gpt-ocr-auto" # Optional, default: paperless-gpt-ocr-auto
OCR_LIMIT_PAGES: "5" # Optional, default: 5. Set to 0 for no limit.
LOG_LEVEL: "info" # Optional: debug, warn, error
volumes:
- ./prompts:/app/prompts # Mount the prompts directory
# For Google Document AI:
- ${HOME}/.config/gcloud/application_default_credentials.json:/app/credentials.json
ports:
- "8080:8080"
depends_on:
- paperless-ngx
Pro Tip: Replace placeholders with real values and read the logs if something looks off.
-
Clone the Repository
git clone https://github.com/icereed/paperless-gpt.git cd paperless-gpt
-
Create a
prompts
Directorymkdir prompts
-
Build the Docker Image
docker build -t paperless-gpt .
-
Run the Container
docker run -d \ -e PAPERLESS_BASE_URL='http://your_paperless_ngx_url' \ -e PAPERLESS_API_TOKEN='your_paperless_api_token' \ -e LLM_PROVIDER='openai' \ -e LLM_MODEL='gpt-4o' \ -e OPENAI_API_KEY='your_openai_api_key' \ -e LLM_LANGUAGE='English' \ -e VISION_LLM_PROVIDER='ollama' \ -e VISION_LLM_MODEL='minicpm-v' \ -e LOG_LEVEL='info' \ -v $(pwd)/prompts:/app/prompts \ -p 8080:8080 \ paperless-gpt
paperless-gpt supports three different OCR providers, each with unique strengths and capabilities:
-
Key Features:
- Uses vision-capable LLMs like gpt-4o or MiniCPM-V
- High accuracy with complex layouts and difficult scans
- Context-aware text recognition
- Self-correcting capabilities for OCR errors
-
Best For:
- Complex or unusual document layouts
- Poor quality scans
- Documents with mixed languages
-
Configuration:
OCR_PROVIDER: "llm" VISION_LLM_PROVIDER: "openai" # or "ollama" VISION_LLM_MODEL: "gpt-4o" # or "minicpm-v"
-
Key Features:
- Enterprise-grade OCR solution
- Prebuilt models for common document types
- Layout preservation and table detection
- Fast processing speeds
-
Best For:
- Business documents and forms
- High-volume processing
- Documents requiring layout analysis
-
Configuration:
OCR_PROVIDER: "azure" AZURE_DOCAI_ENDPOINT: "https://your-endpoint.cognitiveservices.azure.com/" AZURE_DOCAI_KEY: "your-key" AZURE_DOCAI_MODEL_ID: "prebuilt-read" # optional AZURE_DOCAI_TIMEOUT_SECONDS: "120" # optional
-
Key Features:
- Specialized document processors
- Strong form field detection
- Multi-language support
- High accuracy on structured documents
-
Best For:
- Forms and structured documents
- Documents with tables
- Multi-language documents
-
Configuration:
OCR_PROVIDER: "google_docai" GOOGLE_PROJECT_ID: "your-project" GOOGLE_LOCATION: "us" GOOGLE_PROCESSOR_ID: "processor-id"
Note: When using Ollama, ensure that the Ollama server is running and accessible from the paperless-gpt container.
Variable | Description | Required | Default |
---|---|---|---|
PAPERLESS_BASE_URL |
URL of your paperless-ngx instance (e.g. http://paperless-ngx:8000 ). |
Yes | |
PAPERLESS_API_TOKEN |
API token for paperless-ngx. Generate one in paperless-ngx admin. | Yes | |
PAPERLESS_PUBLIC_URL |
Public URL for Paperless (if different from PAPERLESS_BASE_URL ). |
No | |
MANUAL_TAG |
Tag for manual processing. | No | paperless-gpt |
AUTO_TAG |
Tag for auto processing. | No | paperless-gpt-auto |
LLM_PROVIDER |
AI backend (openai or ollama ). |
Yes | |
LLM_MODEL |
AI model name, e.g. gpt-4o , gpt-3.5-turbo , deepseek-r1:8b . |
Yes | |
OPENAI_API_KEY |
OpenAI API key (required if using OpenAI). | Cond. | |
OPENAI_BASE_URL |
OpenAI base URL (optional, if using a custom OpenAI compatible service like LiteLLM). | No | |
LLM_LANGUAGE |
Likely language for documents (e.g. English ). |
No | English |
OLLAMA_HOST |
Ollama server URL (e.g. http://host.docker.internal:11434 ). |
No | |
OCR_PROVIDER |
OCR provider to use (llm , azure , or google_docai ). |
No | llm |
VISION_LLM_PROVIDER |
AI backend for LLM OCR (openai or ollama ). Required if OCR_PROVIDER is llm . |
Cond. | |
VISION_LLM_MODEL |
Model name for LLM OCR (e.g. minicpm-v ). Required if OCR_PROVIDER is llm . |
Cond. | |
AZURE_DOCAI_ENDPOINT |
Azure Document Intelligence endpoint. Required if OCR_PROVIDER is azure . |
Cond. | |
AZURE_DOCAI_KEY |
Azure Document Intelligence API key. Required if OCR_PROVIDER is azure . |
Cond. | |
AZURE_DOCAI_MODEL_ID |
Azure Document Intelligence model ID. Optional if using azure provider. |
No | prebuilt-read |
AZURE_DOCAI_TIMEOUT_SECONDS |
Azure Document Intelligence timeout in seconds. | No | 120 |
GOOGLE_PROJECT_ID |
Google Cloud project ID. Required if OCR_PROVIDER is google_docai . |
Cond. | |
GOOGLE_LOCATION |
Google Cloud region (e.g. us , eu ). Required if OCR_PROVIDER is google_docai . |
Cond. | |
GOOGLE_PROCESSOR_ID |
Document AI processor ID. Required if OCR_PROVIDER is google_docai . |
Cond. | |
GOOGLE_APPLICATION_CREDENTIALS |
Path to the mounted Google service account key. Required if OCR_PROVIDER is google_docai . |
Cond. | |
AUTO_OCR_TAG |
Tag for automatically processing docs with OCR. | No | paperless-gpt-ocr-auto |
LOG_LEVEL |
Application log level (info , debug , warn , error ). |
No | info |
LISTEN_INTERFACE |
Network interface to listen on. | No | 8080 |
AUTO_GENERATE_TITLE |
Generate titles automatically if paperless-gpt-auto is used. |
No | true |
AUTO_GENERATE_TAGS |
Generate tags automatically if paperless-gpt-auto is used. |
No | true |
AUTO_GENERATE_CORRESPONDENTS |
Generate correspondents automatically if paperless-gpt-auto is used. |
No | true |
AUTO_GENERATE_CREATED_DATE |
Generate the created dates automatically if paperless-gpt-auto is used. |
No | true |
OCR_LIMIT_PAGES |
Limit the number of pages for OCR. Set to 0 for no limit. |
No | 5 |
TOKEN_LIMIT |
Maximum tokens allowed for prompts/content. Set to 0 to disable limit. Useful for smaller LLMs. |
No | |
CORRESPONDENT_BLACK_LIST |
A comma-separated list of names to exclude from the correspondents suggestions. Example: John Doe, Jane Smith . |
No |
paperless-gpt's flexible prompt templates let you shape how AI responds:
-
title_prompt.tmpl
: For document titles. -
tag_prompt.tmpl
: For tagging logic. -
ocr_prompt.tmpl
: For LLM OCR. -
correspondent_prompt.tmpl
: For correspondent identification. -
created_date_prompt.tmpl
: For setting of document's created date.
Mount them into your container via:
volumes:
- ./prompts:/app/prompts
Then tweak at will—paperless-gpt reloads them automatically on startup!
Each template has access to specific variables:
title_prompt.tmpl:
-
{{.Language}}
- Target language (e.g., "English") -
{{.Content}}
- Document content text -
{{.Title}}
- Original document title
tag_prompt.tmpl:
-
{{.Language}}
- Target language -
{{.AvailableTags}}
- List of existing tags in paperless-ngx -
{{.OriginalTags}}
- Document's current tags -
{{.Title}}
- Document title -
{{.Content}}
- Document content text
ocr_prompt.tmpl:
-
{{.Language}}
- Target language
correspondent_prompt.tmpl:
-
{{.Language}}
- Target language -
{{.AvailableCorrespondents}}
- List of existing correspondents -
{{.BlackList}}
- List of blacklisted correspondent names -
{{.Title}}
- Document title -
{{.Content}}
- Document content text
created_date_prompt.tmpl:
-
{{.Language}}
- Target language -
{{.Content}}
- Document content text
The templates use Go's text/template syntax. paperless-gpt automatically reloads template changes on startup.
-
Tag Documents
- Add
paperless-gpt
tag to documents for manual processing - Add
paperless-gpt-auto
for automatic processing - Add
paperless-gpt-ocr-auto
for automatic OCR processing
- Add
-
Visit Web UI
- Go to
http://localhost:8080
(or your host) in your browser - Review documents tagged for processing
- Go to
-
Generate & Apply Suggestions
- Click "Generate Suggestions" to see AI-proposed titles/tags/correspondents
- Review and approve or edit suggestions
- Click "Apply" to save changes to paperless-ngx
-
OCR Processing
- Tag documents with appropriate OCR tag to process them
- Monitor progress in the Web UI
- Review results and apply changes
Click to expand the vanilla OCR vs. AI-powered OCR comparison
Image:
Vanilla Paperless-ngx OCR:
La Grande Recre
Gentre Gommercial 1'Esplanade
1349 LOLNAIN LA NEWWE
TA BERBOGAAL Tel =. 010 45,96 12
Ticket 1440112 03/11/2006 a 13597:
4007176614518. DINOS. TYRAMNESA
TOTAET.T.LES
ReslE par Lask-Euron
Rencu en Cash Euro
V.14.6 -Hotgese = VALERTE
TICKET A-GONGERVER PORR TONT. EEHANGE
HERET ET A BIENTOT
LLM-Powered OCR (OpenAI gpt-4o):
La Grande Récré
Centre Commercial l'Esplanade
1348 LOUVAIN LA NEUVE
TVA 860826401 Tel : 010 45 95 12
Ticket 14421 le 03/11/2006 à 15:27:18
4007176614518 DINOS TYRANNOSA 14.90
TOTAL T.T.C. 14.90
Réglé par Cash Euro 50.00
Rendu en Cash Euro 35.10
V.14.6 Hôtesse : VALERIE
TICKET A CONSERVER POUR TOUT ECHANGE
MERCI ET A BIENTOT
Image:
Vanilla Paperless-ngx OCR:
Invoice Number: 1-996-84199
Fed: Invoica Date: Sep01, 2014
Accaunt Number: 1334-8037-4
Page: 1012
Fod£x Tax ID 71.0427007
IRISINC
SHARON ANDERSON
4731 W ATLANTIC AVE STE BI
DELRAY BEACH FL 33445-3897 ’ a
Invoice Questions?
Bing, ‚Account Shipping Address: Contact FedEx Reı
ISINC
4731 W ATLANTIC AVE Phone: (800) 622-1147 M-F 7-6 (CST)
DELRAY BEACH FL 33445-3897 US Fax: (800) 548-3020
Internet: www.fedex.com
Invoice Summary Sep 01, 2014
FodEx Ground Services
Other Charges 11.00
Total Charges 11.00 Da £
>
polo) Fz// /G
TOTAL THIS INVOICE .... usps 11.00 P 2/1 f
‘The only charges accrued for this period is the Weekly Service Charge.
The Fedix Ground aceounts teferencedin his involce have been transteired and assigned 10, are owned by,andare payable to FedEx Express:
To onsurs propor credit, plasa raturn this portion wirh your payment 10 FodEx
‚Please do not staple or fold. Ploase make your chack payablı to FedEx.
[TI For change ol address, hc har and camphat lrm or never ide
Remittance Advice
Your payment is due by Sep 16, 2004
Number Number Dus
1334803719968 41993200000110071
AT 01 0391292 468448196 A**aDGT
IRISINC Illallun elalalssollallansdHilalellund
SHARON ANDERSON
4731 W ATLANTIC AVE STEBI FedEx
DELRAY BEACH FL 334453897 PO. Box 94516
PALATINE IL 60094-4515
LLM-Powered OCR (OpenAI gpt-4o):
FedEx. Invoice Number: 1-996-84199
Invoice Date: Sep 01, 2014
Account Number: 1334-8037-4
Page: 1 of 2
FedEx Tax ID: 71-0427007
I R I S INC
SHARON ANDERSON
4731 W ATLANTIC AVE STE B1
DELRAY BEACH FL 33445-3897
Invoice Questions?
Billing Account Shipping Address: Contact FedEx Revenue Services
I R I S INC Phone: (800) 622-1147 M-F 7-6 (CST)
4731 W ATLANTIC AVE Fax: (800) 548-3020
DELRAY BEACH FL 33445-3897 US Internet: www.fedex.com
Invoice Summary Sep 01, 2014
FedEx Ground Services
Other Charges 11.00
Total Charges .......................................................... USD $ 11.00
TOTAL THIS INVOICE .............................................. USD $ 11.00
The only charges accrued for this period is the Weekly Service Charge.
RECEIVED
SEP _ 8 REC'D
BY: _
posted 9/21/14
The FedEx Ground accounts referenced in this invoice have been transferred and assigned to, are owned by, and are payable to FedEx Express.
To ensure proper credit, please return this portion with your payment to FedEx.
Please do not staple or fold. Please make your check payable to FedEx.
❑ For change of address, check here and complete form on reverse side.
Remittance Advice
Your payment is due by Sep 16, 2004
Invoice
Number
1-996-84199
Account
Number
1334-8037-4
Amount
Due
USD $ 11.00
133480371996841993200000110071
AT 01 031292 468448196 A**3DGT
I R I S INC
SHARON ANDERSON
4731 W ATLANTIC AVE STE B1
DELRAY BEACH FL 33445-3897
FedEx
P.O. Box 94515
Why Does It Matter?
- Traditional OCR often jumbles text from complex or low-quality scans.
- Large Language Models interpret context and correct likely errors, producing results that are more precise and readable.
- You can integrate these cleaned-up texts into your paperless-ngx pipeline for better tagging, searching, and archiving.
- Vanilla OCR typically uses classical methods or Tesseract-like engines to extract text, which can result in garbled outputs for complex fonts or poor-quality scans.
- LLM-Powered OCR uses your chosen AI backend—OpenAI or Ollama—to interpret the image's text in a more context-aware manner. This leads to fewer errors and more coherent text.
When using local LLMs (like those through Ollama), you might need to adjust certain settings to optimize performance:
- Use
TOKEN_LIMIT
environment variable to control the maximum number of tokens sent to the LLM - Smaller models might truncate content unexpectedly if given too much text
- Start with a conservative limit (e.g., 1000 tokens) and adjust based on your model's capabilities
- Set to
0
to disable the limit (use with caution)
Example configuration for smaller models:
environment:
TOKEN_LIMIT: "2000" # Adjust based on your model's context window
LLM_PROVIDER: "ollama"
LLM_MODEL: "deepseek-r1:8b" # Or other local model
Common issues and solutions:
- If you see truncated or incomplete responses, try lowering the
TOKEN_LIMIT
- If processing is too limited, gradually increase the limit while monitoring performance
- For models with larger context windows, you can increase the limit or disable it entirely
Pull requests and issues are welcome!
- Fork the repo
- Create a branch (
feature/my-awesome-update
) - Commit changes (
git commit -m "Improve X"
) - Open a PR
Check out our contributing guidelines for details.
If paperless-gpt is saving you time and making your document management easier, please consider supporting its continued development:
- GitHub Sponsors: Help fund ongoing development and maintenance
- Share your success stories and use cases
- Star the project on GitHub
- Contribute code, documentation, or bug reports
Your support helps ensure paperless-gpt remains actively maintained and continues to improve!
paperless-gpt is licensed under the MIT License. Feel free to adapt and share!
This project is not officially affiliated with paperless-ngx. Use at your own risk.
paperless-gpt: The LLM-based companion your doc management has been waiting for. Enjoy effortless, intelligent document titles, tags, and next-level OCR.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for paperless-gpt
Similar Open Source Tools

paperless-gpt
paperless-gpt is a tool designed to generate accurate and meaningful document titles and tags for paperless-ngx using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI and Ollama. With paperless-gpt, you can streamline your document management by automatically suggesting appropriate titles and tags based on the content of your scanned documents. The tool offers features like multiple LLM support, customizable prompts, easy integration with paperless-ngx, user-friendly interface for reviewing and applying suggestions, dockerized deployment, automatic document processing, and an experimental OCR feature.

VimLM
VimLM is an AI-powered coding assistant for Vim that integrates AI for code generation, refactoring, and documentation directly into your Vim workflow. It offers native Vim integration with split-window responses and intuitive keybindings, offline first execution with MLX-compatible models, contextual awareness with seamless integration with codebase and external resources, conversational workflow for iterating on responses, project scaffolding for generating and deploying code blocks, and extensibility for creating custom LLM workflows with command chains.

gonzo
Gonzo is a powerful, real-time log analysis terminal UI tool inspired by k9s. It allows users to analyze log streams with beautiful charts, AI-powered insights, and advanced filtering directly from the terminal. The tool provides features like live streaming log processing, OTLP support, interactive dashboard with real-time charts, advanced filtering options including regex support, and AI-powered insights such as pattern detection, anomaly analysis, and root cause suggestions. Users can also configure AI models from providers like OpenAI, LM Studio, and Ollama for intelligent log analysis. Gonzo is built with Bubble Tea, Lipgloss, Cobra, Viper, and OpenTelemetry, following a clean architecture with separate modules for TUI, log analysis, frequency tracking, OTLP handling, and AI integration.

klavis
Klavis AI is a production-ready solution for managing Multiple Communication Protocol (MCP) servers. It offers self-hosted solutions and a hosted service with enterprise OAuth support. With Klavis AI, users can easily deploy and manage over 50 MCP servers for various services like GitHub, Gmail, Google Sheets, YouTube, Slack, and more. The tool provides instant access to MCP servers, seamless authentication, and integration with AI frameworks, making it ideal for individuals and businesses looking to streamline their communication and data management workflows.

gpt-load
GPT-Load is a high-performance, enterprise-grade AI API transparent proxy service designed for enterprises and developers needing to integrate multiple AI services. Built with Go, it features intelligent key management, load balancing, and comprehensive monitoring capabilities for high-concurrency production environments. The tool serves as a transparent proxy service, preserving native API formats of various AI service providers like OpenAI, Google Gemini, and Anthropic Claude. It supports dynamic configuration, distributed leader-follower deployment, and a Vue 3-based web management interface. GPT-Load is production-ready with features like dual authentication, graceful shutdown, and error recovery.

VASA-1-hack
VASA-1-hack is a repository containing the VASA implementation separated from EMOPortraits, with all components properly configured for standalone training. It provides detailed setup instructions, prerequisites, project structure, configuration details, running training modes, troubleshooting tips, monitoring training progress, development information, and acknowledgments. The repository aims to facilitate training volumetric avatar models with configurable parameters and logging levels for efficient debugging and testing.

SwiftAI
SwiftAI is a modern, type-safe Swift library for building AI-powered apps. It provides a unified API that works seamlessly across different AI models, including Apple's on-device models and cloud-based services like OpenAI. With features like model agnosticism, structured output, agent tool loop, conversations, extensibility, and Swift-native design, SwiftAI offers a powerful toolset for developers to integrate AI capabilities into their applications. The library supports easy installation via Swift Package Manager and offers detailed guidance on getting started, structured responses, tool use, model switching, conversations, and advanced constraints. SwiftAI aims to simplify AI integration by providing a type-safe and versatile solution for various AI tasks.

llm-context.py
LLM Context is a tool designed to assist developers in quickly injecting relevant content from code/text projects into Large Language Model chat interfaces. It leverages `.gitignore` patterns for smart file selection and offers a streamlined clipboard workflow using the command line. The tool also provides direct integration with Large Language Models through the Model Context Protocol (MCP). LLM Context is optimized for code repositories and collections of text/markdown/html documents, making it suitable for developers working on projects that fit within an LLM's context window. The tool is under active development and aims to enhance AI-assisted development workflows by harnessing the power of Large Language Models.

google_workspace_mcp
The Google Workspace MCP Server is a production-ready server that integrates major Google Workspace services with AI assistants. It supports single-user and multi-user authentication via OAuth 2.1, making it a powerful backend for custom applications. Built with FastMCP for optimal performance, it features advanced authentication handling, service caching, and streamlined development patterns. The server provides full natural language control over Google Calendar, Drive, Gmail, Docs, Sheets, Slides, Forms, Tasks, and Chat through all MCP clients, AI assistants, and developer tools. It supports free Google accounts and Google Workspace plans with expanded app options like Chat & Spaces. The server also offers private cloud instance options.

ruler
Ruler is a tool designed to centralize AI coding assistant instructions, providing a single source of truth for managing instructions across multiple AI coding tools. It helps in avoiding inconsistent guidance, duplicated effort, context drift, onboarding friction, and complex project structures by automatically distributing instructions to the right configuration files. With support for nested rule loading, Ruler can handle complex project structures with context-specific instructions for different components. It offers features like centralised rule management, nested rule loading, automatic distribution, targeted agent configuration, MCP server propagation, .gitignore automation, and a simple CLI for easy configuration management.

oxylabs-mcp
The Oxylabs MCP Server acts as a bridge between AI models and the web, providing clean, structured data from any site. It enables scraping of URLs, rendering JavaScript-heavy pages, content extraction for AI use, bypassing anti-scraping measures, and accessing geo-restricted web data from 195+ countries. The implementation utilizes the Model Context Protocol (MCP) to facilitate secure interactions between AI assistants and web content. Key features include scraping content from any site, automatic data cleaning and conversion, bypassing blocks and geo-restrictions, flexible setup with cross-platform support, and built-in error handling and request management.

quantalogic
QuantaLogic is a ReAct framework for building advanced AI agents that seamlessly integrates large language models with a robust tool system. It aims to bridge the gap between advanced AI models and practical implementation in business processes by enabling agents to understand, reason about, and execute complex tasks through natural language interaction. The framework includes features such as ReAct Framework, Universal LLM Support, Secure Tool System, Real-time Monitoring, Memory Management, and Enterprise Ready components.

aicommit2
AICommit2 is a Reactive CLI tool that streamlines interactions with various AI providers such as OpenAI, Anthropic Claude, Gemini, Mistral AI, Cohere, and unofficial providers like Huggingface and Clova X. Users can request multiple AI simultaneously to generate git commit messages without waiting for all AI responses. The tool runs 'git diff' to grab code changes, sends them to configured AI, and returns the AI-generated commit message. Users can set API keys or Cookies for different providers and configure options like locale, generate number of messages, commit type, proxy, timeout, max-length, and more. AICommit2 can be used both locally with Ollama and remotely with supported providers, offering flexibility and efficiency in generating commit messages.

auto-engineer
Auto Engineer is a tool designed to automate the Software Development Life Cycle (SDLC) by building production-grade applications with a combination of human and AI agents. It offers a plugin-based architecture that allows users to install only the necessary functionality for their projects. The tool guides users through key stages including Flow Modeling, IA Generation, Deterministic Scaffolding, AI Coding & Testing Loop, and Comprehensive Quality Checks. Auto Engineer follows a command/event-driven architecture and provides a modular plugin system for specific functionalities. It supports TypeScript with strict typing throughout and includes a built-in message bus server with a web dashboard for monitoring commands and events.

monoscope
Monoscope is an open-source monitoring and observability platform that uses artificial intelligence to understand and monitor systems automatically. It allows users to ingest and explore logs, traces, and metrics in S3 buckets, query in natural language via LLMs, and create AI agents to detect anomalies. Key capabilities include universal data ingestion, AI-powered understanding, natural language interface, cost-effective storage, and zero configuration. Monoscope is designed to reduce alert fatigue, catch issues before they impact users, and provide visibility across complex systems.

mistral.rs
Mistral.rs is a fast LLM inference platform written in Rust. We support inference on a variety of devices, quantization, and easy-to-use application with an Open-AI API compatible HTTP server and Python bindings.
For similar tasks

paperless-gpt
paperless-gpt is a tool designed to generate accurate and meaningful document titles and tags for paperless-ngx using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI and Ollama. With paperless-gpt, you can streamline your document management by automatically suggesting appropriate titles and tags based on the content of your scanned documents. The tool offers features like multiple LLM support, customizable prompts, easy integration with paperless-ngx, user-friendly interface for reviewing and applying suggestions, dockerized deployment, automatic document processing, and an experimental OCR feature.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

AI-in-a-Box
AI-in-a-Box is a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction, while maintaining the highest standards of quality and efficiency. It provides essential guidance on the responsible use of AI and LLM technologies, specific security guidance for Generative AI (GenAI) applications, and best practices for scaling OpenAI applications within Azure. The available accelerators include: Azure ML Operationalization in-a-box, Edge AI in-a-box, Doc Intelligence in-a-box, Image and Video Analysis in-a-box, Cognitive Services Landing Zone in-a-box, Semantic Kernel Bot in-a-box, NLP to SQL in-a-box, Assistants API in-a-box, and Assistants API Bot in-a-box.

langchain-rust
LangChain Rust is a library for building applications with Large Language Models (LLMs) through composability. It provides a set of tools and components that can be used to create conversational agents, document loaders, and other applications that leverage LLMs. LangChain Rust supports a variety of LLMs, including OpenAI, Azure OpenAI, Ollama, and Anthropic Claude. It also supports a variety of embeddings, vector stores, and document loaders. LangChain Rust is designed to be easy to use and extensible, making it a great choice for developers who want to build applications with LLMs.

dolma
Dolma is a dataset and toolkit for curating large datasets for (pre)-training ML models. The dataset consists of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. The toolkit provides high-performance, portable, and extensible tools for processing, tagging, and deduplicating documents. Key features of the toolkit include built-in taggers, fast deduplication, and cloud support.

sparrow
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows. Sparrow Agents - with Sparrow you can build independent LLM agents, and use API to invoke them from your system. **List of available agents:** * **llamaindex** - RAG pipeline with LlamaIndex for PDF processing * **vllamaindex** - RAG pipeline with LLamaIndex multimodal for image processing * **vprocessor** - RAG pipeline with OCR and LlamaIndex for image processing * **haystack** - RAG pipeline with Haystack for PDF processing * **fcall** - Function call pipeline * **unstructured-light** - RAG pipeline with Unstructured and LangChain, supports PDF and image processing * **unstructured** - RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing * **instructor** - RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation

Open-DocLLM
Open-DocLLM is an open-source project that addresses data extraction and processing challenges using OCR and LLM technologies. It consists of two main layers: OCR for reading document content and LLM for extracting specific content in a structured manner. The project offers a larger context window size compared to JP Morgan's DocLLM and integrates tools like Tesseract OCR and Mistral for efficient data analysis. Users can run the models on-premises using LLM studio or Ollama, and the project includes a FastAPI app for testing purposes.

aws-genai-llm-chatbot
This repository provides code to deploy a chatbot powered by Multi-Model and Multi-RAG using AWS CDK on AWS. Users can experiment with various Large Language Models and Multimodal Language Models from different providers. The solution supports Amazon Bedrock, Amazon SageMaker self-hosted models, and third-party providers via API. It also offers additional resources like AWS Generative AI CDK Constructs and Project Lakechain for building generative AI solutions and document processing. The roadmap and authors are listed, along with contributors. The library is licensed under the MIT-0 License with information on changelog, code of conduct, and contributing guidelines. A legal disclaimer advises users to conduct their own assessment before using the content for production purposes.
For similar jobs

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).

AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.

obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.