paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI

Stars: 724

paperless-gpt is a tool designed to generate accurate and meaningful document titles and tags for paperless-ngx using Large Language Models (LLMs). It supports multiple LLM providers, including OpenAI and Ollama. With paperless-gpt, you can streamline your document management by automatically suggesting appropriate titles and tags based on the content of your scanned documents. The tool offers features like multiple LLM support, customizable prompts, easy integration with paperless-ngx, user-friendly interface for reviewing and applying suggestions, dockerized deployment, automatic document processing, and an experimental OCR feature.

paperless-gpt seamlessly pairs with paperless-ngx to generate AI-powered document titles and tags, saving you hours of manual sorting. While other tools may offer AI chat features, paperless-gpt stands out by supercharging OCR with LLMs, ensuring high accuracy even with tricky scans. If you're craving next-level text extraction and effortless document organization, this is your solution.

https://github.com/user-attachments/assets/bd5d38b9-9309-40b9-93ca-918dfa4f3fd4

❤️ Support This Project
If paperless-gpt is helping you organize your documents and saving you time, please consider sponsoring its development. Your support helps ensure continued improvements and maintenance!

Key Highlights

LLM-Enhanced OCR
Harness Large Language Models (OpenAI or Ollama) for better-than-traditional OCR—turn messy or low-quality scans into context-aware, high-fidelity text.
Use specialized AI OCR services
- LLM OCR: Use OpenAI or Ollama to extract text from images.
- Google Document AI: Leverage Google's powerful Document AI for OCR tasks.
- Azure Document Intelligence: Use Microsoft's enterprise OCR solution.
Automatic Title, Tag & Created Date Generation
No more guesswork. Let the AI do the naming and categorizing. You can easily review suggestions and refine them if needed.
Supports DeepSeek reasoning models in Ollama
Greatly enhance accuracy by using a reasoning model like deepseek-r1:8b. The perfect tradeoff between privacy and performance! Of course, if you have enough GPUs or NPUs, a bigger model will enhance the experience.
Automatic Correspondent Generation
Automatically identify and generate correspondents from your documents, making it easier to track and organize your communications.
Extensive Customization
- Prompt Templates: Tweak your AI prompts to reflect your domain, style, or preference.
- Tagging: Decide how documents get tagged—manually, automatically, or via OCR-based flows.
Simple Docker Deployment
A few environment variables, and you're off! Compose it alongside paperless-ngx with minimal fuss.
Unified Web UI
- Manual Review: Approve or tweak AI's suggestions.
- Auto Processing: Focus only on edge cases while the rest is sorted for you.

Key Highlights
Getting Started
- Prerequisites
- Installation
  - Docker Compose
  - Manual Setup
OCR Providers
Configuration
- Environment Variables
- Custom Prompt Templates
OCR using AI
Usage
Contributing
Support the Project
License
Star History
Disclaimer

Getting Started

Prerequisites

Docker installed.
A running instance of paperless-ngx.
Access to an LLM provider:
- OpenAI: An API key with models like gpt-4o or gpt-3.5-turbo.
- Ollama: A running Ollama server with models like deepseek-r1:8b.

Installation

Docker Compose

Here's an example docker-compose.yml to spin up paperless-gpt alongside paperless-ngx:

services:
  paperless-ngx:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    # ... (your existing paperless-ngx config)

  paperless-gpt:
    image: icereed/paperless-gpt:latest
    environment:
      PAPERLESS_BASE_URL: "http://paperless-ngx:8000"
      PAPERLESS_API_TOKEN: "your_paperless_api_token"
      PAPERLESS_PUBLIC_URL: "http://paperless.mydomain.com" # Optional
      MANUAL_TAG: "paperless-gpt" # Optional, default: paperless-gpt
      AUTO_TAG: "paperless-gpt-auto" # Optional, default: paperless-gpt-auto
      LLM_PROVIDER: "openai" # or 'ollama'
      LLM_MODEL: "gpt-4o" # or 'deepseek-r1:8b'
      # Optional, but recommended for Ollama
      TOKEN_LIMIT: 1000
      OPENAI_API_KEY: "your_openai_api_key"
      # Optional - OPENAI_BASE_URL: 'https://litellm.yourinstallationof.it.com/v1'
      LLM_LANGUAGE: "English" # Optional, default: English

      # OCR Configuration - Choose one:
      # Option 1: LLM-based OCR
      OCR_PROVIDER: "llm" # Default OCR provider
      VISION_LLM_PROVIDER: "ollama" # openai or ollama
      VISION_LLM_MODEL: "minicpm-v" # minicpm-v (ollama) or gpt-4o (openai)
      OLLAMA_HOST: "http://host.docker.internal:11434" # If using Ollama

      # Option 2: Google Document AI
      # OCR_PROVIDER: 'google_docai'       # Use Google Document AI
      # GOOGLE_PROJECT_ID: 'your-project'  # Your GCP project ID
      # GOOGLE_LOCATION: 'us'              # Document AI region
      # GOOGLE_PROCESSOR_ID: 'processor-id' # Your processor ID
      # GOOGLE_APPLICATION_CREDENTIALS: '/app/credentials.json' # Path to service account key

      # Option 3: Azure Document Intelligence
      # OCR_PROVIDER: 'azure'              # Use Azure Document Intelligence
      # AZURE_DOCAI_ENDPOINT: 'your-endpoint' # Your Azure endpoint URL
      # AZURE_DOCAI_KEY: 'your-key'        # Your Azure API key
      # AZURE_DOCAI_MODEL_ID: 'prebuilt-read' # Optional, defaults to prebuilt-read
      # AZURE_DOCAI_TIMEOUT_SECONDS: '120'  # Optional, defaults to 120 seconds

      AUTO_OCR_TAG: "paperless-gpt-ocr-auto" # Optional, default: paperless-gpt-ocr-auto
      OCR_LIMIT_PAGES: "5" # Optional, default: 5. Set to 0 for no limit.
      LOG_LEVEL: "info" # Optional: debug, warn, error
    volumes:
      - ./prompts:/app/prompts # Mount the prompts directory
      # For Google Document AI (uncomment when OCR_PROVIDER is google_docai):
      # - ${HOME}/.config/gcloud/application_default_credentials.json:/app/credentials.json
    ports:
      - "8080:8080"
    depends_on:
      - paperless-ngx

Pro Tip: Replace placeholders with real values and read the logs if something looks off.

Manual Setup

Clone the Repository

```
git clone https://github.com/icereed/paperless-gpt.git
cd paperless-gpt
```

Create a prompts Directory
```
mkdir prompts
```
Build the Docker Image
```
docker build -t paperless-gpt .
```

Run the Container

```
docker run -d \
  -e PAPERLESS_BASE_URL='http://your_paperless_ngx_url' \
  -e PAPERLESS_API_TOKEN='your_paperless_api_token' \
  -e LLM_PROVIDER='openai' \
  -e LLM_MODEL='gpt-4o' \
  -e OPENAI_API_KEY='your_openai_api_key' \
  -e LLM_LANGUAGE='English' \
  -e VISION_LLM_PROVIDER='ollama' \
  -e VISION_LLM_MODEL='minicpm-v' \
  -e LOG_LEVEL='info' \
  -v "$(pwd)/prompts:/app/prompts" \
  -p 8080:8080 \
  paperless-gpt
```

OCR Providers

paperless-gpt supports three different OCR providers, each with unique strengths and capabilities:

1. LLM-based OCR (Default)

Key Features:
- Uses vision-capable LLMs like gpt-4o or MiniCPM-V
- High accuracy with complex layouts and difficult scans
- Context-aware text recognition
- Self-correcting capabilities for OCR errors
Best For:
- Complex or unusual document layouts
- Poor quality scans
- Documents with mixed languages

Configuration:

OCR_PROVIDER: "llm"
VISION_LLM_PROVIDER: "openai" # or "ollama"
VISION_LLM_MODEL: "gpt-4o" # or "minicpm-v"

2. Azure Document Intelligence

Key Features:
- Enterprise-grade OCR solution
- Prebuilt models for common document types
- Layout preservation and table detection
- Fast processing speeds
Best For:
- Business documents and forms
- High-volume processing
- Documents requiring layout analysis

Configuration:

OCR_PROVIDER: "azure"
AZURE_DOCAI_ENDPOINT: "https://your-endpoint.cognitiveservices.azure.com/"
AZURE_DOCAI_KEY: "your-key"
AZURE_DOCAI_MODEL_ID: "prebuilt-read" # optional
AZURE_DOCAI_TIMEOUT_SECONDS: "120" # optional

3. Google Document AI

Key Features:
- Specialized document processors
- Strong form field detection
- Multi-language support
- High accuracy on structured documents
Best For:
- Forms and structured documents
- Documents with tables
- Multi-language documents

Configuration:

OCR_PROVIDER: "google_docai"
GOOGLE_PROJECT_ID: "your-project"
GOOGLE_LOCATION: "us"
GOOGLE_PROCESSOR_ID: "processor-id"

Configuration

Environment Variables

Note: When using Ollama, ensure that the Ollama server is running and accessible from the paperless-gpt container.

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| `PAPERLESS_BASE_URL` | URL of your paperless-ngx instance (e.g. `http://paperless-ngx:8000`). | Yes | |
| `PAPERLESS_API_TOKEN` | API token for paperless-ngx. Generate one in paperless-ngx admin. | Yes | |
| `PAPERLESS_PUBLIC_URL` | Public URL for Paperless (if different from `PAPERLESS_BASE_URL`). | No | |
| `MANUAL_TAG` | Tag for manual processing. | No | paperless-gpt |
| `AUTO_TAG` | Tag for auto processing. | No | paperless-gpt-auto |
| `LLM_PROVIDER` | AI backend (`openai` or `ollama`). | Yes | |
| `LLM_MODEL` | AI model name, e.g. `gpt-4o`, `gpt-3.5-turbo`, `deepseek-r1:8b`. | Yes | |
| `OPENAI_API_KEY` | OpenAI API key (required if using OpenAI). | Cond. | |
| `OPENAI_BASE_URL` | OpenAI base URL (optional, if using a custom OpenAI compatible service like LiteLLM). | No | |
| `LLM_LANGUAGE` | Likely language for documents (e.g. `English`). | No | English |
| `OLLAMA_HOST` | Ollama server URL (e.g. `http://host.docker.internal:11434`). | No | |
| `OCR_PROVIDER` | OCR provider to use (`llm`, `azure`, or `google_docai`). | No | llm |
| `VISION_LLM_PROVIDER` | AI backend for LLM OCR (`openai` or `ollama`). Required if `OCR_PROVIDER` is `llm`. | Cond. | |
| `VISION_LLM_MODEL` | Model name for LLM OCR (e.g. `minicpm-v`). Required if `OCR_PROVIDER` is `llm`. | Cond. | |
| `AZURE_DOCAI_ENDPOINT` | Azure Document Intelligence endpoint. Required if `OCR_PROVIDER` is `azure`. | Cond. | |
| `AZURE_DOCAI_KEY` | Azure Document Intelligence API key. Required if `OCR_PROVIDER` is `azure`. | Cond. | |
| `AZURE_DOCAI_MODEL_ID` | Azure Document Intelligence model ID. Optional if using `azure` provider. | No | prebuilt-read |
| `AZURE_DOCAI_TIMEOUT_SECONDS` | Azure Document Intelligence timeout in seconds. | No | 120 |
| `GOOGLE_PROJECT_ID` | Google Cloud project ID. Required if `OCR_PROVIDER` is `google_docai`. | Cond. | |
| `GOOGLE_LOCATION` | Google Cloud region (e.g. `us`, `eu`). Required if `OCR_PROVIDER` is `google_docai`. | Cond. | |
| `GOOGLE_PROCESSOR_ID` | Document AI processor ID. Required if `OCR_PROVIDER` is `google_docai`. | Cond. | |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to the mounted Google service account key. Required if `OCR_PROVIDER` is `google_docai`. | Cond. | |
| `AUTO_OCR_TAG` | Tag for automatically processing docs with OCR. | No | paperless-gpt-ocr-auto |
| `LOG_LEVEL` | Application log level (`info`, `debug`, `warn`, `error`). | No | info |
| `LISTEN_INTERFACE` | Network interface to listen on. | No | 8080 |
| `AUTO_GENERATE_TITLE` | Generate titles automatically if `paperless-gpt-auto` is used. | No | true |
| `AUTO_GENERATE_TAGS` | Generate tags automatically if `paperless-gpt-auto` is used. | No | true |
| `AUTO_GENERATE_CORRESPONDENTS` | Generate correspondents automatically if `paperless-gpt-auto` is used. | No | true |
| `AUTO_GENERATE_CREATED_DATE` | Generate the created dates automatically if `paperless-gpt-auto` is used. | No | true |
| `OCR_LIMIT_PAGES` | Limit the number of pages for OCR. Set to `0` for no limit. | No | 5 |
| `TOKEN_LIMIT` | Maximum tokens allowed for prompts/content. Set to `0` to disable limit. Useful for smaller LLMs. | No | |
| `CORRESPONDENT_BLACK_LIST` | A comma-separated list of names to exclude from the correspondents suggestions. Example: `John Doe, Jane Smith`. | No | |
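
The "Required" rules for these variables can be turned into a quick pre-flight check before starting the container. The Python sketch below is an illustration only: `missing_variables` and the two lookup tables are hypothetical helpers, not part of paperless-gpt, and the rules are copied from the variable descriptions in this section (including the assumption that `OCR_PROVIDER` falls back to `llm` when unset).

```python
import os

# Variables that are always required.
ALWAYS_REQUIRED = [
    "PAPERLESS_BASE_URL",
    "PAPERLESS_API_TOKEN",
    "LLM_PROVIDER",
    "LLM_MODEL",
]

# Variables required only for a specific OCR provider.
OCR_REQUIRED = {
    "llm": ["VISION_LLM_PROVIDER", "VISION_LLM_MODEL"],
    "azure": ["AZURE_DOCAI_ENDPOINT", "AZURE_DOCAI_KEY"],
    "google_docai": [
        "GOOGLE_PROJECT_ID",
        "GOOGLE_LOCATION",
        "GOOGLE_PROCESSOR_ID",
        "GOOGLE_APPLICATION_CREDENTIALS",
    ],
}


def missing_variables(env):
    """Return the names of required variables that are unset or empty."""
    required = list(ALWAYS_REQUIRED)
    if env.get("LLM_PROVIDER") == "openai":
        required.append("OPENAI_API_KEY")
    ocr_provider = env.get("OCR_PROVIDER") or "llm"  # documented default: llm
    required.extend(OCR_REQUIRED.get(ocr_provider, []))
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Report problems in the current shell environment, one per line.
    for name in missing_variables(os.environ):
        print(f"missing: {name}")
```

Run in the shell that holds your settings, the script prints one line per missing variable and nothing when the configuration looks complete.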

Custom Prompt Templates

paperless-gpt's flexible prompt templates let you shape how AI responds:

title_prompt.tmpl: For document titles.
tag_prompt.tmpl: For tagging logic.
ocr_prompt.tmpl: For LLM OCR.
correspondent_prompt.tmpl: For correspondent identification.
created_date_prompt.tmpl: For setting the document's created date.

Mount them into your container via:

volumes:
  - ./prompts:/app/prompts

Then tweak at will. paperless-gpt reloads them on startup, so restart the container to apply your changes.

Template Variables

Each template has access to specific variables:

title_prompt.tmpl:

{{.Language}} - Target language (e.g., "English")
{{.Content}} - Document content text
{{.Title}} - Original document title

tag_prompt.tmpl:

{{.Language}} - Target language
{{.AvailableTags}} - List of existing tags in paperless-ngx
{{.OriginalTags}} - Document's current tags
{{.Title}} - Document title
{{.Content}} - Document content text

ocr_prompt.tmpl:

{{.Language}} - Target language

correspondent_prompt.tmpl:

{{.Language}} - Target language
{{.AvailableCorrespondents}} - List of existing correspondents
{{.BlackList}} - List of blacklisted correspondent names
{{.Title}} - Document title
{{.Content}} - Document content text

created_date_prompt.tmpl:

{{.Language}} - Target language
{{.Content}} - Document content text

The templates use Go's text/template syntax. paperless-gpt automatically reloads template changes on startup.
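
To get a rough idea of what a prompt looks like once the variables are filled in, simple `{{.Field}}` placeholders can be substituted by hand. The Python sketch below is not Go's text/template engine: it handles only plain field references and ignores pipelines, conditionals, and loops. The template text and the `preview_prompt` helper are made up for illustration and are not the defaults shipped with paperless-gpt.

```python
import re

# Matches simple field references such as {{.Language}} or {{ .Content }}.
FIELD = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def preview_prompt(template, values):
    """Substitute simple {{.Field}} placeholders; leave unknown ones untouched."""
    def replace(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return FIELD.sub(replace, template)


# Hypothetical template text using the variables listed for title_prompt.tmpl.
template = (
    "Suggest a concise title in {{.Language}} for this document.\n"
    "Current title: {{.Title}}\n"
    "Content:\n{{.Content}}\n"
)

print(preview_prompt(template, {
    "Language": "English",
    "Title": "scan_0001.pdf",
    "Content": "Invoice Number: 1-996-84199",
}))
```

Placeholders that have no value are left in place, which makes a typo in a variable name easy to spot in the preview.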

Usage

Tag Documents
- Add the paperless-gpt tag to documents for manual processing
- Add paperless-gpt-auto for automatic processing
- Add paperless-gpt-ocr-auto for automatic OCR processing
Visit Web UI
- Go to http://localhost:8080 (or your host) in your browser
- Review documents tagged for processing
Generate & Apply Suggestions
- Click "Generate Suggestions" to see AI-proposed titles/tags/correspondents
- Review and approve or edit suggestions
- Click "Apply" to save changes to paperless-ngx
OCR Processing
- Tag documents with the appropriate OCR tag (default: paperless-gpt-ocr-auto) to process them
- Monitor progress in the Web UI
- Review results and apply changes
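
Tagging can also be scripted instead of done in the paperless-ngx UI. The Python sketch below assumes that the paperless-ngx REST API accepts a PATCH to `/api/documents/<id>/` with a `tags` list of tag IDs and uses token authentication; check the API documentation of your paperless-ngx version before relying on it. The document and tag IDs are hypothetical, and `build_tag_request` is an illustrative helper that only builds the request without sending it.

```python
import json
import urllib.request


def build_tag_request(base_url, api_token, document_id, current_tag_ids, new_tag_id):
    """Build a PATCH request that adds one tag ID to a document's tag list."""
    # The request sends the full "tags" list, so keep the existing IDs.
    tag_ids = sorted(set(current_tag_ids) | {new_tag_id})
    url = f"{base_url.rstrip('/')}/api/documents/{document_id}/"
    return urllib.request.Request(
        url,
        data=json.dumps({"tags": tag_ids}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


request = build_tag_request(
    "http://paperless-ngx:8000",  # same value as PAPERLESS_BASE_URL
    "your_paperless_api_token",   # same value as PAPERLESS_API_TOKEN
    document_id=42,               # hypothetical document ID
    current_tag_ids=[3, 7],       # hypothetical IDs of tags already on the document
    new_tag_id=12,                # hypothetical ID of the paperless-gpt-auto tag
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body first, which helps when a tag ID turns out to be wrong.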

LLM-Based OCR: Compare for Yourself

Example 1

Vanilla Paperless-ngx OCR:

La Grande Recre

Gentre Gommercial 1'Esplanade
1349 LOLNAIN LA NEWWE
TA BERBOGAAL Tel =. 010 45,96 12
Ticket 1440112 03/11/2006 a 13597:
4007176614518. DINOS. TYRAMNESA
TOTAET.T.LES
ReslE par Lask-Euron
Rencu en Cash Euro
V.14.6 -Hotgese = VALERTE
TICKET A-GONGERVER PORR TONT. EEHANGE
HERET ET A BIENTOT

LLM-Powered OCR (OpenAI gpt-4o):

La Grande Récré
Centre Commercial l'Esplanade
1348 LOUVAIN LA NEUVE
TVA 860826401 Tel : 010 45 95 12
Ticket 14421 le 03/11/2006 à 15:27:18
4007176614518 DINOS TYRANNOSA 14.90
TOTAL T.T.C. 14.90
Réglé par Cash Euro 50.00
Rendu en Cash Euro 35.10
V.14.6 Hôtesse : VALERIE
TICKET A CONSERVER POUR TOUT ECHANGE
MERCI ET A BIENTOT

Example 2

Vanilla Paperless-ngx OCR:

Invoice Number: 1-996-84199

Fed: Invoica Date: Sep01, 2014
Accaunt Number: 1334-8037-4
Page: 1012

Fod£x Tax ID 71.0427007

IRISINC
SHARON ANDERSON
4731 W ATLANTIC AVE STE BI
DELRAY BEACH FL 33445-3897 ’ a
Invoice Questions?

Bing, ‚Account Shipping Address: Contact FedEx Reı

ISINC
4731 W ATLANTIC AVE Phone: (800) 622-1147 M-F 7-6 (CST)
DELRAY BEACH FL 33445-3897 US Fax: (800) 548-3020

Internet: www.fedex.com

Invoice Summary Sep 01, 2014

FodEx Ground Services
Other Charges 11.00
Total Charges 11.00 Da £
>
polo) Fz// /G
TOTAL THIS INVOICE .... usps 11.00 P 2/1 f

‘The only charges accrued for this period is the Weekly Service Charge.

The Fedix Ground aceounts teferencedin his involce have been transteired and assigned 10, are owned by,andare payable to FedEx Express:

To onsurs propor credit, plasa raturn this portion wirh your payment 10 FodEx
‚Please do not staple or fold. Ploase make your chack payablı to FedEx.

[TI For change ol address, hc har and camphat lrm or never ide

Remittance Advice
Your payment is due by Sep 16, 2004

Number Number Dus

1334803719968 41993200000110071

AT 01 0391292 468448196 A**aDGT

IRISINC Illallun elalalssollallansdHilalellund
SHARON ANDERSON

4731 W ATLANTIC AVE STEBI FedEx

DELRAY BEACH FL 334453897 PO. Box 94516

PALATINE IL 60094-4515

LLM-Powered OCR (OpenAI gpt-4o):

FedEx.                                                                                      Invoice Number: 1-996-84199
                                                                                           Invoice Date: Sep 01, 2014
                                                                                           Account Number: 1334-8037-4
                                                                                           Page: 1 of 2
                                                                                           FedEx Tax ID: 71-0427007

I R I S INC
SHARON ANDERSON
4731 W ATLANTIC AVE STE B1
DELRAY BEACH FL 33445-3897
                                                                                           Invoice Questions?
Billing Account Shipping Address:                                                          Contact FedEx Revenue Services
I R I S INC                                                                                Phone: (800) 622-1147 M-F 7-6 (CST)
4731 W ATLANTIC AVE                                                                        Fax: (800) 548-3020
DELRAY BEACH FL 33445-3897 US                                                              Internet: www.fedex.com

Invoice Summary Sep 01, 2014

FedEx Ground Services
Other Charges                                                                 11.00

Total Charges .......................................................... USD $          11.00

TOTAL THIS INVOICE .............................................. USD $                 11.00

The only charges accrued for this period is the Weekly Service Charge.

                                                                                           RECEIVED
                                                                                           SEP _ 8 REC'D
                                                                                           BY: _

                                                                                           posted 9/21/14

The FedEx Ground accounts referenced in this invoice have been transferred and assigned to, are owned by, and are payable to FedEx Express.

To ensure proper credit, please return this portion with your payment to FedEx.
Please do not staple or fold. Please make your check payable to FedEx.

❑ For change of address, check here and complete form on reverse side.

Remittance Advice
Your payment is due by Sep 16, 2004

Invoice
Number
1-996-84199

Account
Number
1334-8037-4

Amount
Due
USD $ 11.00

133480371996841993200000110071

AT 01 031292 468448196 A**3DGT

I R I S INC
SHARON ANDERSON
4731 W ATLANTIC AVE STE B1
DELRAY BEACH FL 33445-3897

FedEx
P.O. Box 94515

Why Does It Matter?

Traditional OCR often jumbles text from complex or low-quality scans.
Large Language Models interpret context and correct likely errors, producing results that are more precise and readable.
You can integrate these cleaned-up texts into your paperless-ngx pipeline for better tagging, searching, and archiving.

How It Works

Vanilla OCR typically uses classical methods or Tesseract-like engines to extract text, which can result in garbled outputs for complex fonts or poor-quality scans.
LLM-Powered OCR uses your chosen AI backend—OpenAI or Ollama—to interpret the image's text in a more context-aware manner. This leads to fewer errors and more coherent text.
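
As an illustration of the LLM-powered path, here is roughly what a single OCR request to an Ollama vision model could look like. This Python sketch targets Ollama's `/api/generate` endpoint and is not paperless-gpt's actual implementation; the prompt text and the helper names `build_ocr_payload` and `run_ocr` are assumptions made for the example.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_bytes, model="minicpm-v",
                      prompt="Transcribe all text in this image."):
    """Build the JSON body for a non-streaming Ollama /api/generate call."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def run_ocr(ollama_host, image_bytes):
    """Send one page image to Ollama and return the generated text."""
    body = json.dumps(build_ocr_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        f"{ollama_host.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Inspect the payload without contacting a server.
payload = build_ocr_payload(b"raw image bytes go here")
print(sorted(payload))
```

With a running Ollama server, `run_ocr("http://host.docker.internal:11434", open("page.png", "rb").read())` would return the model's transcription for a hypothetical `page.png`.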

Troubleshooting

Working with Local LLMs

When using local LLMs (like those through Ollama), you might need to adjust certain settings to optimize performance:

Token Management

Use TOKEN_LIMIT environment variable to control the maximum number of tokens sent to the LLM
Smaller models might truncate content unexpectedly if given too much text
Start with a conservative limit (e.g., 1000 tokens) and adjust based on your model's capabilities
Set to 0 to disable the limit (use with caution)

Example configuration for smaller models:

environment:
  TOKEN_LIMIT: "2000" # Adjust based on your model's context window
  LLM_PROVIDER: "ollama"
  LLM_MODEL: "deepseek-r1:8b" # Or other local model

Common issues and solutions:

If you see truncated or incomplete responses, try lowering the TOKEN_LIMIT
If processing is too limited, gradually increase the limit while monitoring performance
For models with larger context windows, you can increase the limit or disable it entirely
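
To see what a token limit means for document content, the Python sketch below trims text to an approximate budget. It assumes about four characters per token, a rough rule of thumb for English text; paperless-gpt's own token counting may work differently, and `truncate_to_token_limit` is a hypothetical helper.

```python
def truncate_to_token_limit(text, token_limit, chars_per_token=4):
    """Trim text to an approximate token budget; a limit of 0 disables trimming."""
    if token_limit <= 0:
        return text
    max_chars = token_limit * chars_per_token
    if len(text) <= max_chars:
        return text
    # Cut at the last space within the budget to avoid splitting a word.
    cut = text.rfind(" ", 0, max_chars + 1)
    return text[: cut if cut > 0 else max_chars]


sample = "Invoice Summary Sep 01, 2014 FedEx Ground Services Other Charges 11.00"
print(truncate_to_token_limit(sample, 5))
print(truncate_to_token_limit(sample, 0) == sample)
```

Trying a few limits against a typical document gives a feel for how much content a small model would actually receive at a given `TOKEN_LIMIT`.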

Contributing

Pull requests and issues are welcome!

Fork the repo
Create a branch (feature/my-awesome-update)
Commit changes (git commit -m "Improve X")
Open a PR

Check out our contributing guidelines for details.

Support the Project

If paperless-gpt is saving you time and making your document management easier, please consider supporting its continued development:

GitHub Sponsors: Help fund ongoing development and maintenance
Share your success stories and use cases
Star the project on GitHub
Contribute code, documentation, or bug reports

Your support helps ensure paperless-gpt remains actively maintained and continues to improve!

License

paperless-gpt is licensed under the MIT License. Feel free to adapt and share!

Star History

Disclaimer

This project is not officially affiliated with paperless-ngx. Use at your own risk.

paperless-gpt: The LLM-based companion your doc management has been waiting for. Enjoy effortless, intelligent document titles, tags, and next-level OCR.

Alternative AI tools for paperless-gpt

Similar Open Source Tools

paperless-gpt

github

: 724

VimLM

VimLM is an AI-powered coding assistant for Vim that integrates AI for code generation, refactoring, and documentation directly into your Vim workflow. It offers native Vim integration with split-window responses and intuitive keybindings, offline first execution with MLX-compatible models, contextual awareness with seamless integration with codebase and external resources, conversational workflow for iterating on responses, project scaffolding for generating and deploying code blocks, and extensibility for creating custom LLM workflows with command chains.

github

: 193

evalchemy

Evalchemy is a unified and easy-to-use toolkit for evaluating language models, focusing on post-trained models. It integrates multiple existing benchmarks such as RepoBench, AlpacaEval, and ZeroEval. Key features include unified installation, parallel evaluation, simplified usage, and results management. Users can run various benchmarks with a consistent command-line interface and track results locally or integrate with a database for systematic tracking and leaderboard submission.

github

: 317

quantalogic

QuantaLogic is a ReAct framework for building advanced AI agents that seamlessly integrates large language models with a robust tool system. It aims to bridge the gap between advanced AI models and practical implementation in business processes by enabling agents to understand, reason about, and execute complex tasks through natural language interaction. The framework includes features such as ReAct Framework, Universal LLM Support, Secure Tool System, Real-time Monitoring, Memory Management, and Enterprise Ready components.

github

: 376

aicommit2

AICommit2 is a Reactive CLI tool that streamlines interactions with various AI providers such as OpenAI, Anthropic Claude, Gemini, Mistral AI, Cohere, and unofficial providers like Huggingface and Clova X. Users can request multiple AI simultaneously to generate git commit messages without waiting for all AI responses. The tool runs 'git diff' to grab code changes, sends them to configured AI, and returns the AI-generated commit message. Users can set API keys or Cookies for different providers and configure options like locale, generate number of messages, commit type, proxy, timeout, max-length, and more. AICommit2 can be used both locally with Ollama and remotely with supported providers, offering flexibility and efficiency in generating commit messages.

github

: 242

local-deep-research

Local Deep Research is a powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. It can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities. The tool offers advanced research capabilities, flexible LLM support, rich output options, privacy-focused operation, enhanced search integration, and academic & scientific integration. It also provides a web interface, command line interface, and supports multiple LLM providers and search engines. Users can configure AI models, search engines, and research parameters for customized research experiences.

github

: 2.0k

paelladoc

github

: 221

mistral.rs

Mistral.rs is a fast LLM inference platform written in Rust. We support inference on a variety of devices, quantization, and easy-to-use application with an Open-AI API compatible HTTP server and Python bindings.

github

: 5.4k

caddy-defender

The Caddy Defender plugin is a middleware for Caddy that allows you to block or manipulate requests based on the client's IP address. It provides features such as IP range filtering, predefined IP ranges for popular AI services, custom IP ranges configuration, and multiple responder backends for different actions like blocking, custom responses, dropping connections, returning garbage data, redirecting, and tarpitting to stall bots. The plugin can be easily installed using Docker or built with `xcaddy`. Configuration is done through the Caddyfile syntax with various options for responders, IP ranges, custom messages, and URLs.

github

: 333

rpaframework

RPA Framework is an open-source collection of libraries and tools for Robotic Process Automation (RPA), designed to be used with Robot Framework and Python. It offers well-documented core libraries for Software Robot Developers, optimized for Robocorp Control Room and Developer Tools, and accepts external contributions. The project includes various libraries for tasks like archiving, browser automation, date/time manipulations, cloud services integration, encryption operations, database interactions, desktop automation, document processing, email operations, Excel manipulation, file system operations, FTP interactions, web API interactions, image manipulation, AI services, and more. The development of the repository is Python-based and requires Python version 3.8+, with tooling based on poetry and invoke for compiling, building, and running the package. The project is licensed under the Apache License 2.0.

github

: 1.1k

one

ONE is a modern web and AI agent development toolkit that empowers developers to build AI-powered applications with high performance, beautiful UI, AI integration, responsive design, type safety, and great developer experience. It is perfect for building modern web applications, from simple landing pages to complex AI-powered platforms.

github

: 58

LLMTSCS

LLMLight is a novel framework that employs Large Language Models (LLMs) as decision-making agents for Traffic Signal Control (TSC). The framework leverages the advanced generalization capabilities of LLMs to engage in a reasoning and decision-making process akin to human intuition for effective traffic control. LLMLight has been demonstrated to be remarkably effective, generalizable, and interpretable against various transportation-based and RL-based baselines on nine real-world and synthetic datasets.

github

: 173

cortex.cpp

Cortex.cpp is an open-source platform designed as the brain for robots, offering functionalities such as vision, speech, language, tabular data processing, and action. It provides an AI platform for running AI models with multi-engine support, hardware optimization with automatic GPU detection, and an OpenAI-compatible API. Users can download models from the Hugging Face model hub, run models, manage resources, and access advanced features like multiple quantizations and engine management. The tool is under active development, promising rapid improvements for users.

github

: 2.6k

CrewAI-GUI

CrewAI-GUI is a Node-Based Frontend tool designed to revolutionize AI workflow creation. It empowers users to design complex AI agent interactions through an intuitive drag-and-drop interface, export designs to JSON for modularity and reusability, and supports both GPT-4 API and Ollama for flexible AI backend. The tool ensures cross-platform compatibility, allowing users to create AI workflows on Windows, Linux, or macOS efficiently.

github

: 88

AI-Agent-Starter-Kit

AI Agent Starter Kit is a modern full-stack AI-enabled template using Next.js for frontend and Express.js for backend, with Telegram and OpenAI integrations. It offers AI-assisted development, smart environment variable setup assistance, intelligent error resolution, context-aware code completion, and built-in debugging helpers. The kit provides a structured environment for developers to interact with AI tools seamlessly, enhancing the development process and productivity.

github

: 147

ps-fuzz

The Prompt Fuzzer is an open-source tool that helps you assess the security of your GenAI application's system prompt against various dynamic LLM-based attacks. It provides a security evaluation based on the outcome of these attack simulations, enabling you to strengthen your system prompt as needed. The Prompt Fuzzer dynamically tailors its tests to your application's unique configuration and domain. The Fuzzer also includes a Playground chat interface, giving you the chance to iteratively improve your system prompt, hardening it against a wide spectrum of generative AI attacks.

github

: 367

For similar tasks

paperless-gpt

github

: 724

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

AI-in-a-Box

AI-in-a-Box is a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction, while maintaining the highest standards of quality and efficiency. It provides essential guidance on the responsible use of AI and LLM technologies, specific security guidance for Generative AI (GenAI) applications, and best practices for scaling OpenAI applications within Azure. The available accelerators include: Azure ML Operationalization in-a-box, Edge AI in-a-box, Doc Intelligence in-a-box, Image and Video Analysis in-a-box, Cognitive Services Landing Zone in-a-box, Semantic Kernel Bot in-a-box, NLP to SQL in-a-box, Assistants API in-a-box, and Assistants API Bot in-a-box.

github

: 527

langchain-rust

LangChain Rust is a library for building applications with Large Language Models (LLMs) through composability. It provides a set of tools and components that can be used to create conversational agents, document loaders, and other applications that leverage LLMs. LangChain Rust supports a variety of LLMs, including OpenAI, Azure OpenAI, Ollama, and Anthropic Claude. It also supports a variety of embeddings, vector stores, and document loaders. LangChain Rust is designed to be easy to use and extensible, making it a great choice for developers who want to build applications with LLMs.

github

: 722

dolma

Dolma is a dataset and toolkit for curating large datasets for (pre)-training ML models. The dataset consists of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. The toolkit provides high-performance, portable, and extensible tools for processing, tagging, and deduplicating documents. Key features of the toolkit include built-in taggers, fast deduplication, and cloud support.

Stars: 1.0k

sparrow

Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It handles forms, invoices, receipts, and other unstructured data sources. Sparrow has a modular architecture, offering independent services and pipelines optimized for robust performance. One of its critical features is a pluggable architecture: you can integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. Sparrow also provides an API that helps process and transform your data into structured output, ready to be integrated with custom workflows. With Sparrow Agents, you can build independent LLM agents and invoke them from your system through the API.

List of available agents:
- llamaindex: RAG pipeline with LlamaIndex for PDF processing
- vllamaindex: RAG pipeline with LlamaIndex multimodal for image processing
- vprocessor: RAG pipeline with OCR and LlamaIndex for image processing
- haystack: RAG pipeline with Haystack for PDF processing
- fcall: Function call pipeline
- unstructured-light: RAG pipeline with Unstructured and LangChain, supports PDF and image processing
- unstructured: RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing
- instructor: RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation

Stars: 4.5k

Open-DocLLM

Open-DocLLM is an open-source project that addresses data extraction and processing challenges using OCR and LLM technologies. It consists of two main layers: OCR for reading document content and LLM for extracting specific content in a structured manner. The project offers a larger context window size compared to JP Morgan's DocLLM and integrates tools like Tesseract OCR and Mistral for efficient data analysis. Users can run the models on-premises using LLM studio or Ollama, and the project includes a FastAPI app for testing purposes.

Stars: 124

aws-genai-llm-chatbot

This repository provides code to deploy a chatbot powered by Multi-Model and Multi-RAG using AWS CDK on AWS. Users can experiment with various Large Language Models and Multimodal Language Models from different providers. The solution supports Amazon Bedrock, Amazon SageMaker self-hosted models, and third-party providers via API. It also offers additional resources like AWS Generative AI CDK Constructs and Project Lakechain for building generative AI solutions and document processing. The roadmap and authors are listed, along with contributors. The library is licensed under the MIT-0 License with information on changelog, code of conduct, and contributing guidelines. A legal disclaimer advises users to conduct their own assessment before using the content for production purposes.

Stars: 1.2k

For similar jobs

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

Stars: 1.5k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

Stars: 492

exif-photo-blog

EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

Stars: 992

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

Stars: 13.2k

Twitter-Insight-LLM

This project enables you to fetch liked tweets from Twitter (using Selenium), save them to JSON and Excel files, and perform initial data analysis and image captioning. It is one of the first steps of a larger personal project involving Large Language Models (LLMs).

Stars: 401

AISuperDomain

Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

Stars: 1.2k

ChatGPT-On-CS

This project is an intelligent customer service dialogue tool based on a large model. It supports platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, and Zhihu. You can choose GPT3.5, GPT4.0, or Lazy Treasure Box (more platforms will be supported in the future). The tool can process text, voice, and pictures, access external resources such as the operating system and the Internet through plug-ins, and support enterprise AI applications customized on a company's own knowledge base.

Stars: 768

obs-localvocal

LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.

Stars: 248

