paperless-ai

An automated document analyzer for Paperless-ngx using OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically analyze and tag your documents.

Stars: 223

Paperless-AI is an automated document analyzer tool designed for Paperless-ngx users. It utilizes the OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically scan, analyze, and tag documents. The tool offers features such as automatic document scanning, AI-powered document analysis, automatic title and tag assignment, manual mode for analyzing documents, easy setup through a web interface, document processing dashboard, error handling, and Docker support. Users can configure the tool through a web interface and access a debug interface for monitoring and troubleshooting. Paperless-AI aims to streamline document organization and analysis processes for users with access to Paperless-ngx and AI capabilities.

README:

Paperless-AI

An automated document analyzer for Paperless-ngx using OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically analyze and tag your documents.

Features

  • 🔍 Automatic document scanning in Paperless-ngx

  • 🤖 AI-powered document analysis using OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2)

  • 🏷️ Automatic title, tag and correspondent assignment

    • 🏷️ Optionally restrict processing to documents that already carry specific tags. 🆕
    • 📑 Optionally limit assignment to a fixed set of tags that you choose. 🆕
      • WARNING: enabling this disables the prompt dialog.
    • ✔️ Optionally add a special tag (you choose its name) to every document that was processed by AI. 🆕
  • 🔨 Manual mode for analyzing documents by hand with the help of AI. 🆕

  • 🚀 Easy setup through web interface

  • 📊 Document processing dashboard

  • 🔄 Automatic restart and health monitoring

  • 🛡️ Error handling and graceful shutdown

  • 🐳 Docker support with health checks
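
The two optional tag settings in the list above can be pictured as simple set operations. The following is a minimal Python sketch, not the project's code (Paperless-AI itself is a Node.js application). The function names, the document dictionaries, and the "at least one matching tag" rule are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Iterable


def select_documents(documents: Iterable[dict], required_tags: set[str]) -> list[dict]:
    """Keep documents that carry at least one of the required tags.

    An empty `required_tags` set is treated as "filter disabled"
    (an assumption, not confirmed behavior of Paperless-AI).
    """
    if not required_tags:
        return list(documents)
    return [doc for doc in documents if required_tags & set(doc["tags"])]


def restrict_tags(suggested: Iterable[str], allowed: set[str]) -> list[str]:
    """Drop AI-suggested tags that are not on the allow-list, keeping order."""
    return [tag for tag in suggested if tag in allowed]


# Hypothetical sample data
docs = [
    {"id": 1, "tags": ["inbox"]},
    {"id": 2, "tags": ["archive"]},
]
to_process = select_documents(docs, {"inbox"})
kept_tags = restrict_tags(["invoice", "random-guess"], {"invoice", "contract"})
```

With this sample data, only document 1 is selected and only the "invoice" tag survives the allow-list.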

Prerequisites

  • Docker and Docker Compose
  • Access to a Paperless-ngx installation
  • OpenAI API key or your own Ollama instance with your chosen model running and reachable.
  • Basic understanding of cron syntax (for scan interval configuration)

Installation

The quickest way is to run the prebuilt image:

docker run -d --name paperless-ai --network bridge -v paperless-ai_data:/app/data -p 3000:3000 --restart unless-stopped clusterzx/paperless-ai

Alternatively, set it up manually:

  1. Clone the repository and install dependencies:
git clone https://github.com/clusterzx/paperless-ai.git
cd paperless-ai
npm install
  2. Start the container:
docker-compose up -d
  3. Open your browser and navigate to:
http://localhost:3000
  4. Complete the setup by providing:
  • Paperless-ngx API URL
  • Paperless-ngx API token
  • Ollama API details, or
  • OpenAI API key
  • Scan interval (default: every 30 minutes)
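
The scan interval is written in cron syntax. In standard five-field cron, "every 30 minutes" is `*/30 * * * *`; whether the application stores exactly that string as its default is an assumption. The Python sketch below expands only the minute field so you can check which minutes an expression fires on. The helper name is hypothetical, and it covers just the common forms (`*`, `*/n`, lists, ranges).

```python
from __future__ import annotations


def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field such as "*/30", "0,15" or "10-12"."""
    minutes: set[int] = set()
    for part in field.split(","):
        # Split an optional step suffix, e.g. "*/30" -> ("*", "30")
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            low, high = base.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


# Minute field of "*/30 * * * *"
print(expand_minute_field("*/30"))  # [0, 30]
```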

How it Works

  1. Document Discovery

    • Periodically scans Paperless-ngx for new documents
    • Tracks processed documents in a local SQLite database
  2. AI Analysis

    • Sends document content to OpenAI API or Ollama for analysis
    • Extracts relevant tags and correspondent information
    • Uses GPT-4o-mini or your custom Ollama model for accurate document understanding
  3. Automatic Organization

    • Creates new tags if they don't exist
    • Creates new correspondents if they don't exist
    • Updates documents with analyzed information
    • Marks documents as processed to avoid duplicate analysis
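
The three stages above can be sketched as one loop. This is an illustrative Python outline under stated assumptions, not the project's Node.js implementation: the table name `processed`, the `analyze` callback (standing in for the OpenAI or Ollama call), and the in-memory tag dictionary (standing in for Paperless-ngx API calls) are all hypothetical.

```python
from __future__ import annotations

import sqlite3
from typing import Callable


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local SQLite store that remembers processed document IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (document_id INTEGER PRIMARY KEY)"
    )
    return conn


def is_processed(conn: sqlite3.Connection, document_id: int) -> bool:
    row = conn.execute(
        "SELECT 1 FROM processed WHERE document_id = ?", (document_id,)
    ).fetchone()
    return row is not None


def mark_processed(conn: sqlite3.Connection, document_id: int) -> None:
    conn.execute(
        "INSERT OR IGNORE INTO processed (document_id) VALUES (?)", (document_id,)
    )
    conn.commit()


def resolve_tag_ids(names: list[str], existing: dict[str, int]) -> list[int]:
    """Map tag names to IDs, creating any missing tag (stand-in for an API call)."""
    ids = []
    for name in names:
        if name not in existing:
            existing[name] = max(existing.values(), default=0) + 1
        ids.append(existing[name])
    return ids


def run_scan(
    conn: sqlite3.Connection,
    documents: list[dict],
    analyze: Callable[[str], dict],
    existing_tags: dict[str, int],
) -> list[dict]:
    """Analyze each not-yet-seen document once and return the updates made."""
    updates = []
    for doc in documents:
        if is_processed(conn, doc["id"]):
            continue  # avoid duplicate analysis
        result = analyze(doc["content"])
        updates.append(
            {
                "id": doc["id"],
                "title": result["title"],
                "tags": resolve_tag_ids(result["tags"], existing_tags),
            }
        )
        mark_processed(conn, doc["id"])
    return updates
```

Running `run_scan` twice over the same documents returns updates only the first time, because the second pass finds every ID in the `processed` table.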

NEW! Manual Mode

You can now analyze documents by hand, with the help of AI, in the web interface. Manual mode is reachable at the /manual endpoint.

NEW Dashboard:

Dashboard Image

Configuration Options

The application is configured through the web interface on the /setup route. You do not need to, and cannot, set the environment variables through Docker.

Setup Image

Docker Support

The application comes with full Docker support:

  • Automatic container restart on failure
  • Health monitoring
  • Volume persistence for database
  • Resource management
  • Graceful shutdown handling

Docker Commands

# Start the container
docker-compose up -d

# View logs
docker-compose logs -f

# Restart container
docker-compose restart

# Stop container
docker-compose down

# Rebuild and start
docker-compose up -d --build

Health Checks

The application provides a health check endpoint at /health that returns:

# Healthy system
{
  "status": "healthy"
}

# System not configured
{
  "status": "not_configured",
  "message": "Application setup not completed"
}

# Database error
{
  "status": "database_error",
  "message": "Database check failed"
}
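
For monitoring scripts, the three responses above can be mapped to exit codes. The Python sketch below is one possible way to do that; the exit-code numbering and the function name are assumptions, and only the `status` and `message` fields shown above are relied on.

```python
from __future__ import annotations

import json

# Status strings come from the responses listed above; the numeric
# exit codes are an arbitrary choice made for this sketch.
EXIT_CODES = {"healthy": 0, "not_configured": 1, "database_error": 2}
UNKNOWN = 3


def classify_health(body: str) -> tuple[int, str]:
    """Turn a /health response body into (exit_code, message)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return UNKNOWN, "response was not valid JSON"
    if not isinstance(payload, dict):
        return UNKNOWN, "response was not a JSON object"
    status = payload.get("status", "")
    code = EXIT_CODES.get(status, UNKNOWN)
    # Fall back to the status itself when no message is present.
    message = payload.get("message", status or "unknown status")
    return code, message


print(classify_health('{"status": "healthy"}'))  # (0, 'healthy')
```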

Debug Interface

The application includes a debug interface accessible via /debug that helps administrators monitor and troubleshoot the system's data:

  • 🔍 View all system tags
  • 📄 Inspect processed documents
  • 👥 Review correspondent information

Accessing the Debug Interface

  1. Navigate to:
http://your-instance:3000/debug
  2. The interface provides:
    • Interactive dropdown to select data category
    • Tree view visualization of JSON responses
    • Color-coded data representation
    • Collapsible/expandable data nodes

Available Debug Endpoints

Endpoint                 Description
/debug/tags              Lists all tags in the system
/debug/documents         Shows processed document information
/debug/correspondents    Displays correspondent data
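
The endpoints above can also be queried from a script. The Python sketch below builds the URLs and fetches one of them. It assumes the endpoints return JSON (suggested by the tree view of JSON responses) and need no extra authentication; neither is confirmed here, and the function names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths as listed in the endpoint overview above
DEBUG_PATHS = {
    "tags": "/debug/tags",
    "documents": "/debug/documents",
    "correspondents": "/debug/correspondents",
}


def debug_url(base_url: str, category: str) -> str:
    """Build the full URL for one debug category."""
    if category not in DEBUG_PATHS:
        raise ValueError(f"unknown debug category: {category}")
    return urljoin(base_url, DEBUG_PATHS[category])


def fetch_debug(base_url: str, category: str, timeout: float = 10.0):
    """Fetch and decode one debug endpoint (assumes a JSON response body)."""
    with urlopen(debug_url(base_url, category), timeout=timeout) as response:
        return json.load(response)


print(debug_url("http://localhost:3000", "tags"))
# http://localhost:3000/debug/tags
```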

Health Check Integration

The debug interface also integrates with the health check system, showing a configuration warning if the system is not properly set up.

Development

To run the application locally without Docker:

  1. Install dependencies:
npm install
  2. Start the development server:
npm run test

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Security Considerations

  • Store API keys securely
  • Restrict container access
  • Monitor API usage
  • Regularly update dependencies
  • Back up your database
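
For the backup item above, one option is SQLite's online backup API, which copies a database safely even while it is in use. The Python sketch below uses the standard library's `sqlite3.Connection.backup`. Both file paths are placeholders: the actual database file name inside the data volume is not stated in this document.

```python
import sqlite3


def backup_database(source_path: str, target_path: str) -> None:
    """Copy a SQLite database file using the online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)  # copies all pages from source into target
    finally:
        target.close()
        source.close()


# Hypothetical usage; replace both paths with your own locations:
# backup_database("/path/to/data/database.db", "/path/to/backups/database.db")
```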

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Paperless-ngx for the amazing document management system
  • OpenAI API
  • The Express.js and Node.js communities for their excellent tools

Support

If you encounter any issues or have questions:

  1. Check the Issues section
  2. Create a new issue if yours isn't already listed
  3. Provide detailed information about your setup and the problem

Roadmap

  • [x] Support for custom AI models
  • [x] Support for multiple language analysis
  • [x] Advanced tag matching algorithms
  • [ ] Custom rules for document processing
  • [ ] Enhanced web interface with statistics
