LLMOCR

LLMOCR

Simple script that reads an image and dumps the text it reads using a vision model and KobolodCPP

Stars: 53

Visit
 screenshot

LLMOCR is a tool that utilizes a local Large Language Model (LLM) to extract text from images. It offers a user-friendly GUI and supports GPU acceleration for faster inference. The tool is cross-platform, compatible with Windows, macOS ARM, and Linux. Users can prompt the LLM to process images in a customized way. The processing is done locally on the user's machine, ensuring data privacy and security. LLMOCR requires Python 3.8 or higher and KoboldCPP for installation and operation.

README:

LLMOCR

License: MIT

LLMOCR uses a local LLM to read text from images.

You can also change the instruction to have the LLM use the image in the way that you prompt.

Screenshot

Features

  • Local Processing: All processing is done locally on your machine.
  • User-Friendly GUI: Includes a GUI. Relies on Koboldcpp, a single executable, for all AI functionality.
  • GPU Acceleration: Will use Apple Metal, Nvidia CUDA, or AMD (Vulkan) hardware if available to greatly speed inference.
  • Cross-Platform: Supports Windows, macOS ARM, and Linux.

Installation

Prerequisites

  • Python 3.8 or higher
  • KoboldCPP

Windows Installation

  1. Clone the repository or download the ZIP file and extract it.

  2. Install Python for Windows.

  3. Download KoboldCPP.exe and place it in the LLMOCR folder. If it is not named KoboldCPP.exe, rename it to KoboldCPP.exe

  4. If you want the script to download a model for you and have KoboldCpp run it for you, open llm_ocr.bat

  5. If you want to load your own model using KoboldCpp, open llm_ocr_no_kobold.bat

Mac and Linux Installation

  1. Clone the repository or download and extract the ZIP file.

  2. Install Python 3.8 or higher if not already installed.

  3. Create a new python env and install the requirements.txt.

  4. Run kobold with flag --config llm-ocr.kcppt

  5. Wait until the model weights finish downloading and the terminal window says Please connect to custom endpoint at http://localhost:5001

  6. Run llm-ocr-gui.py using Python.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

For Tasks:

Click tags to check more tools for each tasks

For Jobs:

Alternative AI tools for LLMOCR

Similar Open Source Tools

For similar tasks

For similar jobs