LLavaImageTagger

LLavaImageTagger

Creates an index of images, queries a local LLM and adds tags to the image metadata

Stars: 97

Visit
 screenshot

LLMImageIndexer is an intelligent image processing and indexing tool that leverages local AI to generate comprehensive metadata for your image collection. It uses advanced language models to analyze images and generate captions and keyword metadata. The tool offers features like intelligent image analysis, metadata enhancement, local processing, multi-format support, user-friendly GUI, GPU acceleration, cross-platform support, stop and start capability, and keyword post-processing. It operates directly on image file metadata, allowing users to manage files, add new files, and run the tool multiple times without reprocessing previously keyworded files. Installation instructions are provided for Windows, macOS, and Linux platforms, along with usage guidelines and configuration options.

README:

LLMImageIndexer

License: MIT

LLMImageIndexer is an intelligent image processing and indexing tool that leverages local AI to generate comprehensive metadata for your image collection. This tool uses advanced language models to analyze images and generate captions and keyword metadata.

Screenshot

Features

  • Intelligent Image Analysis: Utilizes a local AI model to generate a variable number of keywords and a caption for each image.
  • Metadata Enhancement: Can automatically edit image metadata with generated tags.
  • Local Processing: All processing is done locally on your machine.
  • Multi-Format Support: Handles a wide range of image formats, including all major raw camera files.
  • User-Friendly GUI: Includes a GUI and installer. Relies on Koboldcpp, a single executable, for all AI functionality.
  • GPU Acceleration: Will use Apple Metal, Nvidia CUDA, or AMD (Vulkan) hardware if available to greatly speed inference.
  • Cross-Platform: Supports Windows, macOS ARM, and Linux.
  • Stop and Start Capability: Can stop and start without having to reprocess all the files again.
  • Keyword Post-Processing: Expand keywords so all synonyms are added to every image with one of the synonyms, or deduplicate keywords by using the most frequently used synonym in place of all matching synonyms.

Important Information

Before proceeding to use this script you should be aware of the following:

This tool operates directly on image file metadata. When 'pretend mode' is not checked it will write to one or more of the following fields:

  • MWG:Keywords, which may write to any of the following fields:

    1. DC:Subject
    2. IPTC: Keywords
    3. Composite: Keywords
  • XMP: Description, for caption

  • XMP: Status, for processing state

  • XMP:Identifier, for unique identifier

The "Status" and "Identifier" fields are used to track the processing state of images so that a database is not needed. The images may be moved between directories or even renamed, and if the "Skip images already processed but not in database" box is checked, they will not be processed again.

This means you can manage your files and add new files, and run the tool as many times as you like without worrying about reprocessing the files that were previously keyworded by the tool.

Installation

Prerequisites

  • Python 3.8 or higher
  • KoboldCPP

A vision model is needed, but if you use the llmii-run.bat to open it, then the first time it is run it will download the Qwen2-VL 7B Q4_K_M gguf and F16 projector from Bartowski's repo on huggingface. If you don't want to use that, just open llmii-no-kobold.bat instead and open Koboldcpp.exe and load whatever model you like.

Windows Installation

  1. Clone the repository or download the ZIP file and extract it.

  2. Install Python for Windows.

  3. Download KoboldCPP.exe and place it in the LlavaImageTagger folder. If it is not named KoboldCPP.exe, rename it to KoboldCPP.exe

  4. Run llmii-run.bat and wait exiftool to install. When it is complete you must start the file again. If you called it from a terminal window you will need to close the windows and reopen it. It will then create a python environment and download the model weights. The download is quite large (6GB) and there is no progress bar, but it only needs to do this once. Once it is done KoboldCPP will start and one of the terminal windows will say Please connect to custom endpoint at http://localhost:5001 and then it is ready.

macOS Installation (including ARM)

  1. Clone the repository or download the ZIP file and extract it.

  2. Install Python 3.7 or higher if not already installed. You can use Homebrew:

    brew install python
    
  3. Install ExifTool:

    brew install exiftool
    
  4. Download KoboldCPP for macOS and place it in the LLMImageIndexer folder.

  5. Open a terminal in the LLMImageIndexer folder and run:

    xattr -cr koboldcpp-mac-arm64
    chmod +x koboldcpp-mac-arm64
    ./llmii-run.sh
    

Linux Installation

  1. Clone the repository or download and extract the ZIP file.

  2. Install Python 3.7 or higher if not already installed. Use your distribution's package manager, for example on Ubuntu:

    sudo apt-get update
    sudo apt-get install python3 python3-pip
    
  3. Install ExifTool. On Ubuntu:

    sudo apt-get install libimage-exiftool-perl
    
  4. Download the appropriate KoboldCPP binary for your Linux distribution from KoboldCPP releases and place it in the LLMImageIndexer folder.

  5. Open a terminal in the LLMImageIndexer folder and run:

    chmod +x koboldcpp-linux-x64
    ./llmii-run.sh
    

For all platforms, the script will set up the Python environment, install dependencies, and download necessary model weights (6GB total). This initial setup is performed only once and will take a few minutes depending on your download speed.

Usage

  1. Launch the LLMImageIndexer GUI:

    • On Windows: Run llmii-run.bat
    • On macOS/Linux: Run python3 llmii-gui.py
  2. Ensure KoboldCPP is running. Wait until you see the following message in the KoboldCPP window:

    Please connect to custom endpoint at http://localhost:5001
    
  3. Configure the indexing settings in the GUI:

    • Select the target image directory
    • Set the API URL (default: http://localhost:5001)
    • Choose metadata tags to generate (keywords, descriptions)
    • Set additional options (crawl subdirectories, backup files, etc.)
  4. Click "Run Image Indexer" to start the process.

  5. Monitor the progress in the output area of the GUI.

Configuration Options

  • Directory: Target image directory (includes subdirectories by default)
  • API URL: KoboldCPP API endpoint (change if running on another machine)
  • API Password: Set if required by your KoboldCPP setup
  • Caption: Have the LLM describe the image and set it in XMP:Description (doubles processing time)
  • GenTokens: Amount of tokens for the LLM to generate
  • Skip processed files not in database: Will not attempt to reprocess files with a UUID and keywords even if they are not in the llmii.json database
  • Reprocess failed: If any files failed in the last round, it will try to process them again
  • Reprocess ALL: The files that are processed already are stored in a database and skipped if you resume later, this will do them all over again
  • Don't crawl subdirectories: Disable scanning of subdirectories
  • Don't make backups before writing: Skip creating backup files (NOTE: this applies to processing and post-processing; if you enable post processing and leave this unchecked it will make a second backup!)
  • Pretend mode: Simulate processing without writing to files or database
  • Skip processing: If you want to use the keyword processing and don't want it to check every image in the directory before starting, check this box
  • Keywords: Choose to clear and write new keywords or update existing ones
  • Post processing keywords: Keep keywords as generated, Expand keywords by applying all synonyms to matching keywords, or Dedepe keywords by replacing matching synonyms with most frequent synonym; these option take place after the indexer has completed unless the 'skip processing' box is checked

More Information and Troubleshooting

Consult the wiki for more information and troubleshooting steps.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

For Tasks:

Click tags to check more tools for each tasks

For Jobs:

Alternative AI tools for LLavaImageTagger

Similar Open Source Tools

For similar tasks

For similar jobs