NekoImageGallery

An AI-powered natural language & reverse Image Search Engine powered by CLIP & qdrant.

Stars: 97

Visit

NekoImageGallery is an online AI image search engine that utilizes the Clip model and Qdrant vector database. It supports keyword search and similar image search. The tool generates 768-dimensional vectors for each image using the Clip model, supports OCR text search using PaddleOCR, and efficiently searches vectors using the Qdrant vector database. Users can deploy the tool locally or via Docker, with options for metadata storage using Qdrant database or local file storage. The tool provides API documentation through FastAPI's built-in Swagger UI and can be used for tasks like image search, text extraction, and vector search.

README:

NekoImageGallery

An online AI image search engine based on the Clip model and Qdrant vector database. Supports keyword search and similar image search.

中文文档

✨ Features

Use the Clip model to generate 768-dimensional vectors for each image as the basis for search. No need for manual annotation or classification, unlimited classification categories.
OCR Text search is supported, use PaddleOCR to extract text from images and use BERT to generate text vectors for search.
Use Qdrant vector database for efficient vector search.

📷Screenshots

The above screenshots may contain copyrighted images from different artists, please do not use them for other purposes.

✈️ Deployment

📦 Prerequisites

Hardware requirements

HardWare	Minimum	Recommended
CPU	X86_64 or ARM64 CPU, 2 cores or more	4 cores or more
RAM	4GB or more	8GB or more
Storage	10GB or more for libraries, models, and datas	50GB or more, SSD is recommended
GPU	Not required	CUDA supported GPU for acceleration, 4GB of VRAM or more

Software requirements

For local deployment: Python 3.10 ~ Python 3.12 (With virtual environment support like venv, conda, etc.)
For Docker deployment: Docker and Docker Compose (For CUDA users, nvidia-container-runtime is required) or equivalent container runtime.

🖥️ Local Deployment

Choose a metadata storage method

Qdrant Database (Recommended)

In most cases, we recommend using the Qdrant database to store metadata. The Qdrant database provides efficient retrieval performance, flexible scalability, and better data security.

Please deploy the Qdrant database according to the Qdrant documentation. It is recommended to use Docker for deployment.

If you don't want to deploy Qdrant yourself, you can use the online service provided by Qdrant.

Local File Storage

Local file storage directly stores image metadata (including feature vectors, etc.) in a local SQLite database. It is only recommended for small-scale deployments or development deployments.

Local file storage does not require an additional database deployment process, but has the following disadvantages:

Local storage does not index and optimize vectors, so the time complexity of all searches is O(n). Therefore, if the data scale is large, the performance of search and indexing will decrease.
Using local file storage will make NekoImageGallery stateful, so it will lose horizontal scalability.
When you want to migrate to Qdrant database for storage, the indexed metadata may be difficult to migrate directly.

Deploy NekoImageGallery

Clone the project directory to your own PC or server, then checkout to a specific version tag (like v1.0.0).
It is highly recommended to install the dependencies required for this project in a Python venv virtual environment. Run the following command:
```
python -m venv .venv
. .venv/bin/activate
```
Install PyTorch. Follow the PyTorch documentation to install the torch version suitable for your system using pip.

If you want to use CUDA acceleration for inference, be sure to install a CUDA-supported PyTorch version in this step. After installation, you can use torch.cuda.is_available() to confirm whether CUDA is available.
Install other dependencies required for this project:
```
pip install -r requirements.txt
```
Modify the project configuration file inside config/, you can edit default.env directly, but it's recommended to create a new file named local.env and override the configuration in default.env.
Run this application:
```
python main.py
```
You can use --host to specify the IP address you want to bind to (default is 0.0.0.0) and --port to specify the port you want to bind to (default is 8000).
You can see all available commands and options by running python main.py --help.
(Optional) Deploy the front-end application: NekoImageGallery.App is a simple web front-end application for this project. If you want to deploy it, please refer to its deployment documentation.

🐋 Docker Deployment

About docker images

NekoImageGallery's docker image are built and released on Docker Hub, including serval variants:

Tags	Description	Latest Image Size
`edgeneko/neko-image-gallery:<version>` `edgeneko/neko-image-gallery:<version>-cuda` `edgeneko/neko-image-gallery:<version>-cuda12.1`	Supports GPU inferencing with CUDA12.1
`edgeneko/neko-image-gallery:<version>-cuda11.8`	Supports GPU inferencing with CUDA11.8
`edgeneko/neko-image-gallery:<version>-cpu`	Supports CPU inferencing
`edgeneko/neko-image-gallery:<version>-cpu-arm`	(Alpha) Supports CPU inferencing on ARM64(aarch64) devices

Where <version> is the version number or version alias of NekoImageGallery, as follows:

Version	Description
`latest`	The latest stable version of NekoImageGallery
`v..` / `v.*`	The specific version number (correspond to Git tags)
`edge`	The latest development version of NekoImageGallery, may contain unstable features and breaking changes

In each image, we have bundled the necessary dependencies, openai/clip-vit-large-patch14 model weights, bert-base-chinese model weights and easy-paddle-ocr models to provide a complete and ready-to-use image.

The images uses /opt/NekoImageGallery/static as volume to store image files, mount it to your own volume or directory if local storage is required.

For configuration, we suggest using environment variables to override the default configuration. Secrets (like API tokens) can be provided by docker secrets.

Prepare `nvidia-container-runtime` (CUDA users only)

If you want to use CUDA acceleration, you need to install nvidia-container-runtime on your system. Please refer to the official documentation for installation.

Related Document:

https://docs.docker.com/config/containers/resource_constraints/#gpu

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

https://nvidia.github.io/nvidia-container-runtime/

Run the server

Download the docker-compose.yml file from repository.

# For cuda deployment (default)
wget https://raw.githubusercontent.com/hv0905/NekoImageGallery/master/docker-compose.yml
# For CPU-only deployment
wget https://raw.githubusercontent.com/hv0905/NekoImageGallery/master/docker-compose-cpu.yml && mv docker-compose-cpu.yml docker-compose.yml

Modify the docker-compose.yml file as needed

Run the following command to start the server:

# start in foreground
docker compose up
# start in background(detached mode)
docker compose up -d

Upload images to NekoImageGallery

There are serval ways to upload images to NekoImageGallery

Through the web interface: You can use the web interface to upload images to the server. The web interface is provided by NekoImageGallery.App. Make sure you have enabled the Admin API and set your Admin Token in the configuration file.
Through local indexing: This is suitable for local deployment or when the images you want to upload are already on the server. Use the following command to index your local image directory:
```
 python main.py local-index <path-to-your-image-directory>
```
The above command will recursively upload all images in the specified directory and its subdirectories to the server. You can also specify categories/starred for images you upload, see python main.py local-index --help for more information.
Through the API: You can use the upload API provided by NekoImageGallery to upload images. By using this method, the server can prevent saving the image files locally but only store their URLs and metadata.
Make sure you have enabled the Admin API and set your Admin Token in the configuration file.
This method is suitable for automated image uploading or sync NekoImageGallery with external systems. Checkout API documentation for more information.

📚 API Documentation

The API documentation is provided by FastAPI's built-in Swagger UI. You can access the API documentation by visiting the /docs or /redoc path of the server.

⚡ Related Project

Those project works with NekoImageGallery :D

📊 Repository Summary

♥ Contributing

There are many ways to contribute to the project: logging bugs, submitting pull requests, reporting issues, and creating suggestions.

Even if you with push access on the repository, you should create a personal feature branches when you need them. This keeps the main repository clean and your workflow cruft out of sight.

We're also interested in your feedback on the future of this project. You can submit a suggestion or feature request through the issue tracker. To make this process more effective, we're asking that these include more information to help define them more clearly.

Copyright

Licensed under AGPLv3 license.

For Tasks:

Click tags to check more tools for each tasks

search images extract text index images create thumbnails deploy application

For Jobs:

data scientist machine learning engineer computer vision engineer ai researcher software developer

Alternative AI tools for NekoImageGallery

Similar Open Source Tools

NekoImageGallery

github

: 97

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

github

: 13.2k

TaxHacker

github

: 230

axoned

Axone is a public dPoS layer 1 designed for connecting, sharing, and monetizing resources in the AI stack. It is an open network for collaborative AI workflow management compatible with any data, model, or infrastructure, allowing sharing of data, algorithms, storage, compute, APIs, both on-chain and off-chain. The 'axoned' node of the AXONE network is built on Cosmos SDK & Tendermint consensus, enabling companies & individuals to define on-chain rules, share off-chain resources, and create new applications. Validators secure the network by maintaining uptime and staking $AXONE for rewards. The blockchain supports various platforms and follows Semantic Versioning 2.0.0. A docker image is available for quick start, with documentation on querying networks, creating wallets, starting nodes, and joining networks. Development involves Go and Cosmos SDK, with smart contracts deployed on the AXONE blockchain. The project provides a Makefile for building, installing, linting, and testing. Community involvement is encouraged through Discord, open issues, and pull requests.

github

: 168

AgentBench

AgentBench is a benchmark designed to evaluate Large Language Models (LLMs) as autonomous agents in various environments. It includes 8 distinct environments such as Operating System, Database, Knowledge Graph, Digital Card Game, and Lateral Thinking Puzzles. The tool provides a comprehensive evaluation of LLMs' ability to operate as agents by offering Dev and Test sets for each environment. Users can quickly start using the tool by following the provided steps, configuring the agent, starting task servers, and assigning tasks. AgentBench aims to bridge the gap between LLMs' proficiency as agents and their practical usability.

github

: 2.1k

Whisper-WebUI

Whisper-WebUI is a Gradio-based browser interface for Whisper, serving as an Easy Subtitle Generator. It supports generating subtitles from various sources such as files, YouTube, and microphone. The tool also offers speech-to-text and text-to-text translation features, utilizing Facebook NLLB models and DeepL API. Users can translate subtitle files from other languages to English and vice versa. The project integrates faster-whisper for improved VRAM usage and transcription speed, providing efficiency metrics for optimized whisper models. Additionally, users can choose from different Whisper models based on size and language requirements.

github

: 1.8k

OpenLLM

OpenLLM is a platform that helps developers run any open-source Large Language Models (LLMs) as OpenAI-compatible API endpoints, locally and in the cloud. It supports a wide range of LLMs, provides state-of-the-art serving and inference performance, and simplifies cloud deployment via BentoML. Users can fine-tune, serve, deploy, and monitor any LLMs with ease using OpenLLM. The platform also supports various quantization techniques, serving fine-tuning layers, and multiple runtime implementations. OpenLLM seamlessly integrates with other tools like OpenAI Compatible Endpoints, LlamaIndex, LangChain, and Transformers Agents. It offers deployment options through Docker containers, BentoCloud, and provides a community for collaboration and contributions.

github

: 10.9k

dify-google-cloud-terraform

This repository provides Terraform configurations to automatically set up Google Cloud resources and deploy Dify in a highly available configuration. It includes features such as serverless hosting, auto-scaling, and data persistence. Users need a Google Cloud account, Terraform, and gcloud CLI installed to use this tool. The configuration involves setting environment-specific values and creating a GCS bucket for managing Terraform state. The tool allows users to initialize Terraform, create Artifact Registry repository, build and push container images, plan and apply Terraform changes, and cleanup resources when needed.

github

: 72

vim-ollama

The 'vim-ollama' plugin for Vim adds Copilot-like code completion support using Ollama as a backend, enabling intelligent AI-based code completion and integrated chat support for code reviews. It does not rely on cloud services, preserving user privacy. The plugin communicates with Ollama via Python scripts for code completion and interactive chat, supporting Vim only. Users can configure LLM models for code completion tasks and interactive conversations, with detailed installation and usage instructions provided in the README.

github

: 147

ai-toolkit

The AI Toolkit by Ostris is a collection of tools for machine learning, specifically designed for image generation, LoRA (latent representations of attributes) extraction and manipulation, and model training. It provides a user-friendly interface and extensive documentation to make it accessible to both developers and non-developers. The toolkit is actively under development, with new features and improvements being added regularly. Some of the key features of the AI Toolkit include: - Batch Image Generation: Allows users to generate a batch of images based on prompts or text files, using a configuration file to specify the desired settings. - LoRA (lierla), LoCON (LyCORIS) Extractor: Facilitates the extraction of LoRA and LoCON representations from pre-trained models, enabling users to modify and manipulate these representations for various purposes. - LoRA Rescale: Provides a tool to rescale LoRA weights, allowing users to adjust the influence of specific attributes in the generated images. - LoRA Slider Trainer: Enables the training of LoRA sliders, which can be used to control and adjust specific attributes in the generated images, offering a powerful tool for fine-tuning and customization. - Extensions: Supports the creation and sharing of custom extensions, allowing users to extend the functionality of the toolkit with their own tools and scripts. - VAE (Variational Auto Encoder) Trainer: Facilitates the training of VAEs for image generation, providing users with a tool to explore and improve the quality of generated images. The AI Toolkit is a valuable resource for anyone interested in exploring and utilizing machine learning for image generation and manipulation. Its user-friendly interface, extensive documentation, and active development make it an accessible and powerful tool for both beginners and experienced users.

github

: 4.4k

docetl

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers a low-code, declarative YAML interface to define LLM-powered operations on complex data. Ideal for maximizing correctness and output quality for semantic processing on a collection of data, representing complex tasks via map-reduce, maximizing LLM accuracy, handling long documents, and automating task retries based on validation criteria.

github

: 1.5k

Upscaler

Holloway's Upscaler is a consolidation of various compiled open-source AI image/video upscaling products for a CLI-friendly image and video upscaling program. It provides low-cost AI upscaling software that can run locally on a laptop, programmable for albums and videos, reliable for large video files, and works without GUI overheads. The repository supports hardware testing on various systems and provides important notes on GPU compatibility, video types, and image decoding bugs. Dependencies include ffmpeg and ffprobe for video processing. The user manual covers installation, setup pathing, calling for help, upscaling images and videos, and contributing back to the project. Benchmarks are provided for performance evaluation on different hardware setups.

github

: 262

anyquery

Anyquery is a SQL query engine built on SQLite that allows users to run SQL queries on various data sources like files, databases, and apps. It can connect to LLMs to access data and act as a MySQL server for running queries. The tool is extensible through plugins and supports various installation methods like Homebrew, APT, YUM/DNF, Scoop, Winget, and Chocolatey.

github

: 633

llm-foundry

LLM Foundry is a codebase for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. It is designed to be easy-to-use, efficient _and_ flexible, enabling rapid experimentation with the latest techniques. You'll find in this repo: * `llmfoundry/` - source code for models, datasets, callbacks, utilities, etc. * `scripts/` - scripts to run LLM workloads * `data_prep/` - convert text data from original sources to StreamingDataset format * `train/` - train or finetune HuggingFace and MPT models from 125M - 70B parameters * `train/benchmarking` - profile training throughput and MFU * `inference/` - convert models to HuggingFace or ONNX format, and generate responses * `inference/benchmarking` - profile inference latency and throughput * `eval/` - evaluate LLMs on academic (or custom) in-context-learning tasks * `mcli/` - launch any of these workloads using MCLI and the MosaicML platform * `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs

github

: 4.1k

inspector-laravel

Inspector is a code execution monitoring tool specifically designed for Laravel applications. It provides simple and efficient monitoring capabilities to track and analyze the performance of your Laravel code. With Inspector, you can easily monitor web requests, test the functionality of your application, and explore data through a user-friendly dashboard. The tool requires PHP version 7.2.0 or higher and Laravel version 5.5 or above. By configuring the ingestion key and attaching the middleware, users can seamlessly integrate Inspector into their Laravel projects. The official documentation provides detailed instructions on installation, configuration, and usage of Inspector. Contributions to the tool are welcome, and users are encouraged to follow the Contribution Guidelines to participate in the development of Inspector.

github

: 197

air-light

Air-light is a minimalist WordPress starter theme designed to be an ultra minimal starting point for a WordPress project. It is built to be very straightforward, backwards compatible, front-end developer friendly and modular by its structure. Air-light is free of weird "app-like" folder structures or odd syntaxes that nobody else uses. It loves WordPress as it was and as it is.

github

: 1.0k

For similar tasks

NekoImageGallery

github

: 97

TalkWithGemini

Talk With Gemini is a web application that allows users to deploy their private Gemini application for free with one click. It supports Gemini Pro and Gemini Pro Vision models. The application features talk mode for direct communication with Gemini, visual recognition for understanding picture content, full Markdown support, automatic compression of chat records, privacy and security with local data storage, well-designed UI with responsive design, fast loading speed, and multi-language support. The tool is designed to be user-friendly and versatile for various deployment options and language preferences.

github

: 616

GeminiChatUp

Gemini ChatUp is a chat application utilizing the Google GeminiPro API Key. It supports responsive layout and can store multiple sets of conversations with customizable parameters for each set. Users can log in with a test account or provide their own API Key to deploy the feature. The application also offers user authentication through Edge config in Vercel, allowing users to add usernames and passwords in JSON format. Local deployment is possible by installing dependencies, setting up environment variables, and running the application locally.

github

: 88

MegaParse

MegaParse is a powerful and versatile parser designed to handle various types of documents such as text, PDFs, Powerpoint presentations, and Word documents with no information loss. It is fast, efficient, and open source, supporting a wide range of file formats. MegaParse ensures compatibility with tables, table of contents, headers, footers, and images, making it a comprehensive solution for document parsing.

github

: 5.6k

gemini_multipdf_chat

Gemini PDF Chatbot is a Streamlit-based application that allows users to chat with a conversational AI model trained on PDF documents. The chatbot extracts information from uploaded PDF files and answers user questions based on the provided context. It features PDF upload, text extraction, conversational AI using the Gemini model, and a chat interface. Users can deploy the application locally or to the cloud, and the project structure includes main application script, environment variable file, requirements, and documentation. Dependencies include PyPDF2, langchain, Streamlit, google.generativeai, and dotenv.

github

: 102

screen-pipe

Screen-pipe is a Rust + WASM tool that allows users to turn their screen into actions using Large Language Models (LLMs). It enables users to record their screen 24/7, extract text from frames, and process text and images for tasks like analyzing sales conversations. The tool is still experimental and aims to simplify the process of recording screens, extracting text, and integrating with various APIs for tasks such as filling CRM data based on screen activities. The project is open-source and welcomes contributions to enhance its functionalities and usability.

github

: 1.0k

whisper

Whisper is an open-source library by Open AI that converts/extracts text from audio. It is a cross-platform tool that supports real-time transcription of various types of audio/video without manual conversion to WAV format. The library is designed to run on Linux and Android platforms, with plans for expansion to other platforms. Whisper utilizes three frameworks to function: DART for CLI execution, Flutter for mobile app integration, and web/WASM for web application deployment. The tool aims to provide a flexible and easy-to-use solution for transcription tasks across different programs and platforms.

github

: 527

swift-ocr-llm-powered-pdf-to-markdown

Swift OCR is a powerful tool for extracting text from PDF files using OpenAI's GPT-4 Turbo with Vision model. It offers flexible input options, advanced OCR processing, performance optimizations, structured output, robust error handling, and scalable architecture. The tool ensures accurate text extraction, resilience against failures, and efficient handling of multiple requests.

github

: 219

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675

NekoImageGallery

README:

NekoImageGallery

✨ Features

📷Screenshots

✈️ Deployment

📦 Prerequisites

Hardware requirements

Software requirements

🖥️ Local Deployment

Choose a metadata storage method

Qdrant Database (Recommended)

Local File Storage

Deploy NekoImageGallery

🐋 Docker Deployment

About docker images

Prepare nvidia-container-runtime (CUDA users only)

Run the server

Upload images to NekoImageGallery

📚 API Documentation

⚡ Related Project

📊 Repository Summary

♥ Contributing

Copyright

For Tasks:

For Jobs:

Alternative AI tools for NekoImageGallery

Similar Open Source Tools

NekoImageGallery

SillyTavern

TaxHacker

axoned

AgentBench

Whisper-WebUI

OpenLLM

dify-google-cloud-terraform

vim-ollama

ai-toolkit

docetl

Upscaler

anyquery

llm-foundry

inspector-laravel

air-light

For similar tasks

NekoImageGallery

TalkWithGemini

GeminiChatUp

MegaParse

gemini_multipdf_chat

screen-pipe

whisper

swift-ocr-llm-powered-pdf-to-markdown

For similar jobs

weave

LLMStack

VisionCraft

kaito

PyRIT

tabby

spear

Magick

Prepare `nvidia-container-runtime` (CUDA users only)