horde-worker-reGen
The default client software to create images for the AI-Horde
Stars: 85
This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.
README:
This repository allows you to set up a AI Horde Worker to generate, post-process or analyze images for others
If you want the latest information or have questions, come to the #local-workers channel in discord
This repo contains the latest implementation for the AI Horde Worker. This will turn your graphics card(s) into a worker for the AI Horde where you will create images for others. You you will receive in turn earn 'kudos' which will give you priority for your own generations.
-
An SSD is strongly recommended especially if you are offering more than one model.
- If you only have an HDD available to you, you can only offer one model and will have to be able to load 3-8gb off disk within 60 seconds or the worker will not function.
- Do not set threads higher than 2 unless you have a data-center grade card (48gb+ VRAM)
- Your memory usage will increase up until the number of queued jobs (
queue_size
in the config).- If you have less than 32gb of system ram, you should should stick to
queue_size: 1
. - If you have less than 16gb of system ram or you experience frequent memory-related crashes:
- Do not offer SDXL/SD21 models. You can do this by adding
ALL SDXL
andALL SD21
to yourmodels_to_skip
if you are using theTOP N
model load option to automatically remove these heavier models from your offerings. - Set
allow_post_processing
andallow_controlnet
to false - Set
queue_size: 0
- Do not offer SDXL/SD21 models. You can do this by adding
- If you have less than 32gb of system ram, you should should stick to
- If you plan on running SDXL, you will need to ensure at least 9 gb of system ram remains free while the worker is running.
- If you have an 8 gb card, SDXL will only reliably work at max_power values close to 32. 42 was too high for tests on a 2080 in certain cases.
Please note that AMD cards are not currently well supported, but may be in the future.
Update: AMD now has been shown to have better support but for linux machines only - linux must be installed on the bare metal machine; windows systems, WSL or linux containers still do not work. You can now follow this guide using
horde-bridge-rocm.sh
andupdate-runtime-rocm.sh
where appropriate.
If you are willing to try with your AMD card, join the discord discussion.
Please see the prior section before proceeding.
If you haven't already, go to AI Horde and register an account, then store your API key somewhere secure. Treat your API key like a password. You will need it later in these instructions. This will allow your worker to gather kudos for your account.
Use these instructions if you have installed git for windows.
This option is recommended as it will make keeping your repository up to date much easier.
- Open
powershell
(also referred to as terminal) orcmd
from the start menu. - Using
cd
, navigate to a folder that you want the worker installed in.- Note that the folder you are in will create a folder named
horde-worker-reGen
. This folder should not exist before you run the following commands. - If you want it to be installed in
C:\horde\
, run the following:cd C:\horde
- (if the
horde
folder doesn't exist)cd C:\ mkdir horde cd C:\horde
- If you are using
cmd
and wish to install on a different drive, include the/d
option, as so:cd /d G:\horde
- Note that the folder you are in will create a folder named
- Run the following commands within the folder chosen (the folder
horde
if using the example above)git clone https://github.com/Haidra-Org/horde-worker-reGen.git cd horde-worker-reGen
- Continue with the Basic Usage instructions
Use these instructions if you do not have git for windows and do not want to install it. These instructions make updating the worker a bit more difficult down the line.
- Download the zipped version
- Extract it to any folder of your choice
- Continue with the Basic Usage instructions
This assumes you have git installed
Open a bash terminal and run these commands (just copy-paste them all together)
git clone https://github.com/Haidra-Org/horde-worker-reGen.git
cd horde-worker-reGen
Continue with the Basic Usage instructions
The below instructions refers to horde-bridge
or update-runtime
. Depending on your OS, append .cmd
for windows, or .sh
for linux
- for example,
horde-bridge.cmd
andupdate-runtime.cmd
for windows
Note: If you have an AMD card you should use
horde-bridge-rocm.sh
andupdate-runtime-rocm.sh
where appropriate
You can double click the provided script files below from a file explorer or run it from a terminal like bash
, cmd
depending on your OS. The latter option will allow you to see errors in case of a crash, so it's recommended.
- Make a copy of
bridgeData_template.yaml
tobridgeData.yaml
- Edit
bridgeData.yaml
and follow the instructions within to fill in your details.
Models are loaded as needed and just-in-time. You can offer as many models as you want provided you have an SSD, at least 32gb of ram, and at least 8gb of VRAM (see Important Info. Workers with HDDs are not recommended at this time but those with HDDs should run exactly 1 model. A typical SD1.5 model is around 2gb each, while a typical SDXL model is around 7gb each. Offering all
models is currently around 700gb total and we commit to keeping that number below 1TB with any future changes.
Note: We suggest you disable any 'sleep' or reduced power modes for your system while the worker is running.
-
If you have a 24gb+ vram card:
- safety_on_gpu: true - high_memory_mode: true - high_performance_mode: true - post_process_job_overlap: true - unload_models_from_vram_often: false - max_threads: 1 # If you have Flux/Cascade loaded, otherwise 2 max - queue_size: 2 # You can set to 3 if you have 64GB or more of RAM - max_batch: 8 # or higher
-
If you have a 12gb - 16gb card:
- safety_on_gpu: true # Consider setting to `false` if offering Cascade or Flux - high_memory_mode: true - moderate_performance_mode: true - unload_models_from_vram_often: false - max_threads: 1 - max_batch: 4 # or higher
-
If you have an 8gb-10gb vram card:
-
- queue_size: 1 # max **or** only offer flux - safety_on_gpu: false - max_threads: 1 - max_power: 32 # no higher than 32 - max_batch: 4 # no higher than 4 - allow_post_processing: false # If offering SDXL or Flux, otherwise you may set to true - allow_sdxl_controlnet: false
- Be sure to shut every single VRAM consuming application you can and do not use the computer with the worker running for any purpose.
-
-
Workers which have low end cards or have low performance for other reasons:
-
- extra_slow_worker: true
- gives you considerably more time to finish job, but requests will not go to your worker unless the requester opts-in (even anon users do not use extra_slow_workers by default). You should only consider using this if you have historically had less than 0.3 MPS/S or less than 3000 kudos/hr consistently and you are sure the worker is otherwise configured correctly.
-
- limit_max_steps: true
- reduces the maximum total number of steps in a single job you will receive based on the model baseline.
-
- preload_timeout: 120
- gives you more time to load models off disk. Note: Abusing this value can lead to a major loss of kudos and may also lead to maintainance mode, even with
extra_slow_worker: true
.
- gives you more time to load models off disk. Note: Abusing this value can lead to a major loss of kudos and may also lead to maintainance mode, even with
-
-
The first time you install, or when updates are required, see Updating for instructions.
-
Depending on the type of worker:
- 'Dreamer' worker (image generation): run
horde-bridge
.- Warning: This requires a powerful GPU. You will need a GPU with at least 6GB VRAM and 16GB+ of system RAM.
- 'Alchemy' worker (upscaling, interrogation, etc) is not current supported and will come in a future version of reGen.
- 'Dreamer' worker (image generation): run
- In the terminal in which it's running, press
Ctrl+C
together. - The worker will finish the current jobs before exiting.
In the future you will not need to run multiple worker instances
To use multiple GPUs each has to start their own instance. For linux, you just need to limit the run to a specific card:
CUDA_VISIBLE_DEVICES=0 ./horde-bridge.sh -n "My awesome instance #1"
CUDA_VISIBLE_DEVICES=1 ./horde-bridge.sh -n "My awesome instance #2"
etc
Be warned that you will need a very high (32-64gb+) amount of system ram depending on your settings. queue_size
and max_threads
increases the amount of ram required per worker substantially.
The AI Horde workers are under constant improvement. You can follow progress in our discord and get notifications about updates there. If you are interested in receiving notifications for worker updates or betas, go to the #get-roles channel and get the appropriate role(s).
To update:
-
Shut down your worker by pressing ctrl+c once and waiting for the worker to stop.
-
Update this repo using the appropriate method:
Use this approach if you cloned the original repository using
git clone
- Open a or
bash
,cmd
, orpowershell
terminal depending on your OS - Navigate to the folder you have the AI Horde Worker repository installed if you're not already there.
- run
git pull
Use this approach if you downloaded the git repository as a zip file and extracted it somewhere.
- delete the
horde_worker_regen/
directory from your folder - Download the repository from github as a zip file
- Extract its contents into the same the folder you have the AI Horde Worker repository installed, overwriting any existing files
- Open a or
-
Run the
update-runtime
script for your OS. This will update all dependencies if required.- Some updates may not require this and the update notification will tell you if this is the case.
- When in doubt, you should run it anyway.
- Advanced users: If you do not want to use mamba or you are comfortable with python/venvs, see README_advanced.md.
-
Continue with Starting/stopping instructions above
You can host your own image models on the horde which are not available in our model reference, but this process is a bit more complex.
To start with, you need to manually request the customizer
role from then horde team. You can ask for it in the discord channel. This is a manually assigned role to prevent abuse of this feature.
Once you have the customizer role, you need to download the model files you want to host. Place them in any location on your system.
Finally, you need to point your worker to their location and provide some information about them. On your bridgeData.yaml simply add lines like the following
custom_models:
- name: Movable figure model XL
baseline: stable_diffusion_xl
filepath: /home/db0/projects/CUSTOM_MODELS/PVCStyleModelMovable_beta25Realistic.safetensors
And then add the same "name" to your models_to_load.
If everything was setup correctly, you should now see a custom_models.json
in your worker directory after the worker starts, and the model should be offered by your worker.
Note that:
- You cannot serve custom models with the same name as any of our regular models
- The horde doesn't know your model, so it will treat it as a SD 1.5 model for kudos rewards and cannot warn people using the wrong parameters such as clip_skip
See README_advanced.md.
Many models in this project use the CreativeML OpenRAIL License. Please read the full license here.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for horde-worker-reGen
Similar Open Source Tools
horde-worker-reGen
This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.
openui
OpenUI is a tool designed to simplify the process of building UI components by allowing users to describe UI using their imagination and see it rendered live. It supports converting HTML to React, Svelte, Web Components, etc. The tool is open source and aims to make UI development fun, fast, and flexible. It integrates with various AI services like OpenAI, Groq, Gemini, Anthropic, Cohere, and Mistral, providing users with the flexibility to use different models. OpenUI also supports LiteLLM for connecting to various LLM services and allows users to create custom proxy configs. The tool can be run locally using Docker or Python, and it offers a development environment for quick setup and testing.
ai-voice-cloning
This repository provides a tool for AI voice cloning, allowing users to generate synthetic speech that closely resembles a target speaker's voice. The tool is designed to be user-friendly and accessible, with a graphical user interface that guides users through the process of training a voice model and generating synthetic speech. The tool also includes a variety of features that allow users to customize the generated speech, such as the pitch, volume, and speaking rate. Overall, this tool is a valuable resource for anyone interested in creating realistic and engaging synthetic speech.
gpt-pilot
GPT Pilot is a core technology for the Pythagora VS Code extension, aiming to provide the first real AI developer companion. It goes beyond autocomplete, helping with writing full features, debugging, issue discussions, and reviews. The tool utilizes LLMs to generate production-ready apps, with developers overseeing the implementation. GPT Pilot works step by step like a developer, debugging issues as they arise. It can work at any scale, filtering out code to show only relevant parts to the AI during tasks. Contributions are welcome, with debugging and telemetry being key areas of focus for improvement.
webwhiz
WebWhiz is an open-source tool that allows users to train ChatGPT on website data to build AI chatbots for customer queries. It offers easy integration, data-specific responses, regular data updates, no-code builder, chatbot customization, fine-tuning, and offline messaging. Users can create and train chatbots in a few simple steps by entering their website URL, automatically fetching and preparing training data, training ChatGPT, and embedding the chatbot on their website. WebWhiz can crawl websites monthly, collect text data and metadata, and process text data using tokens. Users can train custom data, but bringing custom open AI keys is not yet supported. The tool has no limitations on context size but may limit the number of pages based on the chosen plan. WebWhiz SDK is available on NPM, CDNs, and GitHub, and users can self-host it using Docker or manual setup involving MongoDB, Redis, Node, Python, and environment variables setup. For any issues, users can contact [email protected].
ai-toolkit
The AI Toolkit by Ostris is a collection of tools for machine learning, specifically designed for image generation, LoRA (latent representations of attributes) extraction and manipulation, and model training. It provides a user-friendly interface and extensive documentation to make it accessible to both developers and non-developers. The toolkit is actively under development, with new features and improvements being added regularly. Some of the key features of the AI Toolkit include: - Batch Image Generation: Allows users to generate a batch of images based on prompts or text files, using a configuration file to specify the desired settings. - LoRA (lierla), LoCON (LyCORIS) Extractor: Facilitates the extraction of LoRA and LoCON representations from pre-trained models, enabling users to modify and manipulate these representations for various purposes. - LoRA Rescale: Provides a tool to rescale LoRA weights, allowing users to adjust the influence of specific attributes in the generated images. - LoRA Slider Trainer: Enables the training of LoRA sliders, which can be used to control and adjust specific attributes in the generated images, offering a powerful tool for fine-tuning and customization. - Extensions: Supports the creation and sharing of custom extensions, allowing users to extend the functionality of the toolkit with their own tools and scripts. - VAE (Variational Auto Encoder) Trainer: Facilitates the training of VAEs for image generation, providing users with a tool to explore and improve the quality of generated images. The AI Toolkit is a valuable resource for anyone interested in exploring and utilizing machine learning for image generation and manipulation. Its user-friendly interface, extensive documentation, and active development make it an accessible and powerful tool for both beginners and experienced users.
SeaGOAT
SeaGOAT is a local search tool that leverages vector embeddings to enable you to search your codebase semantically. It is designed to work on Linux, macOS, and Windows and can process files in various formats, including text, Markdown, Python, C, C++, TypeScript, JavaScript, HTML, Go, Java, PHP, and Ruby. SeaGOAT uses a vector database called ChromaDB and a local vector embedding engine to provide fast and accurate search results. It also supports regular expression/keyword-based matches. SeaGOAT is open-source and licensed under an open-source license, and users are welcome to examine the source code, raise concerns, or create pull requests to fix problems.
azure-search-openai-demo
This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access a GPT model (gpt-35-turbo), and Azure AI Search for data indexing and retrieval. The repo includes sample data so it's ready to try end to end. In this sample application we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions and roles.
airbyte_serverless
AirbyteServerless is a lightweight tool designed to simplify the management of Airbyte connectors. It offers a serverless mode for running connectors, allowing users to easily move data from any source to their data warehouse. Unlike the full Airbyte-Open-Source-Platform, AirbyteServerless focuses solely on the Extract-Load process without a UI, database, or transform layer. It provides a CLI tool, 'abs', for managing connectors, creating connections, running jobs, selecting specific data streams, handling secrets securely, and scheduling remote runs. The tool is scalable, allowing independent deployment of multiple connectors. It aims to streamline the connector management process and provide a more agile alternative to the comprehensive Airbyte platform.
CLI
Bito CLI provides a command line interface to the Bito AI chat functionality, allowing users to interact with the AI through commands. It supports complex automation and workflows, with features like long prompts and slash commands. Users can install Bito CLI on Mac, Linux, and Windows systems using various methods. The tool also offers configuration options for AI model type, access key management, and output language customization. Bito CLI is designed to enhance user experience in querying AI models and automating tasks through the command line interface.
AlwaysReddy
AlwaysReddy is a simple LLM assistant with no UI that you interact with entirely using hotkeys. It can easily read from or write to your clipboard, and voice chat with you via TTS and STT. Here are some of the things you can use AlwaysReddy for: - Explain a new concept to AlwaysReddy and have it save the concept (in roughly your words) into a note. - Ask AlwaysReddy "What is X called?" when you know how to roughly describe something but can't remember what it is called. - Have AlwaysReddy proofread the text in your clipboard before you send it. - Ask AlwaysReddy "From the comments in my clipboard, what do the r/LocalLLaMA users think of X?" - Quickly list what you have done today and get AlwaysReddy to write a journal entry to your clipboard before you shutdown the computer for the day.
ai-clone-whatsapp
This repository provides a tool to create an AI chatbot clone of yourself using your WhatsApp chats as training data. It utilizes the Torchtune library for finetuning and inference. The code includes preprocessing of WhatsApp chats, finetuning models, and chatting with the AI clone via a command-line interface. Supported models are Llama3-8B-Instruct and Mistral-7B-Instruct-v0.2. Hardware requirements include approximately 16 GB vRAM for QLoRa Llama3 finetuning with a 4k context length. The repository addresses common issues like adjusting parameters for training and preprocessing non-English chats.
ai-town
AI Town is a virtual town where AI characters live, chat, and socialize. This project provides a deployable starter kit for building and customizing your own version of AI Town. It features a game engine, database, vector search, auth, text model, deployment, pixel art generation, background music generation, and local inference. You can customize your own simulation by creating characters and stories, updating spritesheets, changing the background, and modifying the background music.
cog-comfyui
Cog-comfyui allows users to run ComfyUI workflows on Replicate. ComfyUI is a visual programming tool for creating and sharing generative art workflows. With cog-comfyui, users can access a variety of pre-trained models and custom nodes to create their own unique artworks. The tool is easy to use and does not require any coding experience. Users simply need to upload their API JSON file and any necessary input files, and then click the "Run" button. Cog-comfyui will then generate the output image or video file.
ollama-ebook-summary
The 'ollama-ebook-summary' repository is a Python project that creates bulleted notes summaries of books and long texts, particularly in epub and pdf formats with ToC metadata. It automates the extraction of chapters, splits them into ~2000 token chunks, and allows for asking arbitrary questions to parts of the text for improved granularity of response. The tool aims to provide summaries for each page of a book rather than a one-page summary of the entire document, enhancing content curation and knowledge sharing capabilities.
ultravox
Ultravox is a fast multimodal Language Model (LLM) that can understand both text and human speech in real-time without the need for a separate Audio Speech Recognition (ASR) stage. By extending Meta's Llama 3 model with a multimodal projector, Ultravox converts audio directly into a high-dimensional space used by Llama 3, enabling quick responses and potential understanding of paralinguistic cues like timing and emotion in human speech. The current version (v0.3) has impressive speed metrics and aims for further enhancements. Ultravox currently converts audio to streaming text and plans to emit speech tokens for direct audio conversion. The tool is open for collaboration to enhance this functionality.
For similar tasks
HPT
Hyper-Pretrained Transformers (HPT) is a novel multimodal LLM framework from HyperGAI, trained for vision-language models capable of understanding both textual and visual inputs. The repository contains the open-source implementation of inference code to reproduce the evaluation results of HPT Air on different benchmarks. HPT has achieved competitive results with state-of-the-art models on various multimodal LLM benchmarks. It offers models like HPT 1.5 Air and HPT 1.0 Air, providing efficient solutions for vision-and-language tasks.
learnopencv
LearnOpenCV is a repository containing code for Computer Vision, Deep learning, and AI research articles shared on the blog LearnOpenCV.com. It serves as a resource for individuals looking to enhance their expertise in AI through various courses offered by OpenCV. The repository includes a wide range of topics such as image inpainting, instance segmentation, robotics, deep learning models, and more, providing practical implementations and code examples for readers to explore and learn from.
spark-free-api
Spark AI Free 服务 provides high-speed streaming output, multi-turn dialogue support, AI drawing support, long document interpretation, and image parsing. It offers zero-configuration deployment, multi-token support, and automatic session trace cleaning. It is fully compatible with the ChatGPT interface. The repository includes multiple free-api projects for various AI services. Users can access the API for tasks such as chat completions, AI drawing, document interpretation, image analysis, and ssoSessionId live checking. The project also provides guidelines for deployment using Docker, Docker-compose, Render, Vercel, and native deployment methods. It recommends using custom clients for faster and simpler access to the free-api series projects.
mlx-vlm
MLX-VLM is a package designed for running Vision LLMs on Mac systems using MLX. It provides a convenient way to install and utilize the package for processing large language models related to vision tasks. The tool simplifies the process of running LLMs on Mac computers, offering a seamless experience for users interested in leveraging MLX for vision-related projects.
clarifai-python-grpc
This is the official Clarifai gRPC Python client for interacting with their recognition API. Clarifai offers a platform for data scientists, developers, researchers, and enterprises to utilize artificial intelligence for image, video, and text analysis through computer vision and natural language processing. The client allows users to authenticate, predict concepts in images, and access various functionalities provided by the Clarifai API. It follows a versioning scheme that aligns with the backend API updates and includes specific instructions for installation and troubleshooting. Users can explore the Clarifai demo, sign up for an account, and refer to the documentation for detailed information.
horde-worker-reGen
This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.
geospy
Geospy is a Python tool that utilizes Graylark's AI-powered geolocation service to determine the location where photos were taken. It allows users to analyze images and retrieve information such as country, city, explanation, coordinates, and Google Maps links. The tool provides a seamless way to integrate geolocation services into various projects and applications.
Awesome-Colorful-LLM
Awesome-Colorful-LLM is a meticulously assembled anthology of vibrant multimodal research focusing on advancements propelled by large language models (LLMs) in domains such as Vision, Audio, Agent, Robotics, and Fundamental Sciences like Mathematics. The repository contains curated collections of works, datasets, benchmarks, projects, and tools related to LLMs and multimodal learning. It serves as a comprehensive resource for researchers and practitioners interested in exploring the intersection of language models and various modalities for tasks like image understanding, video pretraining, 3D modeling, document understanding, audio analysis, agent learning, robotic applications, and mathematical research.
For similar jobs
horde-worker-reGen
This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.
lollms-webui
LoLLMs WebUI (Lord of Large Language Multimodal Systems: One tool to rule them all) is a user-friendly interface to access and utilize various LLM (Large Language Models) and other AI models for a wide range of tasks. With over 500 AI expert conditionings across diverse domains and more than 2500 fine tuned models over multiple domains, LoLLMs WebUI provides an immediate resource for any problem, from car repair to coding assistance, legal matters, medical diagnosis, entertainment, and more. The easy-to-use UI with light and dark mode options, integration with GitHub repository, support for different personalities, and features like thumb up/down rating, copy, edit, and remove messages, local database storage, search, export, and delete multiple discussions, make LoLLMs WebUI a powerful and versatile tool.
NightshadeAntidote
Nightshade Antidote is an image forensics tool used to analyze digital images for signs of manipulation or forgery. It implements several common techniques used in image forensics including metadata analysis, copy-move forgery detection, frequency domain analysis, and JPEG compression artifacts analysis. The tool takes an input image, performs analysis using the above techniques, and outputs a report summarizing the findings.
MaterialSearch
MaterialSearch is a tool for searching local images and videos using natural language. It provides functionalities such as text search for images, image search for images, text search for videos (providing matching video clips), image search for videos (searching for the segment in a video through a screenshot), image-text similarity calculation, and Pexels video search. The tool can be deployed through the source code or Docker image, and it supports GPU acceleration. Users can configure the tool through environment variables or a .env file. The tool is still under development, and configurations may change frequently. Users can report issues or suggest improvements through issues or pull requests.
CLIPPyX
CLIPPyX is a powerful system-wide image search and management tool that offers versatile search options to find images based on their content, text, and visual similarity. With advanced features, users can effortlessly locate desired images across their entire computer's disk(s), regardless of their location or file names. The tool utilizes OpenAI's CLIP for image embeddings and text-based search, along with OCR for extracting text from images. It also employs Voidtools Everything SDK to list paths of all images on the system. CLIPPyX server receives search queries and queries collections of image embeddings and text embeddings to return relevant images.
aicsimageio
AICSImageIO is a Python tool for Image Reading, Metadata Conversion, and Image Writing for Microscopy Images. It supports various file formats like OME-TIFF, TIFF, ND2, DV, CZI, LIF, PNG, GIF, and Bio-Formats. Users can read and write metadata and imaging data, work with different file systems like local paths, HTTP URLs, s3fs, and gcsfs. The tool provides functionalities for full image reading, delayed image reading, mosaic image reading, metadata reading, xarray coordinate plane attachment, cloud IO support, and saving to OME-TIFF. It also offers benchmarking and developer resources.
DeepDanbooru
DeepDanbooru is an anime-style girl image tag estimation system written in Python. It allows users to estimate images using a live demo site. The tool requires specific packages to be installed and provides a structured dataset for training projects. Users can create training projects, download tags, filter datasets, and start training to estimate tags for images. The tool uses a specific dataset structure and project structure to facilitate the training process.
wenxin-starter
WenXin-Starter is a spring-boot-starter for Baidu's "Wenxin Qianfan WENXINWORKSHOP" large model, which can help you quickly access Baidu's AI capabilities. It fully integrates the official API documentation of Wenxin Qianfan. Supports text-to-image generation, built-in dialogue memory, and supports streaming return of dialogue. Supports QPS control of a single model and supports queuing mechanism. Plugins will be added soon.