stable-diffusion-prompt-reader
A simple standalone viewer for reading prompts from Stable Diffusion generated images outside the webui.
Features • Supported Formats • Download • Usage • CLI • ComfyUI Node • FAQ • Credits
[!TIP] The SD Prompt Reader is now available as a ComfyUI node. Check out the ComfyUI Prompt Reader Node for more information.
- Supports macOS, Windows, and Linux.
- Provides both GUI and CLI.
- Simple drag-and-drop interaction.
- Copy prompt to clipboard.
- Remove prompt from image.
- Export prompt to a text file.
- Edit or import prompt to images.
- Vertical orientation display and alphabetical sorting.
- Detects the generation tool.
- Multiple format support.
- Dark and light mode support.
Tool | PNG | JPEG | WEBP | TXT*
---|---|---|---|---
A1111's webUI | ✅ | ✅ | ✅ | ✅
Easy Diffusion | ✅ | ✅ | ✅ |
StableSwarmUI* | ✅ | ✅ | |
StableSwarmUI (prior to 0.5.8-alpha)* | ✅ | ✅ | |
Fooocus-MRE* | ✅ | ✅ | |
NovelAI (stealth pnginfo) | ✅ | ✅ | |
NovelAI (legacy) | ✅ | | |
InvokeAI | ✅ | | |
InvokeAI (prior to 2.3.5-post.2) | ✅ | | |
InvokeAI (prior to 1.15) | ✅ | | |
ComfyUI* | ✅ | | |
Draw Things | ✅ | | |
Naifu (4chan) | ✅ | | |
* Limitations apply. See format limitations.
[!NOTE] If you are using a tool or format that is not on this list, please help me support it by uploading an original file generated by your tool to the issues. Thanks!
[!TIP] For ComfyUI users, the SD Prompt Reader is now available as a ComfyUI node. The ComfyUI Prompt Reader Node is a subproject of this project, and it is recommended to embed the Prompt Saver node in the ComfyUI Prompt Reader Node within your workflow to ensure maximum compatibility.
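For reference, most of the supported formats keep their metadata in standard image text chunks or EXIF fields. Below is a minimal Pillow sketch (not this project's code; `example.png` is a placeholder) showing how the A1111-style `parameters` text chunk can be read from a PNG:

```python
from PIL import Image

# A1111's webUI stores the prompt and generation settings in a PNG
# text chunk keyed "parameters"; Pillow exposes text chunks via .info.
with Image.open("example.png") as img:  # placeholder filename
    parameters = img.info.get("parameters")

print(parameters or "No A1111-style metadata found.")
```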
- Windows: Download the executable from GitHub Releases.
- macOS: Download the executable from GitHub Releases. You may also install SD Prompt Reader via Homebrew Cask:

brew install --no-quarantine receyuki/sd-prompt-reader/sd-prompt-reader

The --no-quarantine flag is used because the SD Prompt Reader is currently unsigned, as I mentioned here.
- Linux: I'm pretty sure Linux users can figure things out without an executable.
- The minimum required Python version is 3.10.
- Make sure you have the tkinter package installed in your Python. If not, install the python3-tk package with your package manager, e.g. sudo apt-get install python3-tk on Debian-based distributions.
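To quickly verify that tkinter is available, a one-line check (works on any supported Python):

```python
# Raises ModuleNotFoundError if the python3-tk package is missing.
import tkinter
print(tkinter.TkVersion)  # e.g. 8.6
```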
You can either install it with pip or run it manually from source.

pip install sd-prompt-reader

or

pipx install sd-prompt-reader

To launch the app, just enter sd-prompt-reader in the terminal.
- Clone this repo, or download it as a zip:

git clone https://github.com/receyuki/stable-diffusion-prompt-reader.git

- cd into the directory and install dependencies:

cd stable-diffusion-prompt-reader
pip install -r requirements.txt

- Run:

python main.py
- Open the executable file (.exe or .app) and drag and drop the image into the window.
OR
- Right-click on the image and select "Open with" → SD Prompt Reader
OR
- Drag and drop the image directly onto the executable (.exe or .app).
- Click "Export" will generate a txt file alongside the image file.
- To save to another location, click the expand arrow and click "select directory".
- Click "Clear" will generate a new image file with suffix "_data_removed" alongside the original image file.
- To save to another location, click the expand arrow and click "select directory".
- To overwrite the original image file, click the expand arrow and click "overwrite the original image".
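Conceptually, removing the prompt amounts to re-saving the pixel data without the metadata chunks. A rough Pillow-only sketch of the same idea (a simplification, not the tool's actual implementation; filenames are placeholders):

```python
from PIL import Image

# Rebuild the image from pixel data only, dropping text chunks, EXIF,
# and anything else carried in .info (slow for huge images, but simple).
with Image.open("example.png") as img:  # placeholder filename
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))

clean.save("example_data_removed.png")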
[!NOTE] The edited image will be written in A1111 format, meaning that an image in any format will become A1111 format after editing.
- Click "Edit" to enter edit mode.
- Edit the prompt directly in the textbox or import a metadata file in txt format.
- Click "Save" will generate a edited image file with suffix "_edited" alongside the original image file.
- To save to another location, click the expand arrow and click "select directory".
- To overwrite the original image file, click the expand arrow and click "overwrite the original image".
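For context, writing the edited prompt back follows the A1111 convention described in the note above. A hedged Pillow sketch (a simplification; the real tool handles more formats and edge cases, and all values here are examples):

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# A1111 layout: positive prompt, then "Negative prompt: ...",
# then a single settings line, all under the "parameters" key.
info = PngInfo()
info.add_text(
    "parameters",
    "a cat sitting on a fence\n"
    "Negative prompt: blurry, low quality\n"
    "Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 42, Size: 512x512",
)

with Image.open("example.png") as img:  # placeholder filename
    img.save("example_edited.png", pnginfo=info)
```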
Copies the image prompt and settings in a format that can be read by the webui script "Prompts from file or textbox". The following parameters are supported:
Setting | Parameter |
---|---|
Seed | --seed |
Variation seed strength | --subseed_strength |
Seed resize from | --seed_resize_from_h |
Seed resize from | --seed_resize_from_w |
Sampler | --sampler_name |
Steps | --steps |
CFG scale | --cfg_scale |
Size | --width |
Size | --height |
Face restoration | --restore_faces |
- Click the expand arrow and click "single line prompt".
- Paste it into the textbox below the webui script "Prompts from file or textbox".
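For illustration, a single-line prompt stitches these settings into the flag syntax the script parses. A minimal Python sketch (example values only; the flag names follow the table above, plus --prompt/--negative_prompt from the same webui script):

```python
# Example values only; keys follow the parameter table above, plus
# --prompt/--negative_prompt from the same webui script.
settings = {
    "--prompt": "a cat sitting on a fence",
    "--negative_prompt": "blurry, low quality",
    "--seed": 42,
    "--sampler_name": "Euler a",
    "--steps": 20,
    "--cfg_scale": 7,
    "--width": 512,
    "--height": 512,
}

# Quote string values so multi-word prompts stay a single argument.
line = " ".join(
    f'{flag} "{value}"' if isinstance(value, str) else f"{flag} {value}"
    for flag, value in settings.items()
)
print(line)
```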
[!NOTE] The SDXL workflow does not support editing. If necessary, please remove the prompt from the image before editing.
If the image's workflow includes multiple sets of SDXL prompts, namely Clip G (text_g), Clip L (text_l), and Refiner, the SD Prompt Reader will switch to the multi-set prompt display mode shown in the image below. Two interface options are available for the multi-set prompt display mode, and you can switch between them using the buttons.
A CLI tool for reading, modifying, and clearing metadata is provided.

Windows: SD Prompt Reader CLI.exe is included in the zip package as a separate executable.
Example:
"SD Prompt Reader CLI.exe" -i example.png

macOS: the executable is located at SD Prompt Reader.app/Contents/MacOS/SD Prompt Reader.
Example:
/Applications/SD\ Prompt\ Reader.app/Contents/MacOS/SD\ Prompt\ Reader -i example.png

If installed via pip, use sd-prompt-reader-cli.
Example:
sd-prompt-reader-cli -i example.png
- Read Mode: activated by the -r or --read flag.
- Write Mode: activated by the -w or --write flag.
- Clear Mode: activated by the -c or --clear flag.
- -i, --input-path: Path to the input image file or directory containing image files. Required.
- -o, --output-path: Path to the output file or directory where the processed files will be saved.
- -l, --log-level: Log verbosity level (e.g. DEBUG, INFO, WARN, ERROR).
- -f, --format-type: Output metadata format; choices are "TXT" or "JSON". Defaults to "TXT".
- -m, --metadata: Metadata file for writing.
- -p, --positive: Positive prompt string for writing.
- -n, --negative: Negative prompt string for writing.
- -s, --setting: Setting string for writing.
- If no output path is specified, the modified image will be saved in the current directory with a suffix added to the original filename.
- To overwrite the source file, set the output path equal to the input path.
- The write mode only supports modifications to a single image.
- Read metadata from an image.
- Usage:
sd-prompt-reader-cli [-r] -i <input_path> [--format-type <format>] [-o <output_path>]
- Examples:
sd-prompt-reader-cli -i example.png
sd-prompt-reader-cli -i example.png -o metadata.txt
sd-prompt-reader-cli -r -i example.png -f TXT -o output_folder/
sd-prompt-reader-cli -r -i input_folder/ -f JSON -o output_folder/
- Write metadata to an image.
- Usage:
sd-prompt-reader-cli -w -i <input_path> -m <metadata_path> [-o <output_path>]
- Examples:
sd-prompt-reader-cli -w -i example.png -m new_metadata.txt
sd-prompt-reader-cli -w -i example.png -m new_metadata.txt -o output.png
sd-prompt-reader-cli -w -i example.png -m new_metadata.json -o output_folder/
- Remove all metadata from an image.
- Usage:
sd-prompt-reader-cli -c -i <input_path> [-o <output_path>]
- Examples:
sd-prompt-reader-cli -c -i example.png
sd-prompt-reader-cli -c -i example.png -o output.png
sd-prompt-reader-cli -c -i example.png -o output_folder/
sd-prompt-reader-cli -c -i input_folder/ -o output_folder/
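If you want to drive the CLI from a script, the documented flags compose cleanly with subprocess. A minimal Python sketch (assuming the pip-installed sd-prompt-reader-cli is on your PATH; paths are placeholders):

```python
import subprocess

# Batch-read metadata from a folder into JSON files, using only the
# documented flags (-r, -i, -f, -o); paths are placeholders.
subprocess.run(
    ["sd-prompt-reader-cli", "-r", "-i", "input_folder/", "-f", "JSON", "-o", "output_folder/"],
    check=True,
)
```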
- Importing a txt file is only allowed in edit mode.
- Only A1111-format txt files are supported. You can use txt files generated by the A1111 webui, or use the SD Prompt Reader to export a txt file from an A1111 image.
[!IMPORTANT] StableSwarmUI is still in the Alpha testing phase, and its format may change in the future. I will keep track of upcoming updates of StableSwarmUI.
[!IMPORTANT] When custom nodes are used or when the workflow becomes overly complex, there is a high probability that metadata may not be correctly read. This is because ComfyUI does not store metadata but only the complete workflow. SD Prompt Reader can only handle basic workflows. It is recommended to embed the Prompt Saver node in the ComfyUI Prompt Reader Node within your workflow to ensure maximum compatibility.
- If there are multiple sets of data (seed, steps, CFG, etc.) in the setting box, this means that there are multiple KSampler nodes in the flowchart.
- Due to the nature of ComfyUI, all nodes and flowcharts in the workflow are stored in the image, including those that are not in use. A flowchart can also have multiple branches, inputs, and outputs (e.g. outputting the hires-fixed image and the original image simultaneously in a single flowchart). The SD Prompt Reader traverses all flowcharts and branches and displays the longest branch with complete input and output; see the inspection sketch below.
- ComfyUI SDXL workflow
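For the curious, the stored ComfyUI data can be inspected directly. A hedged Python sketch (assuming the common layout where the executed node graph lives in a PNG text chunk named "prompt"; this is not the reader's actual traversal logic, and the filename is a placeholder):

```python
import json
from PIL import Image

# ComfyUI typically stores the executed node graph as JSON in a PNG
# text chunk named "prompt": {node_id: {"class_type", "inputs"}}.
with Image.open("comfyui_example.png") as img:  # placeholder filename
    graph = json.loads(img.info["prompt"])

# Each KSampler node carries its own seed/steps/CFG, which is why the
# setting box can show multiple sets of data.
for node_id, node in graph.items():
    if node.get("class_type", "").startswith("KSampler"):
        inputs = node.get("inputs", {})
        print(node_id, inputs.get("seed"), inputs.get("steps"), inputs.get("cfg"))
```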
By default, Easy Diffusion does not write metadata to images. Please change the Metadata format in the settings to "embed" so that metadata is written to images.
Since the original version of Fooocus does not support writing metadata to image files, SD Prompt Reader only supports images generated by Fooocus MoonRide Edition.
[!WARNING] The false positives reported by some anti-malware products are caused by the packaging tool PyInstaller, which is a common issue for PyInstaller users. I spent a lot of time trying to fix the Windows Defender false positive, but I couldn't do it for every antivirus product. So, you can either trust Windows Defender or follow the instructions for Linux users to run this app.
[!IMPORTANT] This is a very common macOS issue when you run unsigned non-App Store apps, and developers must pay Apple $99 per year to eliminate it. You can choose to Allow Apps from Anywhere in the Security & Privacy settings, which can be dangerous. The way I prefer is to remove the quarantine attribute.
- Open Terminal from the Applications folder.
- Type in the following command and hit Enter:

xattr -r -d com.apple.quarantine /path/to/app.app

In my case it's:

xattr -r -d com.apple.quarantine /Applications/SD\ Prompt\ Reader.app
If you are still concerned about the security of the app, you can follow the instructions for Linux users to run it.
- Batch image processing tool
- Gallery/Folder view
- User preferences
- Inspired by Stable Diffusion web UI
- App icon generated using Stable Diffusion with IconsMI
- Special thanks to Azusachan for providing SD server
- The NovelAI stealth pnginfo parser is based on the official metadata extraction script of NovelAI