stable-diffusion-prompt-reader
A simple standalone viewer for reading prompts from Stable Diffusion generated images outside the webui.
Features • Supported Formats • Download • Usage • CLI • ComfyUI Node • FAQ • Credits
[!TIP] The SD Prompt Reader is now available as a ComfyUI node. Check out the ComfyUI Prompt Reader Node for more information.
- Supports macOS, Windows, and Linux.
- Provides both GUI and CLI.
- Simple drag-and-drop interaction.
- Copy prompt to clipboard.
- Remove prompt from image.
- Export prompt to a text file.
- Edit or import prompt to images.
- Vertical orientation display and alphabetical sorting.
- Detects the generation tool.
- Multiple format support.
- Dark and light mode support.
Tool | PNG | JPEG | WEBP | TXT*
---|---|---|---|---
A1111's webUI | ✅ | ✅ | ✅ | ✅
Easy Diffusion | ✅ | ✅ | ✅ |
StableSwarmUI* | ✅ | ✅ | |
StableSwarmUI (prior to 0.5.8-alpha)* | ✅ | ✅ | |
Fooocus-MRE* | ✅ | ✅ | |
NovelAI (stealth pnginfo) | ✅ | ✅ | |
NovelAI (legacy) | ✅ | | |
InvokeAI | ✅ | | |
InvokeAI (prior to 2.3.5-post.2) | ✅ | | |
InvokeAI (prior to 1.15) | ✅ | | |
ComfyUI* | ✅ | | |
Draw Things | ✅ | | |
Naifu (4chan) | ✅ | | |
* Limitations apply. See format limitations.
[!NOTE] If you are using a tool or format that is not on this list, please help me support it by uploading an original file generated by your tool to the issues. Thanks!
[!TIP] For ComfyUI users, the SD Prompt Reader is now available as a ComfyUI node. The ComfyUI Prompt Reader Node is a subproject of this project, and it is recommended to embed the Prompt Saver node in the ComfyUI Prompt Reader Node within your workflow to ensure maximum compatibility.
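For reference, most of the supported formats keep their metadata in standard image text chunks or EXIF fields. Below is a minimal Pillow sketch (not this project's code; `example.png` is a placeholder) showing how the A1111-style `parameters` text chunk can be read from a PNG:

```python
from PIL import Image

# A1111's webUI stores the prompt and generation settings in a PNG
# text chunk keyed "parameters"; Pillow exposes text chunks via .info.
with Image.open("example.png") as img:  # placeholder filename
    parameters = img.info.get("parameters")

print(parameters or "No A1111-style metadata found.")
```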
- Windows: Download the executable from GitHub Releases.
- macOS: Download the executable from GitHub Releases. You may also install SD Prompt Reader via Homebrew Cask:

brew install --no-quarantine receyuki/sd-prompt-reader/sd-prompt-reader

The --no-quarantine flag is used because the SD Prompt Reader is currently unsigned, as I mentioned here.
- Linux: I'm pretty sure Linux users can figure things out without an executable.
- The minimum required Python version is 3.10.
- Make sure you have the tkinter package installed in your Python. If not, install the python3-tk package with your package manager, e.g. sudo apt-get install python3-tk on Debian-based distributions.
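To quickly verify that tkinter is available, a one-line check (works on any supported Python):

```python
# Raises ModuleNotFoundError if the python3-tk package is missing.
import tkinter
print(tkinter.TkVersion)  # e.g. 8.6
```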
You can either install it with pip or run it manually from source.

pip install sd-prompt-reader

or

pipx install sd-prompt-reader

To launch the app, just enter sd-prompt-reader in the terminal.
- Clone this repo, or download it as a zip:

git clone https://github.com/receyuki/stable-diffusion-prompt-reader.git

- cd into the directory and install dependencies:

cd stable-diffusion-prompt-reader
pip install -r requirements.txt

- Run:

python main.py
- Open the executable file (.exe or .app) and drag and drop the image into the window.
OR
- Right-click on the image and select "Open with" → SD Prompt Reader
OR
- Drag and drop the image directly onto the executable (.exe or .app).
- Click "Export" will generate a txt file alongside the image file.
- To save to another location, click the expand arrow and click "select directory".
- Click "Clear" will generate a new image file with suffix "_data_removed" alongside the original image file.
- To save to another location, click the expand arrow and click "select directory".
- To overwrite the original image file, click the expand arrow and click "overwrite the original image".
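Conceptually, removing the prompt amounts to re-saving the pixel data without the metadata chunks. A rough Pillow-only sketch of the same idea (a simplification, not the tool's actual implementation; filenames are placeholders):

```python
from PIL import Image

# Rebuild the image from pixel data only, dropping text chunks, EXIF,
# and anything else carried in .info (slow for huge images, but simple).
with Image.open("example.png") as img:  # placeholder filename
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))

clean.save("example_data_removed.png")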
[!NOTE] The edited image will be written in A1111 format, meaning that an image in any format will become A1111 format after editing.
- Click "Edit" to enter edit mode.
- Edit the prompt directly in the textbox or import a metadata file in txt format.
- Click "Save" will generate a edited image file with suffix "_edited" alongside the original image file.
- To save to another location, click the expand arrow and click "select directory".
- To overwrite the original image file, click the expand arrow and click "overwrite the original image".
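For context, writing the edited prompt back follows the A1111 convention described in the note above. A hedged Pillow sketch (a simplification; the real tool handles more formats and edge cases, and all values here are examples):

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# A1111 layout: positive prompt, then "Negative prompt: ...",
# then a single settings line, all under the "parameters" key.
info = PngInfo()
info.add_text(
    "parameters",
    "a cat sitting on a fence\n"
    "Negative prompt: blurry, low quality\n"
    "Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 42, Size: 512x512",
)

with Image.open("example.png") as img:  # placeholder filename
    img.save("example_edited.png", pnginfo=info)
```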
Copies the image prompt and settings in a format that can be read by the webui script "Prompts from file or textbox". The following parameters are supported:
Setting | Parameter |
---|---|
Seed | --seed |
Variation seed strength | --subseed_strength |
Seed resize from | --seed_resize_from_h |
Seed resize from | --seed_resize_from_w |
Sampler | --sampler_name |
Steps | --steps |
CFG scale | --cfg_scale |
Size | --width |
Size | --height |
Face restoration | --restore_faces |
- Click the expand arrow and click "single line prompt".
- Paste it into the textbox below the webui script "Prompts from file or textbox".
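For illustration, a single-line prompt stitches these settings into the flag syntax the script parses. A minimal Python sketch (example values only; the flag names follow the table above, plus --prompt/--negative_prompt from the same webui script):

```python
# Example values only; keys follow the parameter table above, plus
# --prompt/--negative_prompt from the same webui script.
settings = {
    "--prompt": "a cat sitting on a fence",
    "--negative_prompt": "blurry, low quality",
    "--seed": 42,
    "--sampler_name": "Euler a",
    "--steps": 20,
    "--cfg_scale": 7,
    "--width": 512,
    "--height": 512,
}

# Quote string values so multi-word prompts stay a single argument.
line = " ".join(
    f'{flag} "{value}"' if isinstance(value, str) else f"{flag} {value}"
    for flag, value in settings.items()
)
print(line)
```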
[!NOTE] The SDXL workflow does not support editing. If necessary, please remove the prompt from the image before editing.
If the image's workflow includes multiple sets of SDXL prompts, namely Clip G (text_g), Clip L (text_l), and Refiner, the SD Prompt Reader will switch to the multi-set prompt display mode shown in the image below. Two interface options are available for the multi-set prompt display mode, and you can switch between them using the buttons.
A CLI tool for reading, modifying, and clearing metadata is provided.

Windows: SD Prompt Reader CLI.exe is included in the zip package as a separate executable.
Example:
"SD Prompt Reader CLI.exe" -i example.png

macOS: the executable is located at SD Prompt Reader.app/Contents/MacOS/SD Prompt Reader.
Example:
/Applications/SD\ Prompt\ Reader.app/Contents/MacOS/SD\ Prompt\ Reader -i example.png

If installed via pip, use sd-prompt-reader-cli.
Example:
sd-prompt-reader-cli -i example.png
- Read Mode: activated by the -r or --read flag.
- Write Mode: activated by the -w or --write flag.
- Clear Mode: activated by the -c or --clear flag.
- -i, --input-path: Path to the input image file or directory containing image files. Required.
- -o, --output-path: Path to the output file or directory where the processed files will be saved.
- -l, --log-level: Log verbosity level (e.g. DEBUG, INFO, WARN, ERROR).
- -f, --format-type: Output metadata format; choices are "TXT" or "JSON". Defaults to "TXT".
- -m, --metadata: Metadata file for writing.
- -p, --positive: Positive prompt string for writing.
- -n, --negative: Negative prompt string for writing.
- -s, --setting: Setting string for writing.
- If no output path is specified, the modified image will be saved in the current directory with a suffix added to the original filename.
- To overwrite the source file, set the output path equal to the input path.
- The write mode only supports modifications to a single image.
- Read metadata from an image.
- Usage:
sd-prompt-reader-cli [-r] -i <input_path> [--format-type <format>] [-o <output_path>]
- Examples:
sd-prompt-reader-cli -i example.png
sd-prompt-reader-cli -i example.png -o metadata.txt
sd-prompt-reader-cli -r -i example.png -f TXT -o output_folder/
sd-prompt-reader-cli -r -i input_folder/ -f JSON -o output_folder/
- Write metadata to an image.
- Usage:
sd-prompt-reader-cli -w -i <input_path> -m <metadata_path> [-o <output_path>]
- Examples:
sd-prompt-reader-cli -w -i example.png -m new_metadata.txt
sd-prompt-reader-cli -w -i example.png -m new_metadata.txt -o output.png
sd-prompt-reader-cli -w -i example.png -m new_metadata.json -o output_folder/
- Remove all metadata from an image.
- Usage:
sd-prompt-reader-cli -c -i <input_path> [-o <output_path>]
- Examples:
sd-prompt-reader-cli -c -i example.png
sd-prompt-reader-cli -c -i example.png -o output.png
sd-prompt-reader-cli -c -i example.png -o output_folder/
sd-prompt-reader-cli -c -i input_folder/ -o output_folder/
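If you want to drive the CLI from a script, the documented flags compose cleanly with subprocess. A minimal Python sketch (assuming the pip-installed sd-prompt-reader-cli is on your PATH; paths are placeholders):

```python
import subprocess

# Batch-read metadata from a folder into JSON files, using only the
# documented flags (-r, -i, -f, -o); paths are placeholders.
subprocess.run(
    ["sd-prompt-reader-cli", "-r", "-i", "input_folder/", "-f", "JSON", "-o", "output_folder/"],
    check=True,
)
```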
- Importing a txt file is only allowed in edit mode.
- Only A1111-format txt files are supported. You can use txt files generated by the A1111 webui, or use the SD Prompt Reader to export a txt file from an A1111 image.
[!IMPORTANT] StableSwarmUI is still in the Alpha testing phase, and its format may change in the future. I will keep track of upcoming updates of StableSwarmUI.
[!IMPORTANT] When custom nodes are used or when the workflow becomes overly complex, there is a high probability that metadata may not be correctly read. This is because ComfyUI does not store metadata but only the complete workflow. SD Prompt Reader can only handle basic workflows. It is recommended to embed the Prompt Saver node in the ComfyUI Prompt Reader Node within your workflow to ensure maximum compatibility.
- If there are multiple sets of data (seed, steps, CFG, etc.) in the setting box, this means that there are multiple KSampler nodes in the flowchart.
- Due to the nature of ComfyUI, all nodes and flowcharts in the workflow are stored in the image, including those that are not in use. A flowchart can also have multiple branches, inputs, and outputs (e.g. outputting the hires-fixed image and the original image simultaneously in a single flowchart). The SD Prompt Reader traverses all flowcharts and branches and displays the longest branch with complete input and output; see the inspection sketch below.
- ComfyUI SDXL workflow
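For the curious, the stored ComfyUI data can be inspected directly. A hedged Python sketch (assuming the common layout where the executed node graph lives in a PNG text chunk named "prompt"; this is not the reader's actual traversal logic, and the filename is a placeholder):

```python
import json
from PIL import Image

# ComfyUI typically stores the executed node graph as JSON in a PNG
# text chunk named "prompt": {node_id: {"class_type", "inputs"}}.
with Image.open("comfyui_example.png") as img:  # placeholder filename
    graph = json.loads(img.info["prompt"])

# Each KSampler node carries its own seed/steps/CFG, which is why the
# setting box can show multiple sets of data.
for node_id, node in graph.items():
    if node.get("class_type", "").startswith("KSampler"):
        inputs = node.get("inputs", {})
        print(node_id, inputs.get("seed"), inputs.get("steps"), inputs.get("cfg"))
```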
By default, Easy Diffusion does not write metadata to images. Please change the Metadata format in the settings to "embed" so that metadata is written to images.
Since the original version of Fooocus does not support writing metadata to image files, SD Prompt Reader only supports images generated by Fooocus MoonRide Edition.
[!WARNING] The false positives reported by some anti-malware products are caused by the packaging tool PyInstaller, which is a common issue for PyInstaller users. I spent a lot of time trying to fix the Windows Defender false positive, but I couldn't do it for every antivirus product. So, you can either trust Windows Defender or follow the instructions for Linux users to run this app.
[!IMPORTANT] This is a very common macOS issue when you run unsigned non-App Store apps, and developers must pay Apple $99 per year to eliminate it. You can choose to Allow Apps from Anywhere in the Security & Privacy settings, which can be dangerous. The way I prefer is to remove the quarantine attribute.
- Open Terminal from the Applications folder.
- Type in the following command and hit Enter:

xattr -r -d com.apple.quarantine /path/to/app.app

In my case it's:

xattr -r -d com.apple.quarantine /Applications/SD\ Prompt\ Reader.app
If you are still concerned about the security of the app, you can follow the instructions for Linux users to run it.
- Batch image processing tool
- Gallery/Folder view
- User preferences
- Inspired by Stable Diffusion web UI
- App icon generated using Stable Diffusion with IconsMI
- Special thanks to Azusachan for providing SD server
- The NovelAI stealth pnginfo parser is based on the official metadata extraction script of NovelAI