RetouchGPT
This is the official code of AAAI 2025: RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting.
Stars: 87
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
README:
Wen Xue, Chun Ding, RuoTao Xu, Yong Xu, Si Wu*, Hau-San Wong
South China University of Technology, Institute of Super Robotics, City University of Hong Kong
This is the official code of AAAI 2025: RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting.
Abstract: Face retouching aims to remove facial imperfections from images and videos while preserving face attributes. Existing methods are designed for non-interactive, end-to-end retouching, yet interactivity is highly demanded in downstream applications. In this paper, we propose RetouchGPT, a novel framework that leverages Large Language Models (LLMs) to guide the interactive retouching process. Towards this end, we design an instruction-driven imperfection prediction module to accurately identify imperfections by integrating textual and visual features. To learn imperfection prompts, we further incorporate an LLM-based embedding module to fuse multi-modal conditioning information. The prompt-based feature modification is performed in each transformer block, progressively suppressing imperfection features and replacing them with normal skin features. Extensive experiments validate the effectiveness of our design and demonstrate that RetouchGPT is a useful tool for interactive face retouching, achieving superior performance over state-of-the-art methods.
RetouchGPT achieves user feedback-aligned and high-fidelity retouching results under the guidance of large language models (LLMs). The key contributions of this work include:
- Interactive Retouching: Different from the existing methods that perform single-stage face retouching, the proposed RetouchGPT is capable of interactive retouching by working together with LLM.
- Instruction-Driven Imperfection Prediction (IIP): By integrating user’s instruction and visual features, facial imperfection prediction performance can be improved significantly.
- LLM-Based Embedding (LBE): Designed the LBE module to fuse textual and visual conditioning information, generating imperfection prompts. These prompts guide content generation in imperfection regions by leveraging a latent transformer with cross-attention-based feature modification at each block.
- State-of-the-Art Performance: We fuse multimodal conditioning information via LLM to obtain imperfection prompts, which controls imperfection feature modification in the interactive retouching process.
To run RetouchGPT, follow these steps:
-
Install and activate the required packages using the requirements.txt file:
conda create -n retouchgpt python=3.8 pip install -r requirements.txt
-
Obtain the Flickr-Faces-HQ-Retouching (FFHQR) Dataset.
-
Arrange the dataset in the following structure:
face_retouching/ ├── train/ │ ├── source/ │ └── target/ └── test/ ├── source/ └── target/
- Select and download your preferred Large Language Model (LLM). Ensure it is compatible with your hardware.
-
Navigate to the project directory and activate the environment:
cd RetouchGPT conda activate retouchgpt python train.py
To test RetouchGPT:
-
Navigate to the Project Directory:
cd RetouchGPT -
Run the Testing Script: Execute the testing script with:
python test.py
- You can input user instructions and in-the-wild images for retouching purposes.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for RetouchGPT
Similar Open Source Tools
RetouchGPT
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
AI-Blueprints
This repository hosts a collection of AI blueprint projects for HP AI Studio, providing end-to-end solutions across key AI domains like data science, machine learning, deep learning, and generative AI. The projects are designed to be plug-and-play, utilizing open-source and hosted models to offer ready-to-use solutions. The repository structure includes projects related to classical machine learning, deep learning applications, generative AI, NGC integration, and troubleshooting guidelines for common issues. Each project is accompanied by detailed descriptions and use cases, showcasing the versatility and applicability of AI technologies in various domains.
eole
EOLE is an open language modeling toolkit based on PyTorch. It aims to provide a research-friendly approach with a comprehensive yet compact and modular codebase for experimenting with various types of language models. The toolkit includes features such as versatile training and inference, dynamic data transforms, comprehensive large language model support, advanced quantization, efficient finetuning, flexible inference, and tensor parallelism. EOLE is a work in progress with ongoing enhancements in configuration management, command line entry points, reproducible recipes, core API simplification, and plans for further simplification, refactoring, inference server development, additional recipes, documentation enhancement, test coverage improvement, logging enhancements, and broader model support.
Macaw-LLM
Macaw-LLM is a pioneering multi-modal language modeling tool that seamlessly integrates image, audio, video, and text data. It builds upon CLIP, Whisper, and LLaMA models to process and analyze multi-modal information effectively. The tool boasts features like simple and fast alignment, one-stage instruction fine-tuning, and a new multi-modal instruction dataset. It enables users to align multi-modal features efficiently, encode instructions, and generate responses across different data types.
nextpy
Nextpy is a cutting-edge software development framework optimized for AI-based code generation. It provides guardrails for defining AI system boundaries, structured outputs for prompt engineering, a powerful prompt engine for efficient processing, better AI generations with precise output control, modularity for multiplatform and extensible usage, developer-first approach for transferable knowledge, and containerized & scalable deployment options. It offers 4-10x faster performance compared to Streamlit apps, with a focus on cooperation within the open-source community and integration of key components from various projects.
persian-license-plate-recognition
The Persian License Plate Recognition (PLPR) system is a state-of-the-art solution designed for detecting and recognizing Persian license plates in images and video streams. Leveraging advanced deep learning models and a user-friendly interface, it ensures reliable performance across different scenarios. The system offers advanced detection using YOLOv5 models, precise recognition of Persian characters, real-time processing capabilities, and a user-friendly GUI. It is well-suited for applications in traffic monitoring, automated vehicle identification, and similar fields. The system's architecture includes modules for resident management, entrance management, and a detailed flowchart explaining the process from system initialization to displaying results in the GUI. Hardware requirements include an Intel Core i5 processor, 8 GB RAM, a dedicated GPU with at least 4 GB VRAM, and an SSD with 20 GB of free space. The system can be installed by cloning the repository and installing required Python packages. Users can customize the video source for processing and run the application to upload and process images or video streams. The system's GUI allows for parameter adjustments to optimize performance, and the Wiki provides in-depth information on the system's architecture and model training.
Advanced-Prompt-Generator
This project is an LLM-based Advanced Prompt Generator designed to automate the process of prompt engineering by enhancing given input prompts using large language models (LLMs). The tool can generate advanced prompts with minimal user input, leveraging LLM agents for optimized prompt generation. It supports gpt-4o or gpt-4o-mini, offers FastAPI & Docker deployment for efficiency, provides a Gradio interface for easy testing, and is hosted on Hugging Face Spaces for quick demos. Users can expand model support to offer more variety and flexibility.
tunix
Tunix is a JAX-based library designed for post-training Large Language Models. It provides efficient support for supervised fine-tuning, reinforcement learning, and knowledge distillation. Tunix leverages JAX for accelerated computation and integrates seamlessly with the Flax NNX modeling framework. The library is modular, efficient, and designed for distributed training on accelerators like TPUs. Currently in early development, Tunix aims to expand its capabilities, usability, and performance.
Software-Engineer-AI-Agent-Atlas
This repository provides activation patterns to transform a general AI into a specialized AI Software Engineer Agent. It addresses issues like context rot, hidden capabilities, chaos in vibecoding, and repetitive setup. The solution is a Persistent Consciousness Architecture framework named ATLAS, offering activated neural pathways, persistent identity, pattern recognition, specialized agents, and modular context management. Recent enhancements include abstraction power documentation, a specialized agent ecosystem, and a streamlined structure. Users can clone the repo, set up projects, initialize AI sessions, and manage context effectively for collaboration. Key files and directories organize identity, context, projects, specialized agents, logs, and critical information. The approach focuses on neuron activation through structure, context engineering, and vibecoding with guardrails to deliver a reliable AI Software Engineer Agent.
Controllable-RAG-Agent
This repository contains a sophisticated deterministic graph-based solution for answering complex questions using a controllable autonomous agent. The solution is designed to ensure that answers are solely based on the provided data, avoiding hallucinations. It involves various steps such as PDF loading, text preprocessing, summarization, database creation, encoding, and utilizing large language models. The algorithm follows a detailed workflow involving planning, retrieval, answering, replanning, content distillation, and performance evaluation. Heuristics and techniques implemented focus on content encoding, anonymizing questions, task breakdown, content distillation, chain of thought answering, verification, and model performance evaluation.
AgentForge
AgentForge is a low-code framework tailored for the rapid development, testing, and iteration of AI-powered autonomous agents and Cognitive Architectures. It is compatible with a range of LLM models and offers flexibility to run different models for different agents based on specific needs. The framework is designed for seamless extensibility and database-flexibility, making it an ideal playground for various AI projects. AgentForge is a beta-testing ground and future-proof hub for crafting intelligent, model-agnostic autonomous agents.
llm-on-ray
LLM-on-Ray is a comprehensive solution for building, customizing, and deploying Large Language Models (LLMs). It simplifies complex processes into manageable steps by leveraging the power of Ray for distributed computing. The tool supports pretraining, finetuning, and serving LLMs across various hardware setups, incorporating industry and Intel optimizations for performance. It offers modular workflows with intuitive configurations, robust fault tolerance, and scalability. Additionally, it provides an Interactive Web UI for enhanced usability, including a chatbot application for testing and refining models.
comfyui_LLM_Polymath
LLM Polymath Chat Node is an advanced Chat Node for ComfyUI that integrates large language models to build text-driven applications and automate data processes, enhancing prompt responses by incorporating real-time web search, linked content extraction, and custom agent instructions. It supports both OpenAI’s GPT-like models and alternative models served via a local Ollama API. The core functionalities include Comfy Node Finder and Smart Assistant, along with additional agents like Flux Prompter, Custom Instructors, Python debugger, and scripter. The tool offers features for prompt processing, web search integration, model & API integration, custom instructions, image handling, logging & debugging, output compression, and more.
intro-llm-rag
This repository serves as a comprehensive guide for technical teams interested in developing conversational AI solutions using Retrieval-Augmented Generation (RAG) techniques. It covers theoretical knowledge and practical code implementations, making it suitable for individuals with a basic technical background. The content includes information on large language models (LLMs), transformers, prompt engineering, embeddings, vector stores, and various other key concepts related to conversational AI. The repository also provides hands-on examples for two different use cases, along with implementation details and performance analysis.
For similar tasks
RetouchGPT
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
clarity-upscaler
Clarity AI is a free and open-source AI image upscaler and enhancer, providing an alternative to Magnific. It offers various features such as multi-step upscaling, resemblance fixing, speed improvements, support for custom safetensors checkpoints, anime upscaling, LoRa support, pre-downscaling, and fractality. Users can access the tool through the ClarityAI.co app, ComfyUI manager, API, or by deploying and running locally or in the cloud with cog or A1111 webUI. The tool aims to enhance image quality and resolution using advanced AI algorithms and models.
awesome-ai-painting
This repository, named 'awesome-ai-painting', is a comprehensive collection of resources related to AI painting. It is curated by a user named 秋风, who is an AI painting enthusiast with a background in the AIGC industry. The repository aims to help more people learn AI painting and also documents the user's goal of creating 100 AI products, with current progress at 4/100. The repository includes information on various AI painting products, tutorials, tools, and models, providing a valuable resource for individuals interested in AI painting and related technologies.
SUPIR
SUPIR is an AI-based image processing and upscaling tool that leverages cutting-edge technology to enhance image quality and resolution. The tool provides users with the ability to upscale images with high generalization and quality, as well as specific settings for light degradation scenarios. It offers a range of models and checkpoints for different use cases, along with detailed instructions for installation and usage. SUPIR also includes features for color fixing, linear CFG adjustments, and various prompts for image enhancement. The tool is designed for non-commercial use only and comes with a contact email for inquiries and permission requests for commercial use.
For similar jobs
MagicMirror
MagicMirror is an AI-powered tool that allows users to instantly try on new faces, hairstyles, and outfits with a simple drag and drop interface. It runs smoothly on standard computers without the need for dedicated GPU hardware, ensuring privacy with completely offline processing. The tool is ultra-lightweight with a small installer size and model files, providing a fun and easy way to experiment with different looks.
RetouchGPT
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
ap-plugin
AP-PLUGIN is an AI drawing plugin for the Yunzai series robot framework, allowing you to have a convenient AI drawing experience in the input box. It uses the open source Stable Diffusion web UI as the backend, deploys it for free, and generates a variety of images with richer functions.
cog-comfyui
Cog-comfyui allows users to run ComfyUI workflows on Replicate. ComfyUI is a visual programming tool for creating and sharing generative art workflows. With cog-comfyui, users can access a variety of pre-trained models and custom nodes to create their own unique artworks. The tool is easy to use and does not require any coding experience. Users simply need to upload their API JSON file and any necessary input files, and then click the "Run" button. Cog-comfyui will then generate the output image or video file.
Adobe-Photoshop-AI-Crack
Adobe Photoshop 2024 is the latest version of the program for processing raster graphics. It supports a variety of graphic formats and allows both the creation and editing of images. It is used for creating photorealistic images, working with color scanned images, retouching, color correction, collaging, graphic transformation, color separation, and more. Adobe Photoshop encompasses all methods of working with bitmap images, utilizes layers, and contours. The program is an undisputed leader among professional graphic editors due to its extensive capabilities, high efficiency, and speed. Adobe Photoshop provides all the necessary tools for correction, editing, preparing images for printing, and high-quality output.
IOPaint
IOPaint is a free and open-source inpainting & outpainting tool powered by SOTA AI model. It supports various AI models to perform erase, inpainting, or outpainting tasks. Users can remove unwanted objects, defects, watermarks, or people from images using erase models. Additionally, diffusion models can replace objects or perform outpainting. The tool also offers plugins for interactive object segmentation, background removal, anime segmentation, super resolution, face restoration, and file management. IOPaint provides a web UI for easy access to the latest AI models and supports batch processing of images through the command line. Developers can contribute to the project by installing front-end dependencies, setting up the backend, and starting the development environment for both front-end and back-end components.
adobe-photoshopCRCK
Adobe PhotoshopCRCK is a tool designed to provide users with the latest version of Adobe Photoshop for free on Windows. It allows users to access advanced photo editing features and functionalities without the need for a paid subscription. The tool is intended for individuals looking to explore professional photo editing capabilities without incurring additional costs. With Adobe PhotoshopCRCK, users can enhance their images, create stunning graphics, and unleash their creativity through a wide range of editing tools and options.
DeepNude-AI-List
DeepNude AI List is a compilation of various NSFW AI tools that are designed for generating nude or suggestive content. The list includes tools like Dreampaint.net, Nudify.me, NoDress.io, Undress Her, and more. These tools utilize artificial intelligence algorithms to manipulate images and create provocative visuals. Users should exercise caution and responsibility when using such tools, as they may raise ethical and privacy concerns.
