data:image/s3,"s3://crabby-images/74c83/74c83df2ebf176f02fdd6a78b77f5efae33d2d47" alt="RetouchGPT"
RetouchGPT
This is the official code of AAAI 2025: RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting.
data:image/s3,"s3://crabby-images/4cd1c/4cd1cb1410b65654a3c63434adbf92770402e69d" alt="screenshot"
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
Wen Xue, Chun Ding, RuoTao Xu, Yong Xu, Si Wu*, Hau-San Wong
South China University of Technology, Institute of Super Robotics, City University of Hong Kong
Abstract: Face retouching aims to remove facial imperfections from images and videos while preserving face attributes. Existing methods are designed for non-interactive, end-to-end retouching, yet interactivity is highly demanded in downstream applications. In this paper, we propose RetouchGPT, a novel framework that leverages Large Language Models (LLMs) to guide the interactive retouching process. Towards this end, we design an instruction-driven imperfection prediction module to accurately identify imperfections by integrating textual and visual features. To learn imperfection prompts, we further incorporate an LLM-based embedding module to fuse multi-modal conditioning information. The prompt-based feature modification is performed in each transformer block, progressively suppressing imperfection features and replacing them with normal skin features. Extensive experiments validate the effectiveness of our design and demonstrate that RetouchGPT is a useful tool for interactive face retouching, achieving superior performance over state-of-the-art methods.
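To make the idea of instruction-driven imperfection prediction concrete, the snippet below sketches one way to fuse an instruction embedding with a visual feature map into a per-pixel imperfection probability. It is not the IIP module from the paper. The projection matrices `w_v` and `w_t`, the scaled dot-product fusion, and the function name `predict_imperfection_mask` are illustrative assumptions, and the small random weights stand in for learned ones.

```python
import numpy as np


def predict_imperfection_mask(visual_feats, text_emb, w_v, w_t):
    """Hypothetical text-conditioned imperfection predictor (not the paper's IIP).

    visual_feats: (H, W, C_v) feature map from an image encoder
    text_emb:     (C_t,) embedding of the user's instruction
    w_v, w_t:     (C_v, d) and (C_t, d) projections into a shared space
    Returns an (H, W) map of imperfection probabilities.
    """
    v = visual_feats @ w_v                # (H, W, d) projected visual features
    t = text_emb @ w_t                    # (d,) projected instruction
    logits = v @ t / np.sqrt(t.shape[0])  # scaled similarity at every location
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> probability


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 8, 16))
instruction = rng.standard_normal(32)
w_v = rng.standard_normal((16, 4)) * 0.1  # stand-ins for learned weights
w_t = rng.standard_normal((32, 4)) * 0.1

mask = predict_imperfection_mask(feats, instruction, w_v, w_t)
binary_mask = mask > 0.5  # threshold into a predicted imperfection region
```

Because each score is a similarity between the projected instruction and that location's features, changing the instruction changes which regions are flagged, which is the property an instruction-driven predictor needs.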
Guided by large language models (LLMs), RetouchGPT produces high-fidelity retouching results that are aligned with user feedback. The key contributions of this work include:
- Interactive Retouching: Unlike existing methods, which perform single-stage face retouching, RetouchGPT supports interactive retouching by working together with an LLM.
- Instruction-Driven Imperfection Prediction (IIP): By integrating the user's instructions with visual features, IIP significantly improves facial imperfection prediction.
- LLM-Based Embedding (LBE): The LBE module fuses textual and visual conditioning information to generate imperfection prompts. These prompts guide content generation in imperfection regions through a latent transformer that performs cross-attention-based feature modification in each block.
- State-of-the-Art Performance: By using the LLM to fuse multimodal conditioning information into imperfection prompts, which control imperfection feature modification during interactive retouching, RetouchGPT outperforms state-of-the-art methods.
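The cross-attention-based feature modification described above can be illustrated with a single-head update in which image tokens (queries) attend to imperfection-prompt tokens (keys and values), applied once per transformer block. This is a simplified sketch, not RetouchGPT's implementation: the weight names, the single attention head, the random per-block weights, and the optional mask gate that restricts updates to predicted imperfection regions are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prompt_cross_attention(img_tokens, prompt_tokens, w_q, w_k, w_v, mask=None):
    """One cross-attention feature-modification step (illustrative only).

    img_tokens:    (N, d) visual tokens inside a transformer block
    prompt_tokens: (M, d) imperfection-prompt tokens
    w_q, w_k, w_v: (d, d) projection matrices
    mask:          optional (N,) weights in [0, 1]; 0 leaves a token unchanged
    """
    q = img_tokens @ w_q
    k = prompt_tokens @ w_k
    v = prompt_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M)
    update = attn @ v                                         # (N, d)
    if mask is not None:
        update = update * mask[:, None]  # only modify flagged tokens
    return img_tokens + update           # residual connection


rng = np.random.default_rng(0)
d, n_tokens, n_prompts, n_blocks = 16, 64, 4, 3
tokens = rng.standard_normal((n_tokens, d))
prompts = rng.standard_normal((n_prompts, d))
region = (rng.random(n_tokens) > 0.7).astype(float)  # hypothetical imperfection mask
initial = tokens.copy()

# Apply the modification once per block; each block gets its own weights.
for block in range(n_blocks):
    w_q = rng.standard_normal((d, d)) * 0.1
    w_k = rng.standard_normal((d, d)) * 0.1
    w_v = rng.standard_normal((d, d)) * 0.1
    tokens = prompt_cross_attention(tokens, prompts, w_q, w_k, w_v, mask=region)
```

Gating the residual update with a mask is one simple way to leave non-imperfection regions untouched while flagged features are progressively rewritten across blocks.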
To run RetouchGPT, follow these steps:

- Create a conda environment, activate it, and install the required packages from `requirements.txt`:

      conda create -n retouchgpt python=3.8
      conda activate retouchgpt
      pip install -r requirements.txt

- Obtain the Flickr-Faces-HQ-Retouching (FFHQR) dataset.
- Arrange the dataset in the following structure:

      face_retouching/
      ├── train/
      │   ├── source/
      │   └── target/
      └── test/
          ├── source/
          └── target/

- Select and download your preferred large language model (LLM), making sure it is compatible with your hardware.
- Navigate to the project directory, activate the environment, and start training:

      cd RetouchGPT
      conda activate retouchgpt
      python train.py
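Before training, it can help to confirm that every source image in the layout above has a retouched target. The helper below is a hypothetical sketch: the name `collect_pairs`, the accepted extensions, and the assumption that a source image and its target share the same filename are not taken from the repository, so check the FFHQR release and the project's data loader for the actual pairing convention.

```python
from pathlib import Path

# Hypothetical set of accepted image extensions.
IMG_EXTS = {".png", ".jpg", ".jpeg"}


def collect_pairs(root, split):
    """Return sorted (source, target) path pairs for one split ("train" or "test").

    Assumes each source image and its retouched target share a filename.
    """
    src_dir = Path(root) / split / "source"
    tgt_dir = Path(root) / split / "target"
    for d in (src_dir, tgt_dir):
        if not d.is_dir():
            raise FileNotFoundError(f"missing directory: {d}")

    src = {p.name: p for p in src_dir.iterdir() if p.suffix.lower() in IMG_EXTS}
    tgt = {p.name: p for p in tgt_dir.iterdir() if p.suffix.lower() in IMG_EXTS}

    unmatched = sorted(set(src) ^ set(tgt))  # names present on only one side
    if unmatched:
        raise ValueError(f"{len(unmatched)} unpaired images, e.g. {unmatched[:3]}")
    return [(src[name], tgt[name]) for name in sorted(src)]


# Example: pairs = collect_pairs("face_retouching", "train")
```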
To test RetouchGPT:

- Navigate to the project directory:

      cd RetouchGPT

- Run the testing script:

      python test.py

- Provide user instructions and in-the-wild images as input for retouching.
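Both training and testing load the LLM chosen during setup, so a quick way to judge whether it fits your hardware is to estimate the memory needed just to hold its weights: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a measurement. The function names and the 1.2 overhead factor are assumptions, and activations, any KV cache, and the retouching network itself all need memory beyond the LLM weights.

```python
# Approximate storage per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, dtype="fp16"):
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits_in_memory(num_params, available_gib, dtype="fp16", overhead=1.2):
    """Rough check; `overhead` is an assumed fudge factor, not a measured value."""
    return weight_memory_gib(num_params, dtype) * overhead <= available_gib


# A 7B-parameter model in fp16 needs about 13 GiB for its weights alone.
print(f"{weight_memory_gib(7e9, 'fp16'):.1f} GiB")
```

Lower-precision formats such as int8 roughly halve the fp16 figure, which is why quantized checkpoints are a common choice on smaller GPUs.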
Alternative AI tools for RetouchGPT
Similar Open Source Tools
data:image/s3,"s3://crabby-images/4cd1c/4cd1cb1410b65654a3c63434adbf92770402e69d" alt="RetouchGPT Screenshot"
RetouchGPT
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
data:image/s3,"s3://crabby-images/1005c/1005ce3368424fd01a5dbaf0c6d0fedec663b560" alt="CogVideo Screenshot"
CogVideo
CogVideo is an open-source repository that provides pretrained text-to-video models for generating videos based on input text. It includes models like CogVideoX-2B and CogVideo, offering powerful video generation capabilities. The repository offers tools for inference, fine-tuning, and model conversion, along with demos showcasing the model's capabilities through CLI, web UI, and online experiences. CogVideo aims to facilitate the creation of high-quality videos from textual descriptions, catering to a wide range of applications.
data:image/s3,"s3://crabby-images/f6846/f6846632c146f6aa5c777fd24176620ddf3cd6bc" alt="AGI-Papers Screenshot"
AGI-Papers
This repository contains a collection of papers and resources related to Large Language Models (LLMs), including their applications in domains such as text generation, translation, question answering, and dialogue systems. It also includes discussions on the ethical and societal implications of LLMs.
data:image/s3,"s3://crabby-images/a67a7/a67a779d2c94651cb60b498e19e6fc72c24643ee" alt="LLM-Zero-to-Hundred Screenshot"
LLM-Zero-to-Hundred
LLM-Zero-to-Hundred is a repository showcasing various applications of LLM chatbots and providing insights into training and fine-tuning Language Models. It includes projects like WebGPT, RAG-GPT, WebRAGQuery, LLM Full Finetuning, RAG-Master LLamaindex vs Langchain, open-source-RAG-GEMMA, and HUMAIN: Advanced Multimodal, Multitask Chatbot. The projects cover features like ChatGPT-like interaction, RAG capabilities, image generation and understanding, DuckDuckGo integration, summarization, text and voice interaction, and memory access. Tutorials include LLM Function Calling and Visualizing Text Vectorization. The projects have a general structure with folders for README, HELPER, .env, configs, data, src, images, and utils.
data:image/s3,"s3://crabby-images/51e4b/51e4b7e45dad35732a66ca27a14e05777fe1db3b" alt="Controllable-RAG-Agent Screenshot"
Controllable-RAG-Agent
This repository contains a sophisticated deterministic graph-based solution for answering complex questions using a controllable autonomous agent. The solution is designed to ensure that answers are solely based on the provided data, avoiding hallucinations. It involves various steps such as PDF loading, text preprocessing, summarization, database creation, encoding, and utilizing large language models. The algorithm follows a detailed workflow involving planning, retrieval, answering, replanning, content distillation, and performance evaluation. Heuristics and techniques implemented focus on content encoding, anonymizing questions, task breakdown, content distillation, chain of thought answering, verification, and model performance evaluation.
data:image/s3,"s3://crabby-images/4fcdb/4fcdb5be5b99b1e26c61b279d9406d1287960693" alt="postgresml Screenshot"
postgresml
PostgresML is a powerful Postgres extension that seamlessly combines data storage and machine learning inference within your database. It enables running machine learning and AI operations directly within PostgreSQL, leveraging GPU acceleration for faster computations, integrating state-of-the-art large language models, providing built-in functions for text processing, enabling efficient similarity search, offering diverse ML algorithms, ensuring high performance, scalability, and security, supporting a wide range of NLP tasks, and seamlessly integrating with existing PostgreSQL tools and client libraries.
data:image/s3,"s3://crabby-images/9c172/9c17288d293c5df893f2746e81b260ed273564da" alt="k2 Screenshot"
k2
K2 (GeoLLaMA) is a large language model for geoscience, trained on geoscience literature and fine-tuned with knowledge-intensive instruction data. It outperforms baseline models on objective and subjective tasks. The repository provides K2 weights, core data of GeoSignal, GeoBench benchmark, and code for further pretraining and instruction tuning. The model is available on Hugging Face for use. The project aims to create larger and more powerful geoscience language models in the future.
data:image/s3,"s3://crabby-images/4dfcc/4dfcc1fb6882eeb1262a7c5ef3d1b508207476cf" alt="MME-RealWorld Screenshot"
MME-RealWorld
MME-RealWorld is a benchmark designed to address real-world applications with practical relevance, featuring 13,366 high-resolution images and 29,429 annotations across 43 tasks. It aims to provide substantial recognition challenges and overcome common barriers in existing Multimodal Large Language Model benchmarks, such as small data scale, restricted data quality, and insufficient task difficulty. The dataset offers advantages in data scale, data quality, task difficulty, and real-world utility compared to existing benchmarks. It also includes a Chinese version with additional images and QA pairs focused on Chinese scenarios.
data:image/s3,"s3://crabby-images/33724/337244c5be5fb7120bf812a74c43653dde495833" alt="Reflection_Tuning Screenshot"
Reflection_Tuning
Reflection-Tuning is a project focused on improving the quality of instruction-tuning data through a reflection-based method. It introduces Selective Reflection-Tuning, where the student model can decide whether to accept the improvements made by the teacher model. The project aims to generate high-quality instruction-response pairs by defining specific criteria for the oracle model to follow and respond to. It also evaluates the efficacy and relevance of instruction-response pairs using the r-IFD metric. The project provides code for reflection and selection processes, along with data and model weights for both V1 and V2 methods.
data:image/s3,"s3://crabby-images/948a9/948a9ab765952eeea800700f93241abc08c82b9e" alt="oat Screenshot"
oat
Oat is a simple and efficient framework for running online LLM alignment algorithms. It implements a distributed Actor-Learner-Oracle architecture, with components optimized using state-of-the-art tools. Oat simplifies the experimental pipeline of LLM alignment by serving an Oracle online for preference data labeling and model evaluation. It provides a variety of oracles for simulating feedback and supports verifiable rewards. Oat's modular structure allows for easy inheritance and modification of classes, enabling rapid prototyping and experimentation with new algorithms. The framework implements cutting-edge online algorithms like PPO for math reasoning and various online exploration algorithms.
data:image/s3,"s3://crabby-images/3d5ab/3d5ab3368b0c7278fb52b22588f0f76799049d74" alt="XLearning Screenshot"
XLearning
XLearning is a scheduling platform for big data and artificial intelligence, supporting various machine learning and deep learning frameworks. It runs on Hadoop YARN and integrates frameworks such as TensorFlow, MXNet, Caffe, Theano, PyTorch, Keras, and XGBoost. XLearning offers scalability, support for multiple deep learning frameworks, unified data management based on HDFS, visualization, and compatibility with native framework code. It provides functions for data input/output strategies, container management, a TensorBoard service, and display of resource usage metrics. XLearning requires JDK >= 1.7 and Maven >= 3.3 for compilation; deployment requires CentOS 7.2, Java >= 1.7, and Hadoop 2.6, 2.7, or 2.8.
data:image/s3,"s3://crabby-images/e2127/e2127845d14c976ee9b7c40e40f13717396ecdf9" alt="dash-infer Screenshot"
dash-infer
DashInfer is a C++ runtime tool designed to deliver production-level implementations highly optimized for various hardware architectures, including x86 and ARMv9. It supports Continuous Batching and NUMA-Aware capabilities for CPU, and can fully utilize modern server-grade CPUs to host large language models (LLMs) up to 14B in size. With lightweight architecture, high precision, support for mainstream open-source LLMs, post-training quantization, optimized computation kernels, NUMA-aware design, and multi-language API interfaces, DashInfer provides a versatile solution for efficient inference tasks. It supports x86 CPUs with AVX2 instruction set and ARMv9 CPUs with SVE instruction set, along with various data types like FP32, BF16, and InstantQuant. DashInfer also offers single-NUMA and multi-NUMA architectures for model inference, with detailed performance tests and inference accuracy evaluations available. The tool is supported on mainstream Linux server operating systems and provides documentation and examples for easy integration and usage.
data:image/s3,"s3://crabby-images/0aeaa/0aeaae4db61c3b765265e2341fcf82ba54b3f905" alt="Advanced-QA-and-RAG-Series Screenshot"
Advanced-QA-and-RAG-Series
This repository contains advanced LLM-based chatbots for Retrieval Augmented Generation (RAG) and Q&A with different databases. It provides guides on using AzureOpenAI and OpenAI API for each project. The projects include Q&A and RAG with SQL and Tabular Data, and KnowledgeGraph Q&A and RAG with Tabular Data. Key notes emphasize the importance of good column names, read-only database access, and familiarity with query languages. The chatbots allow users to interact with SQL databases, CSV, XLSX files, and graph databases using natural language.
data:image/s3,"s3://crabby-images/f7483/f748376626149da39dabfd5369b7ac80b7722326" alt="arbigent Screenshot"
arbigent
Arbigent (Arbiter-Agent) is an AI agent testing framework designed to make AI agent testing practical for modern applications. It addresses challenges faced by traditional UI testing frameworks and AI agents by breaking down complex tasks into smaller, dependent scenarios. The framework is customizable for various AI providers, operating systems, and form factors, empowering users with extensive customization capabilities. Arbigent offers an intuitive UI for scenario creation and a powerful code interface for seamless test execution. It supports multiple form factors, optimizes UI for AI interaction, and is cost-effective by utilizing models like GPT-4o mini. With a flexible code interface and open-source nature, Arbigent aims to revolutionize AI agent testing in modern applications.
data:image/s3,"s3://crabby-images/2a753/2a753dd609f28e190c566cd9c65e843550c63ac8" alt="peridyno Screenshot"
peridyno
PeriDyno is a CUDA-based, highly parallel physics engine targeted at providing real-time simulation of physical environments for intelligent agents. It is designed to be easy to use and integrate into existing projects, and it provides a wide range of features for simulating a variety of physical phenomena. PeriDyno is open source and available under the Apache 2.0 license.
data:image/s3,"s3://crabby-images/fb56b/fb56b20865b8a78766fe21b1f6c6b558bea08986" alt="Mooncake Screenshot"
Mooncake
Mooncake is a serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates prefill and decoding clusters, leveraging underutilized CPU, DRAM, and SSD resources of the GPU cluster. Mooncake's scheduler balances throughput and latency-related SLOs, with a prediction-based early rejection policy for highly overloaded scenarios. It excels in long-context scenarios, achieving up to a 525% increase in throughput while handling 75% more requests under real workloads.
For similar tasks
data:image/s3,"s3://crabby-images/4cd1c/4cd1cb1410b65654a3c63434adbf92770402e69d" alt="RetouchGPT Screenshot"
RetouchGPT
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
data:image/s3,"s3://crabby-images/84881/84881c00fcf1dd5bf97c32a3c0822a9fd9ce2711" alt="clarity-upscaler Screenshot"
clarity-upscaler
Clarity AI is a free and open-source AI image upscaler and enhancer, providing an alternative to Magnific. It offers features such as multi-step upscaling, resemblance fixing, speed improvements, support for custom safetensors checkpoints, anime upscaling, LoRA support, pre-downscaling, and fractality. Users can access the tool through the ClarityAI.co app, ComfyUI manager, API, or by deploying and running it locally or in the cloud with cog or A1111 webUI. The tool aims to enhance image quality and resolution using advanced AI algorithms and models.
data:image/s3,"s3://crabby-images/28412/28412ae518cdf66d6f8b25c2c6fa12b78782d333" alt="awesome-ai-painting Screenshot"
awesome-ai-painting
This repository, named 'awesome-ai-painting', is a comprehensive collection of resources related to AI painting. It is curated by a user named 秋风, who is an AI painting enthusiast with a background in the AIGC industry. The repository aims to help more people learn AI painting and also documents the user's goal of creating 100 AI products, with current progress at 4/100. The repository includes information on various AI painting products, tutorials, tools, and models, providing a valuable resource for individuals interested in AI painting and related technologies.
data:image/s3,"s3://crabby-images/44784/447842f25954e0366ab41556ad4cfb6cb99377f9" alt="SUPIR Screenshot"
SUPIR
SUPIR is an AI-based image processing and upscaling tool that leverages cutting-edge technology to enhance image quality and resolution. The tool provides users with the ability to upscale images with high generalization and quality, as well as specific settings for light degradation scenarios. It offers a range of models and checkpoints for different use cases, along with detailed instructions for installation and usage. SUPIR also includes features for color fixing, linear CFG adjustments, and various prompts for image enhancement. The tool is designed for non-commercial use only and comes with a contact email for inquiries and permission requests for commercial use.
For similar jobs
data:image/s3,"s3://crabby-images/19d15/19d15278e90e0552fa2d625de82bbb72c3580112" alt="MagicMirror Screenshot"
MagicMirror
MagicMirror is an AI-powered tool that allows users to instantly try on new faces, hairstyles, and outfits with a simple drag and drop interface. It runs smoothly on standard computers without the need for dedicated GPU hardware, ensuring privacy with completely offline processing. The tool is ultra-lightweight with a small installer size and model files, providing a fun and easy way to experiment with different looks.
data:image/s3,"s3://crabby-images/4cd1c/4cd1cb1410b65654a3c63434adbf92770402e69d" alt="RetouchGPT Screenshot"
RetouchGPT
RetouchGPT is a novel framework designed for interactive face retouching using Large Language Models (LLMs). It leverages instruction-driven imperfection prediction and LLM-based embedding to guide the retouching process. The tool allows users to interactively modify imperfection features in face images, achieving high-fidelity retouching results. RetouchGPT outperforms existing methods by integrating textual and visual features to accurately identify imperfections and replace them with normal skin features.
data:image/s3,"s3://crabby-images/91dfc/91dfc6cad662b48eb625dc560e81b7c974a4a3e0" alt="ap-plugin Screenshot"
ap-plugin
AP-PLUGIN is an AI drawing plugin for the Yunzai series robot framework that brings a convenient AI drawing experience directly to the input box. It uses the open-source Stable Diffusion web UI, which can be deployed for free, as its backend and generates a wide variety of images with richer functions.
data:image/s3,"s3://crabby-images/a502c/a502c6eccce579d9b089defe71c5783077b7bc87" alt="cog-comfyui Screenshot"
cog-comfyui
Cog-comfyui allows users to run ComfyUI workflows on Replicate. ComfyUI is a visual programming tool for creating and sharing generative art workflows. With cog-comfyui, users can access a variety of pre-trained models and custom nodes to create their own unique artworks. The tool is easy to use and does not require any coding experience. Users simply need to upload their API JSON file and any necessary input files, and then click the "Run" button. Cog-comfyui will then generate the output image or video file.
data:image/s3,"s3://crabby-images/a67f4/a67f4cd59bb047f01c07982177a4b5339598e2bb" alt="Adobe-Photoshop-AI-Crack Screenshot"
Adobe-Photoshop-AI-Crack
Adobe Photoshop 2024 is the latest version of the program for processing raster graphics. It supports a variety of graphic formats and allows both the creation and editing of images. It is used for creating photorealistic images, working with color scanned images, retouching, color correction, collaging, graphic transformation, color separation, and more. Adobe Photoshop encompasses all methods of working with bitmap images, utilizes layers, and contours. The program is an undisputed leader among professional graphic editors due to its extensive capabilities, high efficiency, and speed. Adobe Photoshop provides all the necessary tools for correction, editing, preparing images for printing, and high-quality output.
data:image/s3,"s3://crabby-images/e3f41/e3f41c60408d14aa494db3d101832e559bfc1e7e" alt="IOPaint Screenshot"
IOPaint
IOPaint is a free and open-source inpainting and outpainting tool powered by state-of-the-art AI models. It supports various AI models to perform erase, inpainting, or outpainting tasks. Users can remove unwanted objects, defects, watermarks, or people from images using erase models. Additionally, diffusion models can replace objects or perform outpainting. The tool also offers plugins for interactive object segmentation, background removal, anime segmentation, super resolution, face restoration, and file management. IOPaint provides a web UI for easy access to the latest AI models and supports batch processing of images through the command line. Developers can contribute to the project by installing front-end dependencies, setting up the backend, and starting the development environment for both front-end and back-end components.
data:image/s3,"s3://crabby-images/2d94a/2d94ae42d357a271288803bdcda5275d11124efe" alt="adobe-photoshopCRCK Screenshot"
adobe-photoshopCRCK
Adobe PhotoshopCRCK is a tool designed to provide users with the latest version of Adobe Photoshop for free on Windows. It allows users to access advanced photo editing features and functionalities without the need for a paid subscription. The tool is intended for individuals looking to explore professional photo editing capabilities without incurring additional costs. With Adobe PhotoshopCRCK, users can enhance their images, create stunning graphics, and unleash their creativity through a wide range of editing tools and options.
data:image/s3,"s3://crabby-images/705f1/705f1f4349ed02b2bb48c86e6ece89fad0a155b4" alt="DeepNude-AI-List Screenshot"
DeepNude-AI-List
DeepNude AI List is a compilation of various NSFW AI tools that are designed for generating nude or suggestive content. The list includes tools like Dreampaint.net, Nudify.me, NoDress.io, Undress Her, and more. These tools utilize artificial intelligence algorithms to manipulate images and create provocative visuals. Users should exercise caution and responsibility when using such tools, as they may raise ethical and privacy concerns.