HivisionIDPhotos

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photos tools. 一个轻量级的AI证件照制作算法。

Stars: 10308

Visit

HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.

README:

HivisionIDPhoto

English / 中文 / 日本語 / 한국어

相关项目：

SwanLab：训练人像抠图模型全程用它来分析和监控，以及和实验室同学协作交流，大幅提升了训练效率。

🤩 最近更新

在线体验：、、
2024.09.25: 增加五寸相纸和JPEG下载选项｜默认照片下载支持300DPI
2024.09.24: API接口增加base64图像传入选项 | Gradio Demo增加排版照裁剪线功能
2024.09.22: Gradio Demo增加野兽模式，可设置内存加载策略 | API接口增加dpi、face_alignment参数
2024.09.18: Gradio Demo增加分享模版照功能、增加美式证件照背景选项
2024.09.17: Gradio Demo增加自定义底色-HEX输入功能 | （社区贡献）C++版本 - HivisionIDPhotos-cpp 贡献 by zjkhahah
2024.09.16: Gradio Demo增加人脸旋转对齐功能，自定义尺寸输入支持毫米单位
2024.09.14: Gradio Demo增加自定义DPI功能，增加日语和韩语支持，增加调整亮度、对比度、锐度功能
2024.09.12: Gradio Demo增加美白功能 | API接口增加加水印、设置照片KB值大小、证件照裁切

项目简介

🚀 谢谢你对我们的工作感兴趣。您可能还想查看我们在图像领域的其他成果，欢迎来信:[email protected].

HivisionIDPhoto 旨在开发一种实用、系统性的证件照智能制作算法。

它利用一套完善的AI模型工作流程，实现对多种用户拍照场景的识别、抠图与证件照生成。

HivisionIDPhoto 可以做到：

轻量级抠图（纯离线，仅需 CPU 即可快速推理）
根据不同尺寸规格生成不同的标准证件照、六寸排版照
支持纯离线或端云推理
美颜
智能换正装（waiting）

如果 HivisionIDPhoto 对你有帮助，请 star 这个 repo 或推荐给你的朋友，解决证件照应急制作问题！

🏠 社区

我们分享了一些由社区构建的HivisionIDPhotos的有趣应用和扩展：

HivisionIDPhotos-ComfyUI	HivisionIDPhotos-wechat-weapp

ComfyUI证件照处理工作流	证件照微信小程序（JAVA后端+原生前端）

HivisionIDPhotos-Uniapp	HivisionIDPhotos-web

证件照微信小程序（uniapp）	证件照应用网页版

HivisionIDPhotos-cpp: HivisionIDphotos C++版本，由 zjkhahah 构建
ai-idphoto: HivisionIDPhotos-wechat-weapp 的uniapp多端兼容版，由 wmlcjj 贡献
HivisionIDPhotos-uniapp-WeChat-gpto1: 由gpt-o1辅助完成开发的证件照微信小程序，由 jkm199 贡献
HivisionIDPhotos-windows-GUI：Windows客户端应用，由 zhaoyun0071 构建
HivisionIDPhotos-NAS: 群晖NAS部署中文教程，由 ONG-Leo 贡献

🔧 准备工作

环境安装与依赖：

Python >= 3.7（项目主要测试在 python 3.10）
OS: Linux, Windows, MacOS

1. 克隆项目

git clone https://github.com/Zeyi-Lin/HivisionIDPhotos.git
cd  HivisionIDPhotos

2. 安装依赖环境

建议 conda 创建一个 python3.10 虚拟环境后，执行以下命令

pip install -r requirements.txt
pip install -r requirements-app.txt

3. 下载人像抠图模型权重文件

方式一：脚本下载

python scripts/download_model.py --models all
# 如需指定下载某个模型
# python scripts/download_model.py --models modnet_photographic_portrait_matting

方式二：直接下载

模型均存到项目的hivision/creator/weights目录下：

人像抠图模型	介绍	下载
MODNet	MODNet官方权重	下载(24.7MB)
hivision_modnet	对纯色换底适配性更好的抠图模型	下载(24.7MB)
rmbg-1.4	BRIA AI 开源的抠图模型	下载(176.2MB)后重命名为`rmbg-1.4.onnx`
birefnet-v1-lite	ZhengPeng7 开源的抠图模型，拥有最好的分割精度	下载(224MB)后重命名为`birefnet-v1-lite.onnx`

如果下载网速不顺利：前往SwanHub下载。

4. 人脸检测模型配置（可选）

拓展人脸检测模型	介绍	使用文档
MTCNN	离线人脸检测模型，高性能CPU推理（毫秒级），为默认模型，检测精度较低	Clone此项目后直接使用
RetinaFace	离线人脸检测模型，CPU推理速度中等（秒级），精度较高	下载后放到`hivision/creator/retinaface/weights`目录下
Face++	旷视推出的在线人脸检测API，检测精度较高，官方文档	使用文档

5. 性能参考

测试环境为Mac M1 Max 64GB，非GPU加速，测试图片分辨率为 512x715(1) 与 764×1146(2)。

模型组合	内存占用	推理时长(1)	推理时长(2)
MODNet + mtcnn	410MB	0.207s	0.246s
MODNet + retinaface	405MB	0.571s	0.971s
birefnet-v1-lite + retinaface	6.20GB	7.063s	7.128s

6. GPU推理加速（可选）

在当前版本，可被英伟达GPU加速的模型为birefnet-v1-lite，并请确保你有16GB左右的显存。

如需使用英伟达GPU加速推理，在确保你已经安装CUDA与cuDNN后，根据onnxruntime-gpu文档找到对应的onnxruntime-gpu版本安装，以及根据pytorch官网找到对应的torch版本安装。

# 假如你的电脑安装的是CUDA 12.x, cuDNN 8
# 安装torch是可选的，如果你始终配置不好cuDNN，那么试试安装torch
pip install onnxruntime-gpu==1.18.0
pip install torch --index-url https://download.pytorch.org/whl/cu121

完成安装后，调用birefnet-v1-lite模型即可利用GPU加速推理。

TIPS: CUDA 支持向下兼容。比如你的 CUDA 版本为 12.6，torch 官方目前支持的最高版本为 12.4（<12.6），torch仍可以正常使用CUDA。

⚡️ 运行 Gradio Demo

python app.py

运行程序将生成一个本地 Web 页面，在页面中可完成证件照的操作与交互。

🚀 Python 推理

核心参数：

-i: 输入图像路径
-o: 保存图像路径
-t: 推理类型，有idphoto、human_matting、add_background、generate_layout_photos可选
--matting_model: 人像抠图模型权重选择
--face_detect_model: 人脸检测模型选择

更多参数可通过python inference.py --help查看

1. 证件照制作

输入 1 张照片，获得 1 张标准证件照和 1 张高清证件照的 4 通道透明 png

python inference.py -i demo/images/test0.jpg -o ./idphoto.png --height 413 --width 295

2. 人像抠图

输入 1 张照片，获得 1张 4 通道透明 png

python inference.py -t human_matting -i demo/images/test0.jpg -o ./idphoto_matting.png --matting_model hivision_modnet

3. 透明图增加底色

输入 1 张 4 通道透明 png，获得 1 张增加了底色的 3通道图像

python inference.py -t add_background -i ./idphoto.png -o ./idphoto_ab.jpg  -c 4f83ce -k 30 -r 1

4. 得到六寸排版照

输入 1 张 3 通道照片，获得 1 张六寸排版照

python inference.py -t generate_layout_photos -i ./idphoto_ab.jpg -o ./idphoto_layout.jpg  --height 413 --width 295 -k 200

5. 证件照裁剪

输入 1 张 4 通道照片（抠图好的图像），获得 1 张标准证件照和 1 张高清证件照的 4 通道透明 png

python inference.py -t idphoto_crop -i ./idphoto_matting.png -o ./idphoto_crop.png --height 413 --width 295

⚡️ 部署 API 服务

启动后端

python deploy_api.py

请求 API 服务

详细请求方式请参考 API 文档，包含以下请求示例：

cURL
Python

🐳 Docker 部署

1. 拉取或构建镜像

以下方式三选一

方式一：拉取最新镜像：

docker pull linzeyi/hivision_idphotos

方式二：Dockrfile 直接构建镜像：

在确保将至少一个抠图模型权重文件放到hivision/creator/weights下后，在项目根目录执行：

docker build -t linzeyi/hivision_idphotos .

方式三：Docker compose 构建：

在确保将至少一个抠图模型权重文件放到hivision/creator/weights下后，在项目根目录下执行：

docker compose build

2. 运行服务

启动 Gradio Demo 服务

运行下面的命令，在你的本地访问 http://127.0.0.1:7860 即可使用。

docker run -d -p 7860:7860 linzeyi/hivision_idphotos

启动 API 后端服务

docker run -d -p 8080:8080 linzeyi/hivision_idphotos python3 deploy_api.py

两个服务同时启动

docker compose up -d

环境变量

本项目提供了一些额外的配置项，使用环境变量进行设置：

环境变量	类型	描述	示例
FACE_PLUS_API_KEY	可选	这是你在 Face++ 控制台申请的 API 密钥	`7-fZStDJ····`
FACE_PLUS_API_SECRET	可选	Face++ API密钥对应的Secret	`VTee824E····`
RUN_MODE	可选	运行模式，可选值为`beast`(野兽模式)。野兽模式下人脸检测和抠图模型将不释放内存，从而获得更快的二次推理速度。建议内存16GB以上尝试。	`beast`
DEFAULT_LANG	可选	Gradio Demo启动时的默认语言	`en`

docker使用环境变量示例：

docker run  -d -p 7860:7860 \
    -e FACE_PLUS_API_KEY=7-fZStDJ···· \
    -e FACE_PLUS_API_SECRET=VTee824E···· \
    -e RUN_MODE=beast \
    -e DEFAULT_LANG=en \
    linzeyi/hivision_idphotos

FAQ

1. 如何修改预设尺寸和颜色？

尺寸：修改size_list_CN.csv后再次运行 app.py 即可，其中第一列为尺寸名，第二列为高度，第三列为宽度。
颜色：修改color_list_CN.csv后再次运行 app.py 即可，其中第一列为颜色名，第二列为Hex值。

2. 如何修改水印字体？

将字体文件放到hivision/plugin/font文件夹下
修改hivision/plugin/watermark.py的font_file参数值为字体文件名

3. 如何添加社交媒体模板照？

将模板图片放到hivision/plugin/template/assets文件夹下。模板图片是一个4通道的透明png。
在hivision/plugin/template/assets/template_config.json文件中添加最新的模板信息，其中width为模板图宽度(px)，height为模板图高度(px)，anchor_points为模板中透明区域的四个角的坐标(px)；rotation为透明区域相对于垂直方向的旋转角度，>0为逆时针，<0为顺时针。
在demo/processor.py的_generate_image_template函数中的TEMPLATE_NAME_LIST变量添加最新的模板名

4. 如何修改Gradio Demo的顶部导航栏？

修改demo/assets/title.md

📧 联系我们

如果您有任何问题，请发邮件至 [email protected]

🙏 感谢支持

贡献者们：

Zeyi-Lin、SAKURA-CAT、Feudalman、swpfY、Kaikaikaifang、ShaohonChen、KashiwaByte

📜 Lincese

This repository is licensed under the Apache-2.0 License.

📚 引用

如果您在研究或项目中使用了HivisionIDPhotos，请考虑引用我们的工作。您可以使用以下BibTeX条目：

@misc{hivisionidphotos,
      title={{HivisionIDPhotos: A Lightweight and Efficient AI ID Photos Tool}},
      author={Zeyi Lin and SwanLab Team},
      year={2024},
      publisher={GitHub},
      url = {\url{https://github.com/Zeyi-Lin/HivisionIDPhotos}},
}

For Tasks:

Click tags to check more tools for each tasks

create id photos generate layout photos enhance beauty swap outfits cut out images

For Jobs:

photographer assistant graphic designer photo editor ai engineer software developer

Alternative AI tools for HivisionIDPhotos

Similar Open Source Tools

HivisionIDPhotos

github

: 10.3k

agentica

Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.

github

: 108

Element-Plus-X

github

: 289

k8m

k8m is an AI-driven Mini Kubernetes AI Dashboard lightweight console tool designed to simplify cluster management. It is built on AMIS and uses 'kom' as the Kubernetes API client. k8m has built-in Qwen2.5-Coder-7B model interaction capabilities and supports integration with your own private large models. Its key features include miniaturized design for easy deployment, user-friendly interface for intuitive operation, efficient performance with backend in Golang and frontend based on Baidu AMIS, pod file management for browsing, editing, uploading, downloading, and deleting files, pod runtime management for real-time log viewing, log downloading, and executing shell commands within pods, CRD management for automatic discovery and management of CRD resources, and intelligent translation and diagnosis based on ChatGPT for YAML property translation, Describe information interpretation, AI log diagnosis, and command recommendations, providing intelligent support for managing k8s. It is cross-platform compatible with Linux, macOS, and Windows, supporting multiple architectures like x86 and ARM for seamless operation. k8m's design philosophy is 'AI-driven, lightweight and efficient, simplifying complexity,' helping developers and operators quickly get started and easily manage Kubernetes clusters.

github

: 157

kirara-ai

Kirara AI is a chatbot that supports mainstream large language models and chat platforms. It provides features such as image sending, keyword-triggered replies, multi-account support, personality settings, and support for various chat platforms like QQ, Telegram, Discord, and WeChat. The tool also supports HTTP server for Web API, popular large models like OpenAI and DeepSeek, plugin mechanism, conditional triggers, admin commands, drawing models, voice replies, multi-turn conversations, cross-platform message sending, custom workflows, web management interface, and built-in Frpc intranet penetration.

github

: 14.9k

Awesome-ChatTTS

Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.

github

: 594

build_MiniLLM_from_scratch

This repository aims to build a low-parameter LLM model through pretraining, fine-tuning, model rewarding, and reinforcement learning stages to create a chat model capable of simple conversation tasks. It features using the bert4torch training framework, seamless integration with transformers package for inference, optimized file reading during training to reduce memory usage, providing complete training logs for reproducibility, and the ability to customize robot attributes. The chat model supports multi-turn conversations. The trained model currently only supports basic chat functionality due to limitations in corpus size, model scale, SFT corpus size, and quality.

github

: 397

DeepAI

DeepAI is a proxy server that enhances the interaction experience of large language models (LLMs) by integrating the 'thinking chain' process. It acts as an intermediary layer, receiving standard OpenAI API compatible requests, using independent 'thinking services' to generate reasoning processes, and then forwarding the enhanced requests to the LLM backend of your choice. This ensures that responses are not only generated by the LLM but also based on pre-inference analysis, resulting in more insightful and coherent answers. DeepAI supports seamless integration with applications designed for the OpenAI API, providing endpoints for '/v1/chat/completions' and '/v1/models', making it easy to integrate into existing applications. It offers features such as reasoning chain enhancement, flexible backend support, API key routing, weighted random selection, proxy support, comprehensive logging, and graceful shutdown.

github

: 121

ChuanhuChatGPT

Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.

github

: 15.2k

video-subtitle-remover

Video-subtitle-remover (VSR) is a software based on AI technology that removes hard subtitles from videos. It achieves the following functions: - Lossless resolution: Remove hard subtitles from videos, generate files with subtitles removed - Fill the region of removed subtitles using a powerful AI algorithm model (non-adjacent pixel filling and mosaic removal) - Support custom subtitle positions, only remove subtitles in defined positions (input position) - Support automatic removal of all text in the entire video (no input position required) - Support batch removal of watermark text from multiple images.

github

: 4.0k

Langchain-Chatchat

LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.

github

: 34.4k

XianyuAutoAgent

Xianyu AutoAgent is an AI customer service robot system specifically designed for the Xianyu platform, providing 24/7 automated customer service, supporting multi-expert collaborative decision-making, intelligent bargaining, and context-aware conversations. The system includes intelligent conversation engine with features like context awareness and expert routing, business function matrix with modules like core engine, bargaining system, technical support, and operation monitoring. It requires Python 3.8+ and NodeJS 18+ for installation and operation. Users can customize prompts for different experts and contribute to the project through issues or pull requests.

github

: 973

BlueLM

BlueLM is a large-scale pre-trained language model developed by vivo AI Global Research Institute, featuring 7B base and chat models. It includes high-quality training data with a token scale of 26 trillion, supporting both Chinese and English languages. BlueLM-7B-Chat excels in C-Eval and CMMLU evaluations, providing strong competition among open-source models of similar size. The models support 32K long texts for better context understanding while maintaining base capabilities. BlueLM welcomes developers for academic research and commercial applications.

github

: 869

TelegramForwarder

Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.

github

: 193

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.

github

: 1.2k

chatgpt-mirai-qq-bot

Kirara AI is a chatbot that supports mainstream language models and chat platforms. It features various functionalities such as image sending, keyword-triggered replies, multi-account support, content moderation, personality settings, and support for platforms like QQ, Telegram, Discord, and WeChat. It also offers HTTP server capabilities, plugin support, conditional triggers, admin commands, drawing models, voice replies, multi-turn conversations, cross-platform message sending, and custom workflows. The tool can be accessed via HTTP API for integration with other platforms.

github

: 14.4k

For similar tasks

HivisionIDPhotos

github

: 10.3k

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k