HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photos tools. 一个轻量级的AI证件照制作算法。
Stars: 10308
HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.
README:
相关项目:
- SwanLab:训练人像抠图模型全程用它来分析和监控,以及和实验室同学协作交流,大幅提升了训练效率。
-
2024.09.25: 增加五寸相纸和JPEG下载选项|默认照片下载支持300DPI
-
2024.09.24: API接口增加base64图像传入选项 | Gradio Demo增加排版照裁剪线功能
-
2024.09.22: Gradio Demo增加野兽模式,可设置内存加载策略 | API接口增加dpi、face_alignment参数
-
2024.09.18: Gradio Demo增加分享模版照功能、增加美式证件照背景选项
-
2024.09.17: Gradio Demo增加自定义底色-HEX输入功能 | (社区贡献)C++版本 - HivisionIDPhotos-cpp 贡献 by zjkhahah
-
2024.09.16: Gradio Demo增加人脸旋转对齐功能,自定义尺寸输入支持毫米单位
-
2024.09.14: Gradio Demo增加自定义DPI功能,增加日语和韩语支持,增加调整亮度、对比度、锐度功能
-
2024.09.12: Gradio Demo增加美白功能 | API接口增加加水印、设置照片KB值大小、证件照裁切
🚀 谢谢你对我们的工作感兴趣。您可能还想查看我们在图像领域的其他成果,欢迎来信:[email protected].
HivisionIDPhoto 旨在开发一种实用、系统性的证件照智能制作算法。
它利用一套完善的AI模型工作流程,实现对多种用户拍照场景的识别、抠图与证件照生成。
HivisionIDPhoto 可以做到:
- 轻量级抠图(纯离线,仅需 CPU 即可快速推理)
- 根据不同尺寸规格生成不同的标准证件照、六寸排版照
- 支持 纯离线 或 端云 推理
- 美颜
- 智能换正装(waiting)
如果 HivisionIDPhoto 对你有帮助,请 star 这个 repo 或推荐给你的朋友,解决证件照应急制作问题!
我们分享了一些由社区构建的HivisionIDPhotos的有趣应用和扩展:
| HivisionIDPhotos-ComfyUI | HivisionIDPhotos-wechat-weapp |
|---|---|
|
|
| ComfyUI证件照处理工作流 | 证件照微信小程序(JAVA后端+原生前端) |
| HivisionIDPhotos-Uniapp | HivisionIDPhotos-web |
|---|---|
|
|
| 证件照微信小程序(uniapp) | 证件照应用网页版 |
- HivisionIDPhotos-cpp: HivisionIDphotos C++版本,由 zjkhahah 构建
- ai-idphoto: HivisionIDPhotos-wechat-weapp 的uniapp多端兼容版,由 wmlcjj 贡献
- HivisionIDPhotos-uniapp-WeChat-gpto1: 由gpt-o1辅助完成开发的证件照微信小程序,由 jkm199 贡献
- HivisionIDPhotos-windows-GUI:Windows客户端应用,由 zhaoyun0071 构建
- HivisionIDPhotos-NAS: 群晖NAS部署中文教程,由 ONG-Leo 贡献
环境安装与依赖:
- Python >= 3.7(项目主要测试在 python 3.10)
- OS: Linux, Windows, MacOS
git clone https://github.com/Zeyi-Lin/HivisionIDPhotos.git
cd HivisionIDPhotos建议 conda 创建一个 python3.10 虚拟环境后,执行以下命令
pip install -r requirements.txt
pip install -r requirements-app.txt方式一:脚本下载
python scripts/download_model.py --models all
# 如需指定下载某个模型
# python scripts/download_model.py --models modnet_photographic_portrait_matting方式二:直接下载
模型均存到项目的hivision/creator/weights目录下:
| 人像抠图模型 | 介绍 | 下载 |
|---|---|---|
| MODNet | MODNet官方权重 | 下载(24.7MB) |
| hivision_modnet | 对纯色换底适配性更好的抠图模型 | 下载(24.7MB) |
| rmbg-1.4 | BRIA AI 开源的抠图模型 |
下载(176.2MB)后重命名为rmbg-1.4.onnx
|
| birefnet-v1-lite | ZhengPeng7 开源的抠图模型,拥有最好的分割精度 |
下载(224MB)后重命名为birefnet-v1-lite.onnx
|
如果下载网速不顺利:前往SwanHub下载。
| 拓展人脸检测模型 | 介绍 | 使用文档 |
|---|---|---|
| MTCNN | 离线人脸检测模型,高性能CPU推理(毫秒级),为默认模型,检测精度较低 | Clone此项目后直接使用 |
| RetinaFace | 离线人脸检测模型,CPU推理速度中等(秒级),精度较高 |
下载后放到hivision/creator/retinaface/weights目录下 |
| Face++ | 旷视推出的在线人脸检测API,检测精度较高,官方文档 | 使用文档 |
测试环境为Mac M1 Max 64GB,非GPU加速,测试图片分辨率为 512x715(1) 与 764×1146(2)。
| 模型组合 | 内存占用 | 推理时长(1) | 推理时长(2) |
|---|---|---|---|
| MODNet + mtcnn | 410MB | 0.207s | 0.246s |
| MODNet + retinaface | 405MB | 0.571s | 0.971s |
| birefnet-v1-lite + retinaface | 6.20GB | 7.063s | 7.128s |
在当前版本,可被英伟达GPU加速的模型为birefnet-v1-lite,并请确保你有16GB左右的显存。
如需使用英伟达GPU加速推理,在确保你已经安装CUDA与cuDNN后,根据onnxruntime-gpu文档找到对应的onnxruntime-gpu版本安装,以及根据pytorch官网找到对应的torch版本安装。
# 假如你的电脑安装的是CUDA 12.x, cuDNN 8
# 安装torch是可选的,如果你始终配置不好cuDNN,那么试试安装torch
pip install onnxruntime-gpu==1.18.0
pip install torch --index-url https://download.pytorch.org/whl/cu121完成安装后,调用birefnet-v1-lite模型即可利用GPU加速推理。
TIPS: CUDA 支持向下兼容。比如你的 CUDA 版本为 12.6,
torch官方目前支持的最高版本为 12.4(<12.6),torch仍可以正常使用CUDA。
python app.py运行程序将生成一个本地 Web 页面,在页面中可完成证件照的操作与交互。
核心参数:
-
-i: 输入图像路径 -
-o: 保存图像路径 -
-t: 推理类型,有idphoto、human_matting、add_background、generate_layout_photos可选 -
--matting_model: 人像抠图模型权重选择 -
--face_detect_model: 人脸检测模型选择
更多参数可通过python inference.py --help查看
输入 1 张照片,获得 1 张标准证件照和 1 张高清证件照的 4 通道透明 png
python inference.py -i demo/images/test0.jpg -o ./idphoto.png --height 413 --width 295输入 1 张照片,获得 1张 4 通道透明 png
python inference.py -t human_matting -i demo/images/test0.jpg -o ./idphoto_matting.png --matting_model hivision_modnet输入 1 张 4 通道透明 png,获得 1 张增加了底色的 3通道图像
python inference.py -t add_background -i ./idphoto.png -o ./idphoto_ab.jpg -c 4f83ce -k 30 -r 1输入 1 张 3 通道照片,获得 1 张六寸排版照
python inference.py -t generate_layout_photos -i ./idphoto_ab.jpg -o ./idphoto_layout.jpg --height 413 --width 295 -k 200输入 1 张 4 通道照片(抠图好的图像),获得 1 张标准证件照和 1 张高清证件照的 4 通道透明 png
python inference.py -t idphoto_crop -i ./idphoto_matting.png -o ./idphoto_crop.png --height 413 --width 295python deploy_api.py
详细请求方式请参考 API 文档,包含以下请求示例:
以下方式三选一
方式一:拉取最新镜像:
docker pull linzeyi/hivision_idphotos方式二:Dockrfile 直接构建镜像:
在确保将至少一个抠图模型权重文件放到hivision/creator/weights下后,在项目根目录执行:
docker build -t linzeyi/hivision_idphotos .方式三:Docker compose 构建:
在确保将至少一个抠图模型权重文件放到hivision/creator/weights下后,在项目根目录下执行:
docker compose build启动 Gradio Demo 服务
运行下面的命令,在你的本地访问 http://127.0.0.1:7860 即可使用。
docker run -d -p 7860:7860 linzeyi/hivision_idphotos启动 API 后端服务
docker run -d -p 8080:8080 linzeyi/hivision_idphotos python3 deploy_api.py两个服务同时启动
docker compose up -d本项目提供了一些额外的配置项,使用环境变量进行设置:
| 环境变量 | 类型 | 描述 | 示例 |
|---|---|---|---|
| FACE_PLUS_API_KEY | 可选 | 这是你在 Face++ 控制台申请的 API 密钥 | 7-fZStDJ···· |
| FACE_PLUS_API_SECRET | 可选 | Face++ API密钥对应的Secret | VTee824E···· |
| RUN_MODE | 可选 | 运行模式,可选值为beast(野兽模式)。野兽模式下人脸检测和抠图模型将不释放内存,从而获得更快的二次推理速度。建议内存16GB以上尝试。 |
beast |
| DEFAULT_LANG | 可选 | Gradio Demo启动时的默认语言 | en |
docker使用环境变量示例:
docker run -d -p 7860:7860 \
-e FACE_PLUS_API_KEY=7-fZStDJ···· \
-e FACE_PLUS_API_SECRET=VTee824E···· \
-e RUN_MODE=beast \
-e DEFAULT_LANG=en \
linzeyi/hivision_idphotos - 尺寸:修改size_list_CN.csv后再次运行
app.py即可,其中第一列为尺寸名,第二列为高度,第三列为宽度。 - 颜色:修改color_list_CN.csv后再次运行
app.py即可,其中第一列为颜色名,第二列为Hex值。
- 将字体文件放到
hivision/plugin/font文件夹下 - 修改
hivision/plugin/watermark.py的font_file参数值为字体文件名
- 将模板图片放到
hivision/plugin/template/assets文件夹下。模板图片是一个4通道的透明png。 - 在
hivision/plugin/template/assets/template_config.json文件中添加最新的模板信息,其中width为模板图宽度(px),height为模板图高度(px),anchor_points为模板中透明区域的四个角的坐标(px);rotation为透明区域相对于垂直方向的旋转角度,>0为逆时针,<0为顺时针。 - 在
demo/processor.py的_generate_image_template函数中的TEMPLATE_NAME_LIST变量添加最新的模板名
- 修改
demo/assets/title.md
如果您有任何问题,请发邮件至 [email protected]
贡献者们:
Zeyi-Lin、SAKURA-CAT、Feudalman、swpfY、Kaikaikaifang、ShaohonChen、KashiwaByte
This repository is licensed under the Apache-2.0 License.
如果您在研究或项目中使用了HivisionIDPhotos,请考虑引用我们的工作。您可以使用以下BibTeX条目:
@misc{hivisionidphotos,
title={{HivisionIDPhotos: A Lightweight and Efficient AI ID Photos Tool}},
author={Zeyi Lin and SwanLab Team},
year={2024},
publisher={GitHub},
url = {\url{https://github.com/Zeyi-Lin/HivisionIDPhotos}},
}For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for HivisionIDPhotos
Similar Open Source Tools
HivisionIDPhotos
HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.
k8m
k8m is an AI-driven Mini Kubernetes AI Dashboard lightweight console tool designed to simplify cluster management. It is built on AMIS and uses 'kom' as the Kubernetes API client. k8m has built-in Qwen2.5-Coder-7B model interaction capabilities and supports integration with your own private large models. Its key features include miniaturized design for easy deployment, user-friendly interface for intuitive operation, efficient performance with backend in Golang and frontend based on Baidu AMIS, pod file management for browsing, editing, uploading, downloading, and deleting files, pod runtime management for real-time log viewing, log downloading, and executing shell commands within pods, CRD management for automatic discovery and management of CRD resources, and intelligent translation and diagnosis based on ChatGPT for YAML property translation, Describe information interpretation, AI log diagnosis, and command recommendations, providing intelligent support for managing k8s. It is cross-platform compatible with Linux, macOS, and Windows, supporting multiple architectures like x86 and ARM for seamless operation. k8m's design philosophy is 'AI-driven, lightweight and efficient, simplifying complexity,' helping developers and operators quickly get started and easily manage Kubernetes clusters.
kirara-ai
Kirara AI is a chatbot that supports mainstream large language models and chat platforms. It provides features such as image sending, keyword-triggered replies, multi-account support, personality settings, and support for various chat platforms like QQ, Telegram, Discord, and WeChat. The tool also supports HTTP server for Web API, popular large models like OpenAI and DeepSeek, plugin mechanism, conditional triggers, admin commands, drawing models, voice replies, multi-turn conversations, cross-platform message sending, custom workflows, web management interface, and built-in Frpc intranet penetration.
LLM-TPU
LLM-TPU project aims to deploy various open-source generative AI models on the BM1684X chip, with a focus on LLM. Models are converted to bmodel using TPU-MLIR compiler and deployed to PCIe or SoC environments using C++ code. The project has deployed various open-source models such as Baichuan2-7B, ChatGLM3-6B, CodeFuse-7B, DeepSeek-6.7B, Falcon-40B, Phi-3-mini-4k, Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-0.5B, Qwen1.5-1.8B, Llama2-7B, Llama2-13B, LWM-Text-Chat, Mistral-7B-Instruct, Stable Diffusion, Stable Diffusion XL, WizardCoder-15B, Yi-6B-chat, Yi-34B-chat. Detailed model deployment information can be found in the 'models' subdirectory of the project. For demonstrations, users can follow the 'Quick Start' section. For inquiries about the chip, users can contact SOPHGO via the official website.
md
The WeChat Markdown editor automatically renders Markdown documents as WeChat articles, eliminating the need to worry about WeChat content layout! As long as you know basic Markdown syntax (now with AI, you don't even need to know Markdown), you can create a simple and elegant WeChat article. The editor supports all basic Markdown syntax, mathematical formulas, rendering of Mermaid charts, GFM warning blocks, PlantUML rendering support, ruby annotation extension support, rich code block highlighting themes, custom theme colors and CSS styles, multiple image upload functionality with customizable configuration of image hosting services, convenient file import/export functionality, built-in local content management with automatic draft saving, integration of mainstream AI models (such as DeepSeek, OpenAI, Tongyi Qianwen, Tencent Hanyuan, Volcano Ark, etc.) to assist content creation.
Langchain-Chatchat
LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.
TigerBot
TigerBot is a cutting-edge foundation for your very own LLM, providing a world-class large model for innovative Chinese-style contributions. It offers various upgrades and features, such as search mode enhancements, support for large context lengths, and the ability to play text-based games. TigerBot is suitable for prompt-based game engine development, interactive game design, and real-time feedback for playable games.
video-subtitle-remover
Video-subtitle-remover (VSR) is a software based on AI technology that removes hard subtitles from videos. It achieves the following functions: - Lossless resolution: Remove hard subtitles from videos, generate files with subtitles removed - Fill the region of removed subtitles using a powerful AI algorithm model (non-adjacent pixel filling and mosaic removal) - Support custom subtitle positions, only remove subtitles in defined positions (input position) - Support automatic removal of all text in the entire video (no input position required) - Support batch removal of watermark text from multiple images.
Element-Plus-X
Element-Plus-X is an out-of-the-box enterprise-level AI component library based on Vue 3 + Element-Plus. It features built-in scenario components such as chatbots and voice interactions, seamless integration with zero configuration based on Element-Plus design system, and support for on-demand loading with Tree Shaking optimization.
TelegramForwarder
Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.
torch-rechub
Torch-RecHub is a lightweight, efficient, and user-friendly PyTorch recommendation system framework. It provides easy-to-use solutions for industrial-level recommendation systems, with features such as generative recommendation models, modular design for adding new models and datasets, PyTorch-based implementation for GPU acceleration, a rich library of 30+ classic and cutting-edge recommendation algorithms, standardized data loading, training, and evaluation processes, easy configuration through files or command-line parameters, reproducibility of experimental results, ONNX model export for production deployment, cross-engine data processing with PySpark support, and experiment visualization and tracking with integrated tools like WandB, SwanLab, and TensorBoardX.
Chinese-Mixtral-8x7B
Chinese-Mixtral-8x7B is an open-source project based on Mistral's Mixtral-8x7B model for incremental pre-training of Chinese vocabulary, aiming to advance research on MoE models in the Chinese natural language processing community. The expanded vocabulary significantly improves the model's encoding and decoding efficiency for Chinese, and the model is pre-trained incrementally on a large-scale open-source corpus, enabling it with powerful Chinese generation and comprehension capabilities. The project includes a large model with expanded Chinese vocabulary and incremental pre-training code.
SwanLab
SwanLab is an open-source, lightweight AI experiment tracking tool that provides a platform for tracking, comparing, and collaborating on experiments, aiming to accelerate the research and development efficiency of AI teams by 100 times. It offers a friendly API and a beautiful interface, combining hyperparameter tracking, metric recording, online collaboration, experiment link sharing, real-time message notifications, and more. With SwanLab, researchers can document their training experiences, seamlessly communicate and collaborate with collaborators, and machine learning engineers can develop models for production faster.
SakuraLLM
SakuraLLM is a project focused on building large language models for Japanese to Chinese translation in the light novel and galgame domain. The models are based on open-source large models and are pre-trained and fine-tuned on general Japanese corpora and specific domains. The project aims to provide high-performance language models for galgame/light novel translation that are comparable to GPT3.5 and can be used offline. It also offers an API backend for running the models, compatible with the OpenAI API format. The project is experimental, with version 0.9 showing improvements in style, fluency, and accuracy over GPT-3.5.
nndeploy
nndeploy is a tool that allows you to quickly build your visual AI workflow without the need for frontend technology. It provides ready-to-use algorithm nodes for non-AI programmers, including large language models, Stable Diffusion, object detection, image segmentation, etc. The workflow can be exported as a JSON configuration file, supporting Python/C++ API for direct loading and running, deployment on cloud servers, desktops, mobile devices, edge devices, and more. The framework includes mainstream high-performance inference engines and deep optimization strategies to help you transform your workflow into enterprise-level production applications.
MEGREZ
MEGREZ is a modern and elegant open-source high-performance computing platform that efficiently manages GPU resources. It allows for easy container instance creation, supports multiple nodes/multiple GPUs, modern UI environment isolation, customizable performance configurations, and user data isolation. The platform also comes with pre-installed deep learning environments, supports multiple users, features a VSCode web version, resource performance monitoring dashboard, and Jupyter Notebook support.
For similar tasks
HivisionIDPhotos
HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.




