EasyAIVtuber

Simply animate your 2D waifu.

Stars: 51

Visit

EasyAIVtuber is a tool designed to animate 2D waifus by providing features like automatic idle actions, speaking animations, head nodding, singing animations, and sleeping mode. It also offers API endpoints and a web UI for interaction. The tool requires dependencies like torch and pre-trained models for optimal performance. Users can easily test the tool using OBS and UnityCapture, with options to customize character input, output size, simplification level, webcam output, model selection, port configuration, sleep interval, and movement extension. The tool also provides an API using Flask for actions like speaking based on audio, rhythmic movements, singing based on music and voice, stopping current actions, and changing images.

README:

EasyAIVtuber

驱动你的纸片人老婆。 Simply animate your 2D waifu.

Fork自 yuyuyzl/EasyVtuber。由于是AI Vtuber，因此删减了原项目的面捕功能。本项目配合stable diffusion等文生图模型为最佳食用方式。喜欢请点个星星哦~

视频教程：制作中...0.0

Features not available in the original repo

空闲自动做动作（眨眼、东张西望）
说话动作（自动对口型）
摇子（自动随节奏点头）
唱歌动作（自动对口型，跟随节奏摇摆）
睡大觉（使用--sleep参数控制入睡间隔）
API调用接口
webui方式调用

Installation

安装依赖库

创建并激活虚拟环境

conda create -n eaiv python=3.10
conda activate eaiv

安装torch（最好是30系显卡及以上）

# CUDA 11.8
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

然后在项目目录下执行以下命令

pip install -r requirements.txt

下载预训练模型

原模型文件地址：https://www.dropbox.com/s/y7b8jl4n2euv8xe/talking-head-anime-3-models.zip?dl=0
下载后解压到data/models文件夹中，与placeholder.txt同级

如果不想下载所有权重（四个版本），也可以在huggingface上下载：https://huggingface.co/ksuriuri/talking-head-anime-3-models

正确的目录层级为

+ models
  - separable_float
  - separable_half
  - standard_float
  - standard_half
  - placeholder.txt

安装OBS

可在网上自行搜索教程安装

安装UnityCapture

注：如果电脑上安装过VTube Studio，也许OBS的视频采集设备的设备中就会有 VTubeStudioCam（没做过实验不太确定）。若有此设备，便无需执行下面步骤安装UnityCapture，直接使用 VTubeStudioCam 即可

为了能够在OBS上看到纸片老婆并且使用透明背景，需要安装UnityCapture
参考 https://github.com/schellingb/UnityCapture#installation
只需要正常走完Install.bat，在OBS的视频采集设备中便能看到对应的设备（Unity Video Capture）。

如何使背景透明

在OBS添加完视频采集设备以后，右键视频采集设备-设置-取消激活-分辨率类型选自定义-分辨率512x512(与--output_size参数一致)-视频格式选ARGB-激活

Usage

快速测试

打开OBS，添加视频采集设备并按要求（安装UnityCapture）进行配置
将main.bat中第一行的虚拟环境的路径修改为你自己的虚拟环境路径
运行main.bat，等待初始化完毕，如配置无误，这时OBS中便能够看到人物在动
二选一
1. 简单测试：运行test.py
2. 运行webui：将webui.bat中第一行的虚拟环境的路径修改为你自己的虚拟环境路径，然后运行webui.bat

具体使用可参考 API Details

启动参数

参数名	类型	说明
--character	str	`data/images`目录下的输入图像文件名，不需要带扩展名
--output_size	str	格式为`512x512`，必须是4的倍数。增大它并不会让图像更清晰，但配合extend_movement会增大可动范围
--simplify	int	可用值为`1` `2` `3` `4`，值越大CPU运算量越小，但动作精度越低
--output_webcam	str	可用值为`unitycapture`，选择对应的输出种类，不传不输出到摄像头
--model	str	可用值为`standard_float` `standard_half` `separable_float` `separable_half`，显存占用不同，选择合适的即可
--port	int	本地API的端口号，默认为7888，若7888被占用则需要更改
--sleep	int	入睡间隔，默认为20，空闲状态下20秒后会睡大觉，设置为-1即可不进入睡觉状态
--extend_movement	float	暂时没有用）根据头部位置，对模型输出图像进一步进行移动和旋转使得上半身可动传入的数值表示移动倍率（建议值为1）

API Details

API使用Flask来开发，默认运行在 http://127.0.0.1:7888 （默认端口为7888），可在main.bat的--port中修改端口号。使用post请求 http://127.0.0.1:7888/alive ，并传入参数即可做出对应动作，具体示例可参考test.py。

根据音频说话

REQUEST

{
  "type": "speak",
  "speech_path": "your speech path"
}

在"speech_path"中填写你的语音音频路径，支持wav, mp3, flac等格式（pygame支持的格式）

RESPONSE

{
  "status": "success"
}

根据音乐节奏摇

REQUEST

{
  "type": "rhythm",
  "music_path": "your music path",
  "beat": 2
}

在"music_path"中填写你的音频路径，支持wav, mp3, flac等格式（pygame支持的格式）。
"beat"（可选）：取值为 1 2 4，控制节拍，默认为2

RESPONSE

{
  "status": "success"
}

根据音乐和人声唱歌

REQUEST

{
  "type": "sing",
  "music_path": "your music path",
  "voice_path": "your voice path",
  "mouth_offset": 0.0,
  "beat": 2
}

口型驱动的原理是根据音量大小来控制嘴巴的大小，因此需要事先将人声提取出来以更精准地控制口型。假设你有一首歌，路径为path/music.wav，利用UVR5等工具分离出人声音频path/voice.wav，然后将path/music.wav填入"music_path"，将path/voice.wav填入"voice_path"，支持wav, mp3, flac等格式（pygame支持的格式）。
"mouth_offset"（可选）：取值区间为 [0, 1]，默认为0，如果角色唱歌时的嘴张的不够大，可以试试将这个值设大
"beat"（可选）：取值为1 2 4，默认为2，控制节拍

RESPONSE

{
  "status": "success"
}

停止当前动作

REQUEST

{
  "type": "stop"
}

RESPONSE

{
  "status": "success"
}

更换当前图片

REQUEST

{
  "type": "change_img", 
  "img": "your image path"
}

在"img"中填写图片路径，图片大小最好是512x512，png格式

RESPONSE

{
  "status": "success"
}

For Tasks:

Click tags to check more tools for each tasks

animate character speak based on audio sing based on music create idle animations change character images

For Jobs:

animator content creator game developer ai developer web developer

Alternative AI tools for EasyAIVtuber

Similar Open Source Tools

EasyAIVtuber

github

: 51

ai-wechat-bot

Gewechat is a project based on the Gewechat project to implement a personal WeChat channel, using the iPad protocol for login. It can obtain wxid and send voice messages, which is more stable than the itchat protocol. The project provides documentation for the API. Users can deploy the Gewechat service and use the ai-wechat-bot project to interface with it. Configuration parameters for Gewechat and ai-wechat-bot need to be set in the config.json file. Gewechat supports sending voice messages, with limitations on the duration of received voice messages. The project has restrictions such as requiring the server to be in the same province as the device logging into WeChat, limited file download support, and support only for text and image messages.

github

: 366

emohaa-free-api

Emohaa AI Free API is a free API that allows you to access the Emohaa AI chatbot. Emohaa AI is a powerful chatbot that can understand and respond to a wide range of natural language queries. It can be used for a variety of purposes, such as customer service, information retrieval, and language translation. The Emohaa AI Free API is easy to use and can be integrated into any application. It is a great way to add AI capabilities to your projects without having to build your own chatbot from scratch.

github

: 77

Chat-Style-Bot

Chat-Style-Bot is an intelligent chatbot designed to mimic the chatting style of a specified individual. By analyzing and learning from WeChat chat records, Chat-Style-Bot can imitate your unique chatting style and become your personal chat assistant. Whether it's communicating with friends or handling daily conversations, Chat-Style-Bot can provide a natural, personalized interactive experience.

github

: 68

mcp

github

: 58

GitHubSentinel

GitHub Sentinel is an intelligent information retrieval and high-value content mining AI Agent designed for the era of large models (LLMs). It is aimed at users who need frequent and large-scale information retrieval, especially open source enthusiasts, individual developers, and investors. The main features include subscription management, update retrieval, notification system, report generation, multi-model support, scheduled tasks, graphical interface, containerization, continuous integration, and the ability to track and analyze the latest dynamics of GitHub open source projects and expand to other information channels like Hacker News for comprehensive information mining and analysis capabilities.

github

: 61

lumen

Lumen is a command-line tool that leverages AI to enhance your git workflow. It assists in generating commit messages, understanding changes, interactive searching, and analyzing impacts without the need for an API key. With smart commit messages, git history insights, interactive search, change analysis, and rich markdown output, Lumen offers a seamless and flexible experience for users across various git workflows.

github

: 681

Groq2API

Groq2API is a REST API wrapper around the Groq2 model, a large language model trained by Google. The API allows you to send text prompts to the model and receive generated text responses. The API is easy to use and can be integrated into a variety of applications.

github

: 288

Webscout

github

: 210

fastllm

FastLLM is a high-performance large model inference library implemented in pure C++ with no third-party dependencies. Models of 6-7B size can run smoothly on Android devices. Deployment communication QQ group: 831641348

github

: 3.4k

Webscout

WebScout is a versatile tool that allows users to search for anything using Google, DuckDuckGo, and phind.com. It contains AI models, can transcribe YouTube videos, generate temporary email and phone numbers, has TTS support, webai (terminal GPT and open interpreter), and offline LLMs. It also supports features like weather forecasting, YT video downloading, temp mail and number generation, text-to-speech, advanced web searches, and more.

github

: 203

Gmail-MCP-Server

github

: 126

supergateway

Supergateway is a tool that allows running MCP stdio-based servers over SSE (Server-Sent Events) with one command. It is useful for remote access, debugging, or connecting to SSE-based clients when your MCP server only speaks stdio. The tool supports running in SSE to Stdio mode as well, where it connects to a remote SSE server and exposes a local stdio interface for downstream clients. Supergateway can be used with ngrok to share local MCP servers with remote clients and can also be run in a Docker containerized deployment. It is designed with modularity in mind, ensuring compatibility and ease of use for AI tools exchanging data.

github

: 599

step-free-api

The StepChat Free service provides high-speed streaming output, multi-turn dialogue support, online search support, long document interpretation, and image parsing. It offers zero-configuration deployment, multi-token support, and automatic session trace cleaning. It is fully compatible with the ChatGPT interface. Additionally, it provides seven other free APIs for various services. The repository includes a disclaimer about using reverse APIs and encourages users to avoid commercial use to prevent service pressure on the official platform. It offers online testing links, showcases different demos, and provides deployment guides for Docker, Docker-compose, Render, Vercel, and native deployments. The repository also includes information on using multiple accounts, optimizing Nginx reverse proxy, and checking the liveliness of refresh tokens.

github

: 132

omniai

OmniAI provides a unified Ruby API for integrating with multiple AI providers, streamlining AI development by offering a consistent interface for features such as chat, text-to-speech, speech-to-text, and embeddings. It ensures seamless interoperability across platforms and effortless switching between providers, making integrations more flexible and reliable.

github

: 161

gemini-openai-proxy

Gemini-OpenAI-Proxy is a proxy software designed to convert OpenAI API protocol calls into Google Gemini Pro protocol, allowing software using OpenAI protocol to utilize Gemini Pro models seamlessly. It provides an easy integration of Gemini Pro's powerful features without the need for complex development work.

github

: 264

For similar tasks

EasyAIVtuber

github

: 51

For similar jobs

promptflow

**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

github

: 9.2k

deepeval

DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

github

: 5.8k

MegaDetector

MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".

github

: 106

leapfrogai

LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

github

: 255

llava-docker

This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

github

: 59

carrot

The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

github

: 17.1k

TrustLLM

TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

github

: 535

AI-YinMei

AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.

github

: 529