Chenyme-AAVT

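A fully automatic (audio and) video translation project. It uses Whisper to recognize speech, a large language model to translate the subtitles, and then merges the subtitles into the video to produce the translated result.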

Stars: 1189


Chenyme-AAVT is a tool for automatic recognition and translation of video and audio. It uses Whisper, a speech recognition model, to transcribe the speech in video and audio files. The transcript is then translated with ChatGPT or KIMI. The tool generates subtitle files and merges them with the original video, and it supports multiple source and target languages. It also offers VAD (Voice Activity Detection) to improve recognition accuracy, GPU acceleration for faster processing, and multiple subtitle formats. It is aimed at content creators, translators, and anyone who wants to make video translation more efficient.
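As a rough illustration of the "generate subtitle files" step, the sketch below turns recognized segments into SRT text. The `(start, end, text)` tuple layout and both function names are hypothetical and are not taken from the project's code; only the SRT timestamp layout (`HH:MM:SS,mmm`) is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from (start, end, text) tuples (hypothetical layout)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT cue: running index, time range, text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Calling `segments_to_srt` on the translated segments would give text that can be saved as a `.srt` file and merged with the video by a tool such as FFmpeg.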

README:

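[!NOTE] 🌟 If this project helps you, please consider giving it a Star 🌟 to show your support.

📝 For the best results, use the Large model for recognition. The author is currently preparing for exams, so updates will be slower for a while. Thank you for your understanding!

📖 Installation guide | ❓ FAQ | 💬 Telegram group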

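Project introduction

Chenyme-AAVT is a fully automatic video translation project that aims to provide a simple, efficient, and free automated workflow for media recognition and translation. It helps you quickly recognize, translate, and process subtitles for audio and video. The project now goes beyond recognizing and translating speech: it can also generate marketing copy with images and translate standalone subtitle files. More tools built on the existing features are planned, such as real-time recognition, lip-sync correction, voice cloning, and speaker identification. Stay tuned!

Basic features currently supported (not a complete list):

[Audio recognition] | [Video recognition] | [Image-and-text blog] | [Subtitle translation] | [Voice simulation]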

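Project highlights

👉 TODO

Recognition

[x] Switch to a faster Whisper implementation
[x] Support loading local models
[x] Support personally fine-tuned Whisper models
[x] VAD-assisted optimization
[x] Word-level sentence segmentation
[x] Recognition for more languages
[ ] Speaker identification
[ ] Real-time speech translation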

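Translation

[x] Translation optimization
[x] Translation into more languages
[x] More translation models
[x] More translation engines
[x] Support translation with local large language models

Video

[x] Customizable subtitles
[x] More subtitle formats
[x] Subtitle preview and live editing
[ ] Automated subtitle proofreading
[ ] Dual subtitles
[ ] Chinese dubbing for videos
[ ] Voice cloning
[ ] Lip-sync correction

Image-and-text blog

[x] Generate image-and-text posts
[ ] More writing styles
[ ] Faster generation
[ ] Higher rate of usable output

Other

[x] AI assistant
[x] Video preview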

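Recognizes and translates multiple languages
Supports fully local, free deployment of the whole pipeline
Generates blog content and marketing copy with images from a video in one click
Supports automated translation, manual subtitle editing, and video preview
Supports GPU acceleration, VAD assistance, and FFmpeg acceleration
Supports many large-model translation engines, including ChatGPT, Claude, Gemini, and DeepSeek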

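[!WARNING]

Notice about missing DLLs

These missing-DLL errors affect many projects that depend on CUDA and PyTorch. Hopefully the upstream maintainers will fix them soon.

The fixes below have been verified by the author. If they help, please consider leaving a 🌟 Star!

1. ❌ fbgemm.dll is missing. This is caused by a PyTorch build error involving the MKL files on Windows, which has been fixed upstream in the 2.4.1 Beta release. If you hit it, rerun Install.bat and choose the fixed version (2.4.1) from the menu.

2. ❌ cudnn_ops_infer64_8.dll is missing, so GPU mode fails to start. Download the CUDA_dll.zip archive from github.com/Chenyme/Chenyme-AAVT/releases/tag/V0.9 and extract it into the CUDA directory C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin.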
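To confirm that the second fix took effect, a small check like the following may help. It is only a sketch: the helper name `missing_dlls` is hypothetical, and the directory is the CUDA v12.4 path quoted above, which will differ for other CUDA versions or install locations.

```python
from pathlib import Path

# Path and file name as quoted in the notice above; adjust for your CUDA version.
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin"
REQUIRED = ["cudnn_ops_infer64_8.dll"]


def missing_dlls(bin_dir, names):
    """Return the names from `names` that are not files inside `bin_dir`."""
    bin_path = Path(bin_dir)
    return [name for name in names if not (bin_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_dlls(CUDA_BIN, REQUIRED)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All listed DLLs were found.")
```

An empty result only shows that the file is present in that directory; it does not prove that the GPU backend will load it.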

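Windows deployment

👉 Prerequisites: Python, FFmpeg, and CUDA

Python | 📖 Tutorial

💡 Choose a Python version > 3.8
Download the installer from the Python website
Run the installer and tick the Add to PATH option during installation

FFmpeg | 📖 Tutorial

💡 If you are unsure how to install or build FFmpeg, download the Win release from the project's Release page, which bundles a prebuilt FFmpeg
Download a prebuilt Windows version from the FFmpeg website
Add FFmpeg to the PATH environment variable

CUDA (skip if using CPU only) | 📖 Tutorial

💡 Recommended versions: CUDA 11.8, 12.1, or 12.4
Download the CUDA installer from the CUDA website
Install CUDA

‼️ Make sure the prerequisites are ready before continuing with the steps below ‼️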
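Before running the install script, the prerequisites can be checked with a short script along these lines. This is a sketch under stated assumptions: the function name is hypothetical, "Python > 3.8" is read as 3.8 or newer, and finding `nvcc` on PATH is used as a stand-in for "CUDA is installed", which may not hold for every setup.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Report which prerequisites appear to be available.

    `version_info` and `which` are injectable so the check can be tested
    without touching the real environment.
    """
    return {
        # Assumption: "Python > 3.8" is interpreted as 3.8 or newer.
        "python": tuple(version_info[:2]) >= (3, 8),
        # FFmpeg must be reachable through the PATH environment variable.
        "ffmpeg": which("ffmpeg") is not None,
        # Assumption: nvcc on PATH is a proxy for CUDA; CPU-only users can ignore it.
        "cuda (optional)": which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'not found'}")
```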

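1. Run the deployment script

Go to the Release page and download the latest Windows release (Win/Small)

Run 1_Install.bat and wait for the script to finish its checks

Once the checks pass, follow the on-screen prompts to choose a version to install

2. Run the project WebUI

Run 2_WebUI.bat

Enter chenymeaavt to access the project (this is a protection feature in newer versions and can be disabled)

ℹ️ The WebUI opens automatically. If the browser does not redirect, enter localhost:8501 manually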
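If the browser shows nothing at localhost:8501, it can help to check whether anything is listening on that port. The helper below is a minimal sketch with a hypothetical name; it only tests that a TCP connection can be opened, not that the WebUI itself is healthy.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"localhost:8501 is {state}")
```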

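Mac OS deployment

👉 Prerequisites: Python and Brew

Python

💡 Choose a Python version > 3.8
Download the PKG installer from the Python website
Run the installer and choose the standard installation

Brew

💡 Install brew with the one-line command below

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

‼️ Make sure the prerequisites are ready before continuing with the steps below ‼️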

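1. Install FFmpeg
brew install ffmpeg
2. Install project dependencies

Go to the Release page and download the latest Mac release (Mac/Small)

cd into the project root
pip3 install -r requirements.txt
3. Run the project WebUI
streamlit run Chenyme-AAVT.py
Enter chenymeaavt to access the project (this is a protection feature in newer versions and can be disabled)

ℹ️ The WebUI opens automatically. If the browser does not redirect, enter localhost:8501 manually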

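Linux deployment

Thanks to @dhlsam for providing this version

For usage details, see: 📖 issues/36

Google Colab deployment

Thanks to @Kirie233 for providing this version

For usage details, see:

Docker deployment

💡 The latest project version is V0.9.0; this Docker method provides V0.8.x.

Thanks to @Eisaichen for providing this version

For usage details, see: 📖 eisai/chenyme-aavt
docker pull eisai/chenyme-aavt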

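Star History (chart not reproduced here)

Screenshots (images not reproduced here): home page BOT, settings, audio recognition, video recognition, image-and-text blog, subtitle translation, voice simulation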

For Tasks:

translate videos, transcribe audios, generate subtitles

For Jobs:

video translator, content creator, language learner, researcher, journalist

Alternative AI tools for Chenyme-AAVT

Similar Open Source Tools

Chenyme-AAVT

GitHub stars: 1.2k

aituber-kit

AITuber-Kit is a tool that enables users to interact with AI characters, conduct AITuber live streams, and engage in external integration modes. Users can easily converse with AI characters using various LLM APIs, stream on YouTube with AI character reactions, and send messages to server apps via WebSocket. The tool provides settings for API keys, character configurations, voice synthesis engines, and more. It supports multiple languages and allows customization of VRM models and background images. AITuber-Kit follows the MIT license and offers guidelines for adding new languages to the project.

GitHub stars: 421

LxgwZhenKai

LxgwZhenKai is a Chinese font derived from LXGW WenKai, manually adjusted for boldness and supplemented with AI assistance for character additions. The font aims to provide a comfortable reading experience on screens while also serving as a bold version of LXGW WenKai for temporary use. It contains over 13,000 characters, including common simplified and traditional Chinese characters, and is licensed under SIL Open Font License 1.1. Users are allowed to freely use, distribute, modify, and create derivative fonts based on LxgwZhenKai.

GitHub stars: 220

WeChatMsg

WeChatMsg is a tool designed to help users manage and analyze their WeChat data. It aims to provide users with the ability to preserve their precious memories and create a personalized AI companion. The tool allows users to extract and export various types of data from WeChat, such as text, images, contacts, and more. Additionally, it offers features like analyzing chat data and generating visual annual reports. WeChatMsg is built on the idea of empowering users to take control of their data and foster emotional connections through technology.

GitHub stars: 38.4k

GoMaxAI-ChatGPT-Midjourney-Pro

GoMaxAI Pro is an AI-powered application for personal, team, and enterprise private operations. It supports various models like ChatGPT, Claude, Gemini, Kimi, Wenxin Yiyan, Xunfei Xinghuo, Tsinghua Zhipu, Suno-v3.5, and Luma-video. The Pro version offers a new UI, a member points system, a management backend, homepage features, support for various content formats, AI video capabilities, a SaaS multi-tenant function, bug fixes, and more. The web and management frontends are built with Vue3, the mobile frontend with Uniapp, and the backend with Node.js, with MySQL 5.7+ and Redis for data storage. It can be deployed on Linux, Windows, or macOS, with storage options including local storage, Aliyun OSS, Tencent Cloud COS, and the Chevereto image host.

GitHub stars: 233

KubeDoor

KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

GitHub stars: 272

uDesktopMascot

uDesktopMascot is an open-source project for a desktop mascot application with a theme of 'freedom of creation'. It allows users to load and display VRM or GLB/FBX model files on the desktop, customize GUI colors and background images, and access various features through a menu screen. The application supports Windows 10/11 and macOS platforms.

GitHub stars: 265

AiNiee

AiNiee is a tool focused on AI translation, capable of automatically translating RPG SLG games, Epub TXT novels, Srt Lrc subtitles, and more. It provides features for configuring AI platforms, proxies, and translation settings. Users can utilize this tool for translating game scripts, novels, and subtitles efficiently. The tool supports multiple AI platforms and offers tutorials for beginners. It also includes functionalities for extracting and translating game text, with options for customizing translation projects and managing translation tasks effectively.

GitHub stars: 2.2k

99AI

99AI is a commercializable AI web application based on NineAI 2.4.2 (no authorization, no backdoors, no piracy, integrated front-end and back-end integration packages, supports Docker rapid deployment). The uncompiled source code is temporarily closed. Compared with the stable version, the development version is faster.

GitHub stars: 736

SwanLab

SwanLab is an open-source, lightweight AI experiment tracking tool that provides a platform for tracking, comparing, and collaborating on experiments, aiming to accelerate the research and development efficiency of AI teams by 100 times. It offers a friendly API and a beautiful interface, combining hyperparameter tracking, metric recording, online collaboration, experiment link sharing, real-time message notifications, and more. With SwanLab, researchers can document their training experiences, seamlessly communicate and collaborate with collaborators, and machine learning engineers can develop models for production faster.

GitHub stars: 1.3k

Code-Review-GPT-Gitlab

A project that utilizes large models to help with Code Review on Gitlab, aimed at improving development efficiency. The project is customized for Gitlab and is developing a Multi-Agent plugin for collaborative review. It integrates various large models for code security issues and stays updated with the latest Code Review trends. The project architecture is designed to be powerful, flexible, and efficient, with easy integration of different models and high customization for developers.

GitHub stars: 452

easyAi

EasyAi is a lightweight, beginner-friendly Java artificial intelligence algorithm framework. It can be seamlessly integrated into Java projects with Maven, requiring no additional environment configuration or dependencies. The framework provides pre-packaged modules for image object detection and AI customer service, as well as various low-level algorithm tools for deep learning, machine learning, reinforcement learning, heuristic learning, and matrix operations. Developers can easily develop custom micro-models tailored to their business needs.

GitHub stars: 75

prompt-optimizer

Prompt Optimizer is a powerful AI prompt optimization tool that helps you write better AI prompts, improving AI output quality. It supports both web application and Chrome extension usage. The tool features intelligent optimization for prompt words, real-time testing to compare before and after optimization, integration with multiple mainstream AI models, client-side processing for security, encrypted local storage for data privacy, responsive design for user experience, and more.

GitHub stars: 1.6k

NGCBot

NGCBot is a WeChat bot based on the HOOK mechanism, supporting scheduled push of security news from FreeBuf, Xianzhi, Anquanke, and Qianxin Attack and Defense Community, KFC copywriting, filing query, phone number attribution query, WHOIS information query, constellation query, weather query, fishing calendar, Weibei threat intelligence query, beautiful videos, beautiful pictures, and help menu. It supports point functions, automatic pulling of people, ad detection, automatic mass sending, Ai replies, rich customization, and easy for beginners to use. The project is open-source and periodically maintained, with additional features such as Ai (Gpt, Xinghuo, Qianfan), keyword invitation to groups, automatic mass sending, and group welcome messages.

GitHub stars: 3.1k

chatgpt-plus

ChatGPT-PLUS is an open-source AI assistant solution based on large language model APIs, with a built-in operational management backend for easy deployment. It integrates multiple large language models from platforms like OpenAI, Azure, ChatGLM, Xunfei Xinghuo, and Wenxin Yiyan. It also includes MidJourney and Stable Diffusion AI drawing features. The system offers a complete open-source solution with ready-to-use frontend and backend applications, and streams responses over WebSocket for a typewriter-style experience. It comes with various preset role applications such as Xiaohongshu writer, English translation master, Socrates, Confucius, Steve Jobs, and weekly report assistant to meet various chat and application needs. Other features include Suno text-to-music generation, payment via a personal WeChat QR code, built-in Alipay and WeChat payment functions, support for various membership packages and point card purchases, and plugin API integration for developing plugins using large language model function calling.

GitHub stars: 2.8k

AivisSpeech

AivisSpeech is a Japanese text-to-speech software based on the VOICEVOX editor UI. It incorporates the AivisSpeech Engine for generating emotionally rich voices easily. It supports AIVMX format voice synthesis model files and specific model architectures like Style-Bert-VITS2. Users can download AivisSpeech and AivisSpeech Engine for Windows and macOS PCs, with minimum memory requirements specified. The development follows the latest version of VOICEVOX, focusing on minimal modifications, rebranding only where necessary, and avoiding refactoring. The project does not update documentation, maintain test code, or refactor unused features to prevent conflicts with VOICEVOX.

GitHub stars: 325

For similar tasks

Chenyme-AAVT

GitHub stars: 1.2k

MoneyPrinterTurbo

MoneyPrinterTurbo is a tool that can automatically generate video content based on a provided theme or keyword. It can create video scripts, materials, subtitles, and background music, and then compile them into a high-definition short video. The tool features a web interface and an API interface, supporting AI-generated video scripts, customizable scripts, multiple HD video sizes, batch video generation, customizable video segment duration, multilingual video scripts, multiple voice synthesis options, subtitle generation with font customization, background music selection, access to high-definition and copyright-free video materials, and integration with various AI models like OpenAI, moonshot, Azure, and more. The tool aims to simplify the video creation process and offers future plans to enhance voice synthesis, add video transition effects, provide more video material sources, offer video length options, include free network proxies, enable real-time voice and music previews, support additional voice synthesis services, and facilitate automatic uploads to YouTube platform.

GitHub stars: 25.7k

Whisper-WebUI

Whisper-WebUI is a Gradio-based browser interface for Whisper, serving as an Easy Subtitle Generator. It supports generating subtitles from various sources such as files, YouTube, and microphone. The tool also offers speech-to-text and text-to-text translation features, utilizing Facebook NLLB models and DeepL API. Users can translate subtitle files from other languages to English and vice versa. The project integrates faster-whisper for improved VRAM usage and transcription speed, providing efficiency metrics for optimized whisper models. Additionally, users can choose from different Whisper models based on size and language requirements.

GitHub stars: 1.8k

FunClip

FunClip is an open-source, locally deployable automated video editing tool that utilizes the FunASR Paraformer series models from Alibaba DAMO Academy for speech recognition in videos. Users can select text segments or speakers from the recognition results and click the clip button to obtain the corresponding video segments. FunClip integrates advanced features such as the Paraformer-Large model for accurate Chinese ASR, SeACo-Paraformer for customized hotword recognition, CAM++ speaker recognition model, Gradio interactive interface for easy usage, support for multiple free edits with automatic SRT subtitles generation, and segment-specific SRT subtitles.

GitHub stars: 2.1k

openlrc

Open-Lyrics is a Python library that transcribes voice files using faster-whisper and translates/polishes the resulting text into `.lrc` files in the desired language using LLM, e.g. OpenAI-GPT, Anthropic-Claude. It offers well preprocessed audio to reduce hallucination and context-aware translation to improve translation quality. Users can install the library from PyPI or GitHub and follow the installation steps to set up the environment. The tool supports GUI usage and provides Python code examples for transcription and translation tasks. It also includes features like utilizing context and glossary for translation enhancement, pricing information for different models, and a list of todo tasks for future improvements.

GitHub stars: 476

FunClip

FunClip is an open-source, locally deployed automated video clipping tool that leverages Alibaba TONGYI speech lab's FunASR Paraformer series models for speech recognition on videos. Users can select text segments or speakers from recognition results to obtain corresponding video clips. It integrates industrial-grade models for accurate predictions and offers hotword customization and speaker recognition features. The tool is user-friendly with Gradio interaction, supporting multi-segment clipping and providing full video and target segment subtitles. FunClip is suitable for users looking to automate video clipping tasks with advanced AI capabilities.

GitHub stars: 3.1k

decipher

Decipher is a tool that utilizes AI-generated transcription subtitles to automatically add subtitles to videos. It eliminates the need for manual transcription, making videos more accessible. The tool uses OpenAI's Whisper, a State-of-the-Art speech recognition system trained on a large dataset for improved robustness to accents, background noise, and technical language.

GitHub stars: 519

AI-Translation-Assistant-Pro

AI Translation Assistant Pro is a powerful AI-driven platform for multilingual translation and content processing. It offers features such as text translation, image recognition, PDF processing, speech recognition, and video processing. The platform includes a subscription system with different membership levels, user management functionalities, quota management, and real-time usage statistics. It utilizes technologies like Next.js, React, TypeScript for the frontend, Node.js, PostgreSQL for the backend, NextAuth.js for authentication, Stripe for payments, and integrates with cloud services like Aliyun OSS and Tencent Cloud for AI services.

GitHub stars: 145

For similar jobs

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

GitHub stars: 492

exif-photo-blog

EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

GitHub stars: 992

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

GitHub stars: 13.2k

Twitter-Insight-LLM

This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).

GitHub stars: 401

AISuperDomain

Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

GitHub stars: 1.2k

ChatGPT-On-CS

This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.

GitHub stars: 768

obs-localvocal

LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.

GitHub stars: 248
