Awesome-ChatTTS

官方推荐的 ChatTTS 资源汇总项目，整理了全网相关资源和常见问题 || Officially recommended ChatTTS resource collection project

Stars: 594

Visit

Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.

README:

English | 简体中文

Awesome-ChatTTS 是官方推荐的 ChatTTS 资源汇总项目，欢迎在 issues 中推荐或者自荐。

如果觉得本项目对你了解和使用 ChatTTS 有帮助，还请打赏个 ⭐️ 支持一下。

[!NOTE] 以下项目均为社区资源，查看官方信息请到源仓库 2noise/ChatTTS 。

官方简介
快速体验
热门分支
界面说明
音色控制
入门教程
常见问题
报错速查

官方简介

https://github.com/libukai/Awesome-ChatTTS/assets/5654585/532bfb80-316a-4244-9b92-301c732e8b63

快速体验

网址	类型
Original Web	原版网页版体验
Forge Web	Forge 增强版体验
Linux	Python 安装包
Samples	音色种子示例
Cloning	音色克隆体验

项目	Star	亮点
jianchang512/ChatTTS-ui		提供 API 接口，可在第三方应用中调用
6drf21e/ChatTTS_colab		提供流式输出，支持长音频生成和分角色阅读
lenML/ChatTTS-Forge		提供人声增强和背景降噪，可使用附加提示词
CCmahua/ChatTTS-Enhanced		支持文件批量处理，以及导出 SRT 文件
HKoon/ChatTTS-OpenVoice		配合 OpenVoice 进行声音克隆

项目	Star	亮点
6drf21e/ChatTTS_Speaker		音色角色打标与稳定性评估
AIFSH/ComfyUI-ChatTTS		ComfyUi 版本，可作为工作流节点引入
MaterialShadow/ChatTTS-manager		提供了音色管理系统和 WebUI 界面

界面说明

文本控制

1. Input Text : 需要转换的文本，支持中文和英文混杂
2. Refine text : 是否对文本进行口语化处理
3. Text Seed : 配置文本种子值，不同种子对应不同口语化风格
4. 🎲 : 随机产生文本种子值
5. Output Text : 口语化处理后生成的文本

音色控制

6. Timbre : 预设的音色种子值
7. Audio Seed : 配置音色种子值，不同种子对应不同音色
8. 🎲 : 随机产生音色种子值
9. Speaker Embedding : 音色码，详见音色控制

情感控制

10. temperate : 控制音频情感波动性，范围为 0-1，数字越大，波动性越大
11. top_P ：控制音频的情感相关性，范围为 0.1-0.9，数字越大，相关性越高
12. top_K ：控制音频的情感相似性，范围为 1-20，数字越小，相似性越高

系数控制

13. DVAE Coefficient : 模型系数码
14. Reload : 重新加载模型系数

播放控制

15. Auto Play : 是否在生成音频后自动播放
16. Stream Mode : 是否启用流式输出
17. Generate : 点击生成音频文件
18. Output Audio : 音频生成结果
19. ↓ : 点击下载音频文件
20. ▶️ : 点击播放音频文件

示例控制

21. Example : 点击切换示例配置

音色控制

经过实际测试，指定音色种子值每次生成 spk_emb 和重复使用预生成好的 spk_emb 效果有较显著差异，建议优先使用 .pt 音色文件或者音色码（字符串表示形式）。

在 ChatTTS_Speaker 项目中对音色种子进行了初步打标和稳定性评估，可以通过示例来快速选择合适的音色。

WebUI

在官方 WebUI 中使用时，可直接将音色码复制之后，替换 9. Speaker Embedding 中的值，实现音色控制。

Python

在 Python 脚本中使用时，参考 issue#07 中的压缩方案实现音色控制。

spk = torch.load("asset/seed_1332_restored_emb.pt", map_location=torch.device('cpu')).detach()
spk_emb_str = compress_and_encode(spk)

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb= spk_emb_str,  # add sampled speaker
    temperature=.0003,  # using custom temperature
    top_P=0.7,  # top P decode
    top_K=20,  # top K decode
)

入门教程

中文教程

视频	亮点
同济子豪兄	从入门到进阶的详细部署教程
ZTFS	Mac M1 部署教程
王-寳寳	Windows 部署教程

英文教程

视频	亮点
Sam Witteveen	英文版介绍

常见问题

经过近期的迭代，源仓库代码中的问题已经基本解决。如果遇到问题，建议先详细查看官方说明文档中文版，如果还有问题可以继续查看本文档。

模型无法下载

原版项目运行需要从 HuggingFace 下载对应的模型，如果不能顺畅科学上网，那么就无法完成这一步。作为替代方案，可以从 modelscope 上下载模型和配置，并配置本地路径。

[!Important] 魔塔上的模型库是由志愿者维护的，不保证所有模型都是最新的，如果有需要请自行验证。

在终端中安装 modelscope 依赖

pip install modelscope

修改 webui.py 中的代码

# 在开头导入依赖，并下载模型和配置
from modelscope import snapshot_download
model_dir = snapshot_download('zlj2546/ChatTTS')

# 第 118 行修改模型路径
ret = chat.load_models('custom', custom_path=model_dir)

IDE 中无法运行

在 IDE 中运行时，由于文件相对路径的问题，导致脚本无法顺利运行。

建议参照官方说明文档快速启动中的指令直接在终端中运行。

确保在执行以下命令时，处于项目根目录下。

1. WebUI 可视化界面

python examples/web/webui.py

2. 命令行交互

生成的音频将保存至 ./output_audio_n.mp3

python examples/cmd/run.py "Your text 1." "Your text 2."

语气标签被读出

出现这个问题是因为官方代码处理中文标点符号时覆盖不全，例如 ？、… 等符号没有被处理，导致模型生成时出错。

可以手动删除类似的中文标点符号，或者修改 ChatTTS/utils/infer_utils.py 中的代码，在 103 行的 character_map 的字典中添加缺失的标点符号。

character_map = {
    '…': '',
    '—': ',',
    '＿': ',',
    '？': ',',
    }

GPU 无法使用

GPU 至少需要 4G 显存，否则将强制使用 CPU，相关问题可以参考 ChatTTS-ui 项目中的说明

报错速查

1、load_models() got an unexpected keyword argument 'source'

详见 常见问题 - 模型无法下载

2、cannot import name 'CommitOperationAdd' from 'huggingface_hub'

详见 常见问题 - 模型无法下载

3、 FileNotFoundError：［Erzno 2］ No such file or directory： 'C：\\Users\\xxx\\.cache\\huggingface\\hub\\models--2Noise--ChatTTS\\snapshots\

详见 常见问题 - 模型无法下载

4、local variable 'Normalizer' referenced before assignment

需要根据 安装指南 完成环境配置后，再安装 pynini 和 WeTextProcessing 依赖

conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing

5、download to Local path D：\pythonlproject\ChatTTS\ChatTTS failed.

在 IDE 中直接执行脚本，会因为文件路径问题报错，详见 常见问题 - IDE 中无法运行

6、ModuleNotFoundError : No module named'Cython'

未找到 Python 执行路径，Windows 设备需要按教程配置环境路径

项目趋势

For Tasks:

Click tags to check more tools for each tasks

generate voice samples customize speech emotions install chattts troubleshoot errors explore voice seed variations

For Jobs:

speech researcher ai engineer voice technology specialist software developer linguistics analyst

Alternative AI tools for Awesome-ChatTTS

Similar Open Source Tools

Awesome-ChatTTS

github

: 594

ChuanhuChatGPT

Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.

github

: 15.2k

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.

github

: 1.2k

XianyuAutoAgent

Xianyu AutoAgent is an AI customer service robot system specifically designed for the Xianyu platform, providing 24/7 automated customer service, supporting multi-expert collaborative decision-making, intelligent bargaining, and context-aware conversations. The system includes intelligent conversation engine with features like context awareness and expert routing, business function matrix with modules like core engine, bargaining system, technical support, and operation monitoring. It requires Python 3.8+ and NodeJS 18+ for installation and operation. Users can customize prompts for different experts and contribute to the project through issues or pull requests.

github

: 973

bailing

Bailing is an open-source voice assistant designed for natural conversations with users. It combines Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), Large Language Model (LLM), and Text-to-Speech (TTS) technologies to provide a high-quality voice interaction experience similar to GPT-4o. Bailing aims to achieve GPT-4o-like conversation effects without the need for GPU, making it suitable for various edge devices and low-resource environments. The project features efficient open-source models, modular design allowing for module replacement and upgrades, support for memory function, tool integration for information retrieval and task execution via voice commands, and efficient task management with progress tracking and reminders.

github

: 893

HivisionIDPhotos

HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.

github

: 10.3k

k8m

k8m is an AI-driven Mini Kubernetes AI Dashboard lightweight console tool designed to simplify cluster management. It is built on AMIS and uses 'kom' as the Kubernetes API client. k8m has built-in Qwen2.5-Coder-7B model interaction capabilities and supports integration with your own private large models. Its key features include miniaturized design for easy deployment, user-friendly interface for intuitive operation, efficient performance with backend in Golang and frontend based on Baidu AMIS, pod file management for browsing, editing, uploading, downloading, and deleting files, pod runtime management for real-time log viewing, log downloading, and executing shell commands within pods, CRD management for automatic discovery and management of CRD resources, and intelligent translation and diagnosis based on ChatGPT for YAML property translation, Describe information interpretation, AI log diagnosis, and command recommendations, providing intelligent support for managing k8s. It is cross-platform compatible with Linux, macOS, and Windows, supporting multiple architectures like x86 and ARM for seamless operation. k8m's design philosophy is 'AI-driven, lightweight and efficient, simplifying complexity,' helping developers and operators quickly get started and easily manage Kubernetes clusters.

github

: 157

BlueLM

BlueLM is a large-scale pre-trained language model developed by vivo AI Global Research Institute, featuring 7B base and chat models. It includes high-quality training data with a token scale of 26 trillion, supporting both Chinese and English languages. BlueLM-7B-Chat excels in C-Eval and CMMLU evaluations, providing strong competition among open-source models of similar size. The models support 32K long texts for better context understanding while maintaining base capabilities. BlueLM welcomes developers for academic research and commercial applications.

github

: 869

agentica

Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.

github

: 108

chats

Sdcb Chats is a powerful and flexible frontend for large language models, supporting multiple functions and platforms. Whether you want to manage multiple model interfaces or need a simple deployment process, Sdcb Chats can meet your needs. It supports dynamic management of multiple large language model interfaces, integrates visual models to enhance user interaction experience, provides fine-grained user permission settings for security, real-time tracking and management of user account balances, easy addition, deletion, and configuration of models, transparently forwards user chat requests based on the OpenAI protocol, supports multiple databases including SQLite, SQL Server, and PostgreSQL, compatible with various file services such as local files, AWS S3, Minio, Aliyun OSS, Azure Blob Storage, and supports multiple login methods including Keycloak SSO and phone SMS verification.

github

: 265

Element-Plus-X

github

: 289

build_MiniLLM_from_scratch

This repository aims to build a low-parameter LLM model through pretraining, fine-tuning, model rewarding, and reinforcement learning stages to create a chat model capable of simple conversation tasks. It features using the bert4torch training framework, seamless integration with transformers package for inference, optimized file reading during training to reduce memory usage, providing complete training logs for reproducibility, and the ability to customize robot attributes. The chat model supports multi-turn conversations. The trained model currently only supports basic chat functionality due to limitations in corpus size, model scale, SFT corpus size, and quality.

github

: 397

HaE

HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.

github

: 2.7k

ailab

The 'ailab' project is an experimental ground for code generation combining AI (especially coding agents) and Deno. It aims to manage configuration files defining coding rules and modes in Deno projects, enhancing the quality and efficiency of code generation by AI. The project focuses on defining clear rules and modes for AI coding agents, establishing best practices in Deno projects, providing mechanisms for type-safe code generation and validation, applying test-driven development (TDD) workflow to AI coding, and offering implementation examples utilizing design patterns like adapter pattern.

github

: 280

video-subtitle-remover

Video-subtitle-remover (VSR) is a software based on AI technology that removes hard subtitles from videos. It achieves the following functions: - Lossless resolution: Remove hard subtitles from videos, generate files with subtitles removed - Fill the region of removed subtitles using a powerful AI algorithm model (non-adjacent pixel filling and mosaic removal) - Support custom subtitle positions, only remove subtitles in defined positions (input position) - Support automatic removal of all text in the entire video (no input position required) - Support batch removal of watermark text from multiple images.

github

: 4.0k

Code-Review-GPT-Gitlab

A project that utilizes large models to help with Code Review on Gitlab, aimed at improving development efficiency. The project is customized for Gitlab and is developing a Multi-Agent plugin for collaborative review. It integrates various large models for code security issues and stays updated with the latest Code Review trends. The project architecture is designed to be powerful, flexible, and efficient, with easy integration of different models and high customization for developers.

github

: 452

For similar tasks

Awesome-ChatTTS

github

: 594

openllmetry-js

OpenLLMetry-JS is a set of extensions built on top of OpenTelemetry that gives you complete observability over your LLM application. Because it uses OpenTelemetry under the hood, it can be connected to your existing observability solutions - Datadog, Honeycomb, and others. It's built and maintained by Traceloop under the Apache 2.0 license. The repo contains standard OpenTelemetry instrumentations for LLM providers and Vector DBs, as well as a Traceloop SDK that makes it easy to get started with OpenLLMetry-JS, while still outputting standard OpenTelemetry data that can be connected to your observability stack. If you already have OpenTelemetry instrumented, you can just add any of our instrumentations directly.

github

: 297

ClaudeSync

ClaudeSync is a powerful tool designed to seamlessly synchronize local files with Claude.ai projects. It bridges the gap between local development environment and Claude.ai's knowledge base, offering real-time synchronization, CLI for easy management, support for multiple organizations and projects, intelligent file filtering, configurable sync interval, two-way synchronization, and more. It ensures data privacy, open source transparency, and comes with disclaimers for use at own risk. Users can quickly start syncing by installing, logging in, selecting organization and project, and running sync. Advanced features include API, organization, project, file, chat management, configuration, synchronization modes, scheduled sync, providers, custom ignore file, and troubleshooting. Contributions are welcome, and communication channels include GitHub Issues and Discord. Licensed under MIT License.

github

: 407

ai-research-assistant

Aria is a Zotero plugin that serves as an AI Research Assistant powered by Large Language Models (LLMs). It offers features like drag-and-drop referencing, autocompletion for creators and tags, visual analysis using GPT-4 Vision, and saving chats as notes and annotations. Aria requires the OpenAI GPT-4 model family and provides a configurable interface through preferences. Users can install Aria by downloading the latest release from GitHub and activating it in Zotero. The tool allows users to interact with Zotero library through conversational AI and probabilistic models, with the ability to troubleshoot errors and provide feedback for improvement.

github

: 1.1k

desktop

ComfyUI Desktop is a packaged desktop application that allows users to easily use ComfyUI with bundled features like ComfyUI source code, ComfyUI-Manager, and uv. It automatically installs necessary Python dependencies and updates with stable releases. The app comes with Electron, Chromium binaries, and node modules. Users can store ComfyUI files in a specified location and manage model paths. The tool requires Python 3.12+ and Visual Studio with Desktop C++ workload for Windows. It uses nvm to manage node versions and yarn as the package manager. Users can install ComfyUI and dependencies using comfy-cli, download uv, and build/launch the code. Troubleshooting steps include rebuilding modules and installing missing libraries. The tool supports debugging in VSCode and provides utility scripts for cleanup. Crash reports can be sent to help debug issues, but no personal data is included.

github

: 1.3k

chat-mcp

A Cross-Platform Interface for Large Language Models (LLMs) utilizing the Model Context Protocol (MCP) to connect and interact with various LLMs. The desktop app, built on Electron, ensures compatibility across Linux, macOS, and Windows. It simplifies understanding MCP principles, facilitates testing of multiple servers and LLMs, and supports dynamic LLM configuration and multi-client management. The UI can be extracted for web use, ensuring consistency across web and desktop versions.

github

: 93

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k

Awesome-ChatTTS

README:

官方简介

快速体验

热门分支

功能增强

功能扩展

界面说明

文本控制

音色控制

情感控制

系数控制

播放控制

示例控制

音色控制

WebUI

Python

入门教程

中文教程

英文教程

常见问题

模型无法下载

IDE 中无法运行

1. WebUI 可视化界面

2. 命令行交互

语气标签被读出

GPU 无法使用

报错速查

项目趋势

For Tasks:

For Jobs:

Alternative AI tools for Awesome-ChatTTS

Similar Open Source Tools

Awesome-ChatTTS

ChuanhuChatGPT

Speech-AI-Forge

XianyuAutoAgent

bailing

HivisionIDPhotos

k8m

BlueLM

agentica

chats

Element-Plus-X

build_MiniLLM_from_scratch

HaE

ailab

video-subtitle-remover

Code-Review-GPT-Gitlab

For similar tasks

Awesome-ChatTTS

openllmetry-js

ClaudeSync

ai-research-assistant

desktop

chat-mcp

For similar jobs

sweep

teams-ai

ai-guide

classifai

chatbot-ui

BricksLLM

uAgents

griptape