qwen-tts

Qwen-TTS offers a robust voice synthesis service using FastAPI, supporting bilingual and dialect options. Explore seamless audio generation on GitHub! 🚀🌟

Stars: 60

Visit

Qwen-TTS is a versatile text-to-speech service offering multi-voice support for both Chinese and English, including dialects like Beijing, Shanghai, and Sichuan. It provides real-time synthesis, batch processing, smart segmentation, progress tracking, audio playback, and outputs in WAV format. The application features a modern design, intuitive operation, history tracking, and real-time feedback. It also offers technical features like asynchronous processing, error handling, file management, and API documentation.

README:

Qwen-TTS: A Versatile Text-to-Speech Service

Overview

Welcome to the Qwen-TTS repository. This project offers a powerful text-to-speech service built on the Qwen-TTS API. It supports both Chinese and English, along with various Chinese dialects. This FastAPI application aims to provide a seamless experience for users needing voice synthesis capabilities.

📸 Interface Preview

Single Synthesis Interface

Batch Processing Interface

✨ Features

🎯 Core Features

Multi-Voice Support: Choose from 7 different voices, including options for both Chinese and English.
Dialect Support: Includes Beijing, Shanghai, and Sichuan dialects.
Real-Time Synthesis: Experience fast and responsive voice synthesis.
Batch Processing: Upload and process txt/md files in bulk.
Smart Segmentation: Automatically split text by paragraphs, sentences, or chapters.
Progress Tracking: Monitor batch processing progress in real time.
Audio Playback: Play and download audio directly from the interface.
Audio Format: Outputs in WAV format for clear sound quality.

🎨 Interface Features

Modern Design: A visually appealing and responsive user interface.
Consistent Layout: Unified width (1200px) for both single synthesis and batch processing pages.
Intuitive Operation: User-friendly experience for all users.
History Tracking: Automatically saves synthesis history for easy access.
Real-Time Feedback: Character count and status updates displayed live.

🔧 Technical Features

Asynchronous Processing: High-performance asynchronous API for quick responses.
Error Handling: Comprehensive error management to ensure smooth operation.
File Management: Automatic management of audio files for convenience.
API Documentation: Complete OpenAPI documentation available for developers.

🎤 Supported Voices

Voice	Language	Description	Dialect
Cherry	Chinese/English	Gentle and sweet female voice	Standard Mandarin
Ethan	Chinese/English	Mature and steady male voice	Standard Mandarin
Chelsie	Chinese/English	Lively and cute female voice	Standard Mandarin
Serena	Chinese/English	Elegant and knowledgeable female voice	Standard Mandarin
Dylan	Chinese	Authentic Beijing male voice	Beijing dialect
Jada	Chinese	Gentle Shanghai female voice	Shanghai dialect
Sunny	Chinese	Enthusiastic Sichuan female voice	Sichuan dialect

🚀 Quick Start

To get started with Qwen-TTS, follow these steps:

Download the Latest Release: Visit the Releases section to download the latest version of the application.
Installation: Follow the installation instructions provided in the release notes.
Run the Application: Execute the application to start using the text-to-speech features.

For detailed instructions, refer to the documentation provided in the repository.

📦 Installation

Requirements

Python 3.8 or higher
FastAPI
Uvicorn
Additional dependencies as listed in requirements.txt

Installation Steps

Clone the repository:

git clone https://github.com/mco2004/qwen-tts.git
cd qwen-tts

Install the required packages:
```
pip install -r requirements.txt
```
Run the application:
```
uvicorn main:app --reload
```

Configuration

You may need to configure certain parameters in a configuration file or environment variables. Check the repository for more details.

🌐 API Endpoints

The Qwen-TTS API provides several endpoints for text-to-speech operations. Below are the main endpoints available:

1. Single Synthesis

Endpoint: /synthesize
Method: POST
Description: Synthesize speech from a single text input.

2. Batch Processing

Endpoint: /batch
Method: POST
Description: Upload a file containing multiple texts for batch synthesis.

3. Voice List

Endpoint: /voices
Method: GET
Description: Retrieve a list of available voices and their details.

Refer to the API documentation for more information on request and response formats.

🔍 Usage Examples

Single Synthesis Example

To synthesize speech from a single text input, send a POST request to the /synthesize endpoint with the following JSON body:

{
  "text": "你好，欢迎使用 Qwen-TTS！",
  "voice": "Cherry"
}

Batch Processing Example

To process multiple texts, upload a file to the /batch endpoint. The file should contain one text per line.

📄 Documentation

Comprehensive API documentation is available in the repository. This includes detailed descriptions of each endpoint, request/response formats, and examples.

🔗 Links

For the latest releases, visit the Releases section.
Check the Wiki for more in-depth tutorials and guides.

🛠️ Contribution

We welcome contributions to improve Qwen-TTS. If you would like to contribute, please fork the repository and submit a pull request. Ensure that you follow the coding standards and include tests for new features.

📬 Contact

For questions or feedback, please open an issue in the repository. You can also reach out via email or other contact methods listed in the repository.

🎉 Acknowledgments

Thank you to all contributors and users who support the development of Qwen-TTS. Your feedback helps us improve the application and provide better services.

Explore the capabilities of Qwen-TTS and enhance your projects with advanced text-to-speech functionalities.

For Tasks:

Click tags to check more tools for each tasks

create audio content teach pronunciation record podcasts assist customers enhance educational materials

For Jobs:

content creator language teacher podcaster customer service representative educational technology specialist

Alternative AI tools for qwen-tts

Similar Open Source Tools

qwen-tts

github

: 60

Vodalus-Expert-LLM-Forge

Vodalus Expert LLM Forge is a tool designed for crafting datasets and efficiently fine-tuning models using free open-source tools. It includes components for data generation, LLM interaction, RAG engine integration, model training, fine-tuning, and quantization. The tool is suitable for users at all levels and is accompanied by comprehensive documentation. Users can generate synthetic data, interact with LLMs, train models, and optimize performance for local execution. The tool provides detailed guides and instructions for setup, usage, and customization.

github

: 131

ComfyUI-OllamaGemini

ComfyUI GeminiOllama Extension integrates Google's Gemini API, OpenAI (ChatGPT), Anthropic's Claude, Ollama, Qwen, and image processing tools into ComfyUI for leveraging powerful models and features directly within workflows. Features include multiple AI API integrations, advanced prompt engineering, Gemini image generation, background removal, SVG conversion, FLUX resolutions, ComfyUI Styler, smart prompt generator, and more. The extension offers comprehensive API integration, advanced prompt engineering with researched templates, high-quality tools like Smart Prompt Generator and BRIA RMBG, and supports video & audio processing. It provides a single interface to access powerful AI models, transform prompts into detailed instructions, and use various tools for image processing, styling, and content generation.

github

: 120

paelladoc

PAELLADOC is an intelligent documentation system that uses AI to analyze code repositories and generate comprehensive technical documentation. It offers a modular architecture with MECE principles, interactive documentation process, key features like Orchestrator and Commands, and a focus on context for successful AI programming. The tool aims to streamline documentation creation, code generation, and product management tasks for software development teams, providing a definitive standard for AI-assisted development documentation.

github

: 221

SynthLang

SynthLang is a tool designed to optimize AI prompts by reducing costs and improving processing speed. It brings academic rigor to prompt engineering, creating precise and powerful AI interactions. The tool includes core components like a Translator Engine, Performance Optimization, Testing Framework, and Technical Architecture. It offers mathematical precision, academic rigor, enhanced security, a modern interface, and instant testing. Users can integrate mathematical frameworks, model complex relationships, and apply structured prompts to various domains. Security features include API key management and data privacy. The tool also provides a CLI for prompt engineering and optimization capabilities.

github

: 157

DeepSeekAI

DeepSeekAI is a browser extension plugin that allows users to interact with AI by selecting text on web pages and invoking the DeepSeek large model to provide AI responses. The extension enhances browsing experience by enabling users to get summaries or answers for selected text directly on the webpage. It features context text selection, API key integration, draggable and resizable window, AI streaming replies, Markdown rendering, one-click copy, re-answer option, code copy functionality, language switching, and multi-turn dialogue support. Users can install the extension from Chrome Web Store or Edge Add-ons, or manually clone the repository, install dependencies, and build the extension. Configuration involves entering the DeepSeek API key in the extension popup window to start using the AI-driven responses.

github

: 203

RepoMaster

RepoMaster is an AI agent that leverages GitHub repositories to solve complex real-world tasks. It transforms how coding tasks are solved by automatically finding the right GitHub tools and making them work together seamlessly. Users can describe their tasks, and RepoMaster's AI analysis leads to auto discovery and smart execution, resulting in perfect outcomes. The tool provides a web interface for beginners and a command-line interface for advanced users, along with specialized agents for deep search, general assistance, and repository tasks.

github

: 167

beeai-platform

BeeAI is an open-source platform that simplifies the discovery, running, and sharing of AI agents across different frameworks. It addresses challenges such as framework fragmentation, deployment complexity, and discovery issues by providing a standardized platform for individuals and teams to access agents easily. With features like a centralized agent catalog, framework-agnostic interfaces, containerized agents, and consistent user experiences, BeeAI aims to streamline the process of working with AI agents for both developers and teams.

github

: 766

DreamLayer

DreamLayer AI is an open-source Stable Diffusion WebUI designed for AI researchers, labs, and developers. It automates prompts, seeds, and metrics for benchmarking models, datasets, and samplers, enabling reproducible evaluations across multiple seeds and configurations. The tool integrates custom metrics and evaluation pipelines, providing a streamlined workflow for AI research. With features like automated benchmarking, reproducibility, built-in metrics, multi-modal readiness, and researcher-friendly interface, DreamLayer AI aims to simplify and accelerate the model evaluation process.

github

: 367

replexica

Replexica is an i18n toolkit for React, to ship multi-language apps fast. It doesn't require extracting text into JSON files, and uses AI-powered API for content processing. It comes in two parts: 1. Replexica Compiler - an open-source compiler plugin for React; 2. Replexica API - an i18n API in the cloud that performs translations using LLMs. (Usage based, has a free tier.) Replexica supports several i18n formats: 1. JSON-free Replexica compiler format; 2. .md files for Markdown content; 3. Legacy JSON and YAML-based formats.

github

: 1.3k

AIWritingCompanion

AIWritingCompanion is a lightweight and versatile browser extension designed to translate text within input fields. It offers universal compatibility, multiple activation methods, and support for various translation providers like Gemini, OpenAI, and WebAI to API. Users can install it via CRX file or Git, set API key, and use it for automatic translation or via shortcut. The tool is suitable for writers, translators, students, researchers, and bloggers. AI keywords include writing assistant, translation tool, browser extension, language translation, and text translator. Users can use it for tasks like translate text, assist in writing, simplify content, check language accuracy, and enhance communication.

github

: 92

Apt

Apt. is a free and open-source AI productivity tool designed to enhance user productivity while ensuring privacy and data security. It offers efficient AI solutions such as built-in ChatGPT, batch image and video processing, and more. Key features include free and open-source code, privacy protection through local deployment, offline operation, no installation needed, and multi-language support. Integrated AI models cover ChatGPT for intelligent conversations, image processing features like super-resolution and color restoration, and video processing capabilities including super-resolution and frame interpolation. Future plans include integrating more AI models. The tool provides user guides and technical support via email and various platforms, with a user-friendly interface for easy navigation.

github

: 624

WatermarkRemover-AI

WatermarkRemover-AI is an advanced application that utilizes AI models for precise watermark detection and seamless removal. It leverages Florence-2 for watermark identification and LaMA for inpainting. The tool offers both a command-line interface (CLI) and a PyQt6-based graphical user interface (GUI), making it accessible to users of all levels. It supports dual modes for processing images, advanced watermark detection, seamless inpainting, customizable output settings, real-time progress tracking, dark mode support, and efficient GPU acceleration using CUDA.

github

: 78

scira

Scira is a powerful open-source tool for analyzing and visualizing data. It provides a user-friendly interface for data exploration, cleaning, and modeling. With Scira, users can easily import datasets, perform statistical analysis, create insightful visualizations, and generate reports. The tool supports various data formats and offers a wide range of statistical functions and visualization options. Whether you are a data scientist, researcher, or student, Scira can help you uncover valuable insights from your data and communicate your findings effectively.

github

: 10.7k

bytebot

Bytebot is an open-source AI desktop agent that provides a virtual employee with its own computer to complete tasks for users. It can use various applications, download and organize files, log into websites, process documents, and perform complex multi-step workflows. By giving AI access to a complete desktop environment, Bytebot unlocks capabilities not possible with browser-only agents or API integrations, enabling complete task autonomy, document processing, and usage of real applications.

github

: 6.6k

better-chatbot

Better Chatbot is an open-source AI chatbot designed for individuals and teams, inspired by various AI models. It integrates major LLMs, offers powerful tools like MCP protocol and data visualization, supports automation with custom agents and visual workflows, enables collaboration by sharing configurations, provides a voice assistant feature, and ensures an intuitive user experience. The platform is built with Vercel AI SDK and Next.js, combining leading AI services into one platform for enhanced chatbot capabilities.

github

: 658

For similar tasks

qwen-tts

github

: 60

tts-generation-webui

TTS Generation WebUI is a comprehensive tool that provides a user-friendly interface for text-to-speech and voice cloning tasks. It integrates various AI models such as Bark, MusicGen, AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and MAGNeT. The tool offers one-click installers, Google Colab demo, videos for guidance, and extra voices for Bark. Users can generate audio outputs, manage models, caches, and system space for AI projects. The project is open-source and emphasizes ethical and responsible use of AI technology.

github

: 1.6k

For similar jobs

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

daily-poetry-image

Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

github

: 492

exif-photo-blog

EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

github

: 1.4k

SillyTavern

SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

github

: 18.8k

Twitter-Insight-LLM

This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).

github

: 401

AISuperDomain

Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

github

: 1.2k

ChatGPT-On-CS

This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.

github

: 768

obs-localvocal

LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.

github

: 248