bailing

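Bailing is a GPT-4o-like voice dialogue robot built on an ASR+LLM+TTS pipeline. It integrates strong large models such as DeepSeek R1, reaches latency as low as 800 ms, runs even on low-spec machines such as Macs, and supports interruption.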

Stars: 893

Bailing is an open-source voice assistant designed for natural spoken conversations with users. It combines Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), a Large Language Model (LLM), and Text-to-Speech (TTS) to provide a high-quality voice interaction experience similar to GPT-4o. Bailing aims to achieve GPT-4o-like conversations without requiring a GPU, which makes it suitable for edge devices and low-resource environments. The project features efficient open-source models, a modular design that allows individual modules to be replaced or upgraded, memory support, tool integration for retrieving information and executing tasks via voice commands, and task management with progress tracking and reminders.

README:

百聆 (Bailing)

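Bailing is an open-source voice dialogue assistant designed to hold natural spoken conversations with users. It combines speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and speech synthesis (TTS) into a GPT-4o-like voice chatbot built on an ASR+LLM+TTS pipeline. The result is a high-quality voice conversation experience with an end-to-end latency of about 800 ms. Bailing aims to deliver GPT-4o-like conversations without a GPU, which makes it suitable for edge devices and low-resource environments.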

Features

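🚀 Smooth conversation: low latency and no stutter, so talking to it feels almost as natural as talking to a person. Bailing combines several open-source models to deliver an efficient, reliable voice dialogue experience.
🖥 Lightweight deployment: no high-end hardware is needed, not even a GPU. With optimization it can be deployed locally and still deliver GPT-4-like performance.
🔧 Modular design: the ASR, VAD, LLM, and TTS modules are independent of each other and can be replaced or upgraded as needed.
🧠 Memory: learns continuously, remembering user preferences and conversation history to provide a personalized experience.
🛠 Tool calling: integrates external tools flexibly, so users can request information or perform actions by voice, which makes the assistant more useful.
📅 Task management: manages user tasks, tracks progress, sets reminders, and provides timely updates so users don't miss anything important.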
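Bailing's actual scheduler is not described here. As a minimal sketch, assuming a daily reminder is stored as an `'HH:MM'` string (the format shown in the `schedule_task` example under Supported Tools), the next time a reminder should fire can be computed with the standard library alone. `next_daily_run` is a hypothetical helper, not part of Bailing.

```python
from datetime import datetime, time, timedelta


def next_daily_run(hhmm: str, now: datetime) -> datetime:
    """Return the next datetime at which a daily 'HH:MM' reminder should fire.

    Hypothetical helper: assumes naive local datetimes and a 24-hour 'HH:MM' string.
    """
    hour, minute = map(int, hhmm.split(":"))
    candidate = datetime.combine(now.date(), time(hour, minute))
    # If today's slot has already passed (or is exactly now), fire tomorrow instead.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 9, 30)
    print(next_daily_run("08:00", now))  # 2024-05-02 08:00:00
```

A real scheduler would also need to persist tasks and handle time zones; this only shows the "daily at a fixed time" calculation.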

Thanks to the Open-Source Community

Bailing would not exist without the generous contributions of the open-source community.

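Thanks to DeepSeek, FunASR, Silero-VAD, ChatTTS, OpenManus, and other excellent open-source projects, which made it possible to build a truly open, capable, and accessible voice AI assistant!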

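If you share the vision of putting AI within everyone's reach, you are welcome to contribute code and improve the models, helping Bailing become stronger, smarter, and a true JARVIS!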

📢 Stars and PRs welcome

Overview

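Bailing implements voice dialogue with the following components:

🎙 ASR: FunASR performs automatic speech recognition, converting the user's speech into text.
🎚 VAD: silero-vad performs voice activity detection so that only valid speech segments are processed.
🧠 LLM: DeepSeek serves as the large language model that processes user input and generates responses, at a very competitive cost.
🔊 TTS: edge-tts, Kokoro-82M, ChatTTS, or macOS `say` converts the generated text response into natural, fluent speech.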
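To illustrate the modular ASR → LLM → TTS chain, here is a minimal sketch using `typing.Protocol` interfaces. The interface names (`ASR`, `LLM`, `TTS`, `VoicePipeline`) and the stub classes are hypothetical stand-ins for FunASR, DeepSeek, and edge-tts, not Bailing's actual classes. VAD gating, which happens before ASR, is left out for brevity.

```python
from typing import Protocol


class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class VoicePipeline:
    """Chains ASR -> LLM -> TTS; any stage can be swapped without touching the others."""

    def __init__(self, asr: ASR, llm: LLM, tts: TTS) -> None:
        self.asr, self.llm, self.tts = asr, llm, tts

    def handle(self, audio: bytes) -> bytes:
        text = self.asr.transcribe(audio)   # speech -> text
        answer = self.llm.reply(text)       # text -> response text
        return self.tts.synthesize(answer)  # response text -> audio


# Hypothetical stubs standing in for real ASR, LLM, and TTS engines.
class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = VoicePipeline(EchoASR(), UpperLLM(), BytesTTS())
    print(pipeline.handle(b"hello"))  # b'HELLO'
```

Because the pipeline depends only on the method signatures, swapping edge-tts for Kokoro-82M (for example) would mean writing another class with a `synthesize` method.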

Architecture

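The Robot component manages tasks and memory, handles user interruptions intelligently, and coordinates the individual modules so they work together seamlessly, ensuring a smooth interaction experience.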

Player state	User speaking	Behavior
Playing	No	Normal
Playing	Yes	Interruption
Not playing	No	Normal
Not playing	Yes	VAD detects speech, then ASR transcribes it
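The table above is essentially a small decision function over two booleans. The sketch below encodes it directly; `Action` and `decide` are hypothetical names, and what Bailing does after an interruption (for example, whether it also transcribes the interrupting speech) is not specified here.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"        # nothing to do
    INTERRUPT = "interrupt"  # user talks over playback: stop playback
    LISTEN = "listen"        # user talks while idle: run VAD, then ASR


def decide(player_playing: bool, user_speaking: bool) -> Action:
    """Map the (player state, user speaking) pair from the table to an action."""
    if not user_speaking:
        return Action.NORMAL
    return Action.INTERRUPT if player_playing else Action.LISTEN


if __name__ == "__main__":
    for playing in (True, False):
        for speaking in (False, True):
            print(f"playing={playing} speaking={speaking} -> {decide(playing, speaking).value}")
```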

Capabilities

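Voice input: accurate speech recognition with FunASR.
Voice activity detection: silero-vad filters out invalid audio, improving recognition efficiency.
Intelligent dialogue: DeepSeek's strong language understanding generates natural text replies at a very competitive cost.
Voice output: edge-tts or Kokoro-82M converts text to speech, giving users realistic audio feedback.
Interruption support: configurable interruption strategies recognize both keywords and voice interruptions, giving users immediate feedback and control during the conversation and keeping the interaction fluid.
Memory: learns continuously, remembering user preferences and conversation history to provide a personalized experience.
Tool calling: integrates external tools flexibly, so users can request information or perform actions by voice, which makes the assistant more useful.
Task management: manages user tasks, tracks progress, sets reminders, and provides timely updates so users don't miss anything important.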
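The keyword side of the interruption strategy can be sketched as a check on the ASR transcript. The keyword list and `is_keyword_interrupt` are hypothetical; Bailing's real keywords and matching rules live in its own configuration and code.

```python
# Hypothetical interruption keywords; a real deployment would load these from config.
INTERRUPT_KEYWORDS = ("stop", "wait", "be quiet")


def is_keyword_interrupt(transcript: str, keywords=INTERRUPT_KEYWORDS) -> bool:
    """Return True if the ASR transcript contains any interruption keyword.

    Uses case-insensitive substring matching, which also works for languages
    written without spaces (such as Chinese), at the cost of occasional false
    positives (e.g. "stop" inside "stopwatch").
    """
    normalized = transcript.lower()
    return any(keyword in normalized for keyword in keywords)


if __name__ == "__main__":
    print(is_keyword_interrupt("Wait, that's wrong"))  # True
```

Substring matching was chosen over word splitting because Chinese text has no word boundaries. For English-only keywords, a regex with `\b` word boundaries would avoid the false positives noted in the docstring.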

Advantages

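High-quality voice dialogue: combines strong ASR, LLM, and TTS technology for fluent, accurate voice conversations.
Lightweight design: runs without high-performance hardware and suits resource-constrained environments.
Fully open source: Bailing is completely open source and encourages community contributions and derivative development.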

Installation and Running

Prerequisites

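Make sure the following tools and libraries are installed in your development environment:

Python 3.11 or later
The pip package manager
The dependencies required by FunASR, silero-vad, deepseek, edge-tts, and Kokoro-82M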
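Before installing, a quick sanity check of the interpreter version and a few packages can save time. This is a minimal sketch. The import names `funasr` and `edge_tts` are assumed to correspond to the FunASR and edge-tts packages, so adjust `REQUIRED_MODULES` to match your setup. The script only looks modules up with `importlib.util.find_spec` and does not import them.

```python
from __future__ import annotations

import importlib.util
import sys

# Assumed import names for some of the dependencies listed above; adjust as needed.
REQUIRED_MODULES = ("funasr", "edge_tts")


def check_environment(modules=REQUIRED_MODULES) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    for name in modules:
        # find_spec returns None for a missing top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```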

Installation Steps

Clone the repository:

git clone https://github.com/wwbin2017/bailing.git
cd bailing

Install the dependencies:

pip install -r requirements.txt
pip install -r third_party/OpenManus/requirements.txt

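Configure the project:
- Open config/config.yaml and configure ASR, LLM, and related settings
- Download SenseVoiceSmall into the models/SenseVoiceSmall directory (see the SenseVoiceSmall download page)
- Get an api_key from the DeepSeek website (see "DeepSeek: get an api_key") and add it to the configuration. You can also configure other models such as OpenAI, Qwen, Gemini, or 01yi
- The general-purpose AIGC configuration is still in testing; if it does not work for you, use the tagged releases v0.0.1 or v0.0.2 instead
  - In /third_party/OpenManus/config/config.toml, set model, base_url, and api_key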

Run the project:

cd server
python server.py  # start the backend service (optional, this step can be skipped)

python main.py

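Usage

After the application starts, the system waits for voice input.
silero-vad performs voice activity detection so that only valid speech is processed.
FunASR converts the user's speech into text.
DeepSeek processes the text input and generates an intelligent reply.
edge-tts, Kokoro-82M, ChatTTS, or macOS `say` converts the generated text into speech and plays it to the user.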
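The VAD step decides which audio frames are worth sending to ASR. silero-vad is a neural model; the sketch below swaps in a much simpler technique, energy (RMS) thresholding, only to illustrate the gating idea. The function names and the 0.02 threshold are arbitrary, illustrative choices, not Bailing or silero-vad settings.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_frames(frames: Sequence[Sequence[float]], threshold: float = 0.02) -> list[int]:
    """Indices of frames whose energy exceeds the threshold (a crude energy-based VAD)."""
    return [i for i, frame in enumerate(frames) if frame_rms(frame) > threshold]


if __name__ == "__main__":
    silence = [0.001] * 160  # 10 ms of near-silence at 16 kHz
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
    print(speech_frames([silence, tone, silence]))  # [1]
```

Energy thresholding is easily fooled by background noise, which is why a learned model such as silero-vad is used in practice.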

Roadmap

[x] Basic voice dialogue
[x] Plugin calls
[x] Task management
[x] RAG & Agent
[x] Memory
[ ] Voice wake-up
[ ] Enhanced web search
[ ] WebRTC support

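In the future, Bailing aims to grow into a JARVIS-like personal assistant: a thoughtful advisor with an exceptional memory and forward-looking task management. Built on RAG and Agent technology, it will keep precise track of your affairs and knowledge and simplify complex work. Say something like "find the latest news for me" or "summarize the latest progress in large models", and Bailing will respond quickly, analyze the request, track it in real time, and present the results clearly. Imagine having not just an assistant but a smart partner who understands your needs and supports you at every important moment.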

Supported Tools

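Function	Description	Behavior	Example
`get_weather`	Get the weather for a location	Given a location name, returns the weather there	User says: "What's the weather in Hangzhou?" → `zhejiang/hangzhou`
`ielts_speaking_practice`	IELTS speaking practice	Generates IELTS speaking questions and dialogues to help the user practice	-
`get_day_of_week`	Get the current day of the week or date	Returns the current time, date, or day of the week when the user asks for it	User says: "What day is it today?" → returns the current day of the week
`schedule_task`	Create a scheduled task	The user specifies when and what; the assistant reminds them at that time	User says: "Remind me to drink water at 8 every morning." → `time: '08:00', content: 'remind me to drink water'`
`open_application`	Open a specified application on a Mac	The user names an application and the script launches it on the Mac	User says: "Open Safari." → `application_name: 'Safari'`
`web_search`	Search the web for the given keywords	Returns search results for the user's query	User says: "Search for the latest tech news." → `query: 'latest tech news'`
`aigc_manus`	General-purpose AI agent for arbitrary tasks	Takes a task description and returns the result of executing it	User says: "Analyze the market trend of a specific stock" → `query: 'analyze the market trend of a specific stock'`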
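When the LLM decides to call one of these tools, the application has to route the call to a handler. The sketch below shows one common pattern, a name-to-function registry plus a dispatcher for calls of the form `{"name": ..., "arguments": ...}`. The registry, the `tool` decorator, and the handler bodies are hypothetical and not Bailing's implementation. `open_application` only echoes the name, although on macOS a real handler could run `open -a <name>`.

```python
import json
from datetime import date
from typing import Any, Callable, Dict, Optional

# Hypothetical registry mapping tool names (as in the table above) to handlers.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a handler under its own function name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_day_of_week(today: Optional[date] = None) -> str:
    return (today or date.today()).strftime("%A")


@tool
def open_application(application_name: str) -> str:
    # A real macOS handler might run `open -a <name>`; this sketch only echoes it.
    return f"would open {application_name}"


def dispatch(call_json: str) -> Any:
    """Run a tool call such as {"name": "...", "arguments": {...}}."""
    call = json.loads(call_json)
    handler = TOOLS.get(call["name"])
    if handler is None:
        raise KeyError(f"unknown tool: {call['name']}")
    args = call.get("arguments") or {}
    if isinstance(args, str):  # some LLM APIs send arguments as a JSON string
        args = json.loads(args)
    return handler(**args)


if __name__ == "__main__":
    print(dispatch('{"name": "open_application", "arguments": {"application_name": "Safari"}}'))
    # would open Safari
```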

Contributing

All forms of contribution are welcome! If you have suggestions for improving Bailing or have found a problem, please open a GitHub Issue or submit a Pull Request.

License

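This project is open-sourced under the MIT License. You are free to use, modify, and distribute it, provided you retain the original license notice.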

Contact

For any questions or suggestions, please use:

GitHub Issues: project issue tracker

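Disclaimer

Bailing (百聆) is an open-source project intended for personal learning and research. When using this project, please note the following:

Personal use: this project is intended only for personal learning and research and is not suitable for commercial use or production environments.
Risks and liability: using Bailing may cause data loss, system failures, or other problems. We accept no liability for any loss, damage, or problems resulting from the use of this project.
Support: this project comes with no technical support or warranty of any kind. Users use it at their own risk.

Before using this project, make sure you understand and accept this disclaimer. If you do not agree with these terms, do not use this project.

Thank you for your understanding and support!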

For Tasks:

get weather information, IELTS speaking practice, schedule task reminders, open applications on Mac, web search for keywords

For Jobs:

voice assistant developer, AI engineer, speech recognition specialist, natural language processing engineer, edge computing developer

Alternative AI tools for bailing

Similar Open Source Tools

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.

GitHub stars: 1.2k

ChuanhuChatGPT

Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.

GitHub stars: 15.2k

Awesome-ChatTTS

Awesome-ChatTTS is an officially recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including the official introduction, quick-start options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trend analysis. Users can explore various branches for different ChatTTS functionalities and enhancements.

GitHub stars: 594

chats

Sdcb Chats is a powerful and flexible frontend for large language models, supporting multiple functions and platforms. Whether you want to manage multiple model interfaces or need a simple deployment process, Sdcb Chats can meet your needs. It supports dynamic management of multiple large language model interfaces, integrates visual models to enhance user interaction experience, provides fine-grained user permission settings for security, real-time tracking and management of user account balances, easy addition, deletion, and configuration of models, transparently forwards user chat requests based on the OpenAI protocol, supports multiple databases including SQLite, SQL Server, and PostgreSQL, compatible with various file services such as local files, AWS S3, Minio, Aliyun OSS, Azure Blob Storage, and supports multiple login methods including Keycloak SSO and phone SMS verification.

GitHub stars: 265

HaE

HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.

GitHub stars: 2.7k

XianyuAutoAgent

Xianyu AutoAgent is an AI customer service robot system specifically designed for the Xianyu platform, providing 24/7 automated customer service, supporting multi-expert collaborative decision-making, intelligent bargaining, and context-aware conversations. The system includes intelligent conversation engine with features like context awareness and expert routing, business function matrix with modules like core engine, bargaining system, technical support, and operation monitoring. It requires Python 3.8+ and NodeJS 18+ for installation and operation. Users can customize prompts for different experts and contribute to the project through issues or pull requests.

GitHub stars: 973

ChatGPT-Next-Web-Pro

ChatGPT-Next-Web-Pro is a tool that provides an enhanced version of ChatGPT-Next-Web with additional features and functionalities. It offers complete ChatGPT-Next-Web functionality, file uploading and storage capabilities, drawing and video support, multi-modal support, reverse model support, knowledge base integration, translation, customizations, and more. The tool can be deployed with or without a backend, allowing users to interact with AI models, manage accounts, create models, manage API keys, handle orders, manage memberships, and more. It supports various cloud services like Aliyun OSS, Tencent COS, and Minio for file storage, and integrates with external APIs like Azure, Google Gemini Pro, and Luma. The tool also provides options for customizing website titles, subtitles, icons, and plugin buttons, and offers features like voice input, file uploading, real-time token count display, and more.

GitHub stars: 625

Code-Review-GPT-Gitlab

A project that utilizes large models to help with Code Review on Gitlab, aimed at improving development efficiency. The project is customized for Gitlab and is developing a Multi-Agent plugin for collaborative review. It integrates various large models for code security issues and stays updated with the latest Code Review trends. The project architecture is designed to be powerful, flexible, and efficient, with easy integration of different models and high customization for developers.

GitHub stars: 452

Awesome-LLM-RAG-Application

Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.

GitHub stars: 687

gez

Gez is a high-performance micro frontend framework based on ESM. It uses Rspack compilation and maps modules to URLs with strong caching and content-based hashing. Gez embraces modern micro frontend architecture by leveraging ESM and importmap for dependency management, providing reliable isolation with module scope, seamless integration with any modern frontend framework, intuitive development experience, and optimal performance with zero runtime overhead and reliable caching strategies.

GitHub stars: 584

BlueLM

BlueLM is a large-scale pre-trained language model developed by vivo AI Global Research Institute, with 7B base and chat models. It is trained on high-quality data at a scale of 2.6 trillion tokens, covering both Chinese and English. BlueLM-7B-Chat performs strongly on the C-Eval and CMMLU evaluations and is highly competitive among open-source models of similar size. The models support 32K long texts for better context understanding while maintaining their base capabilities. BlueLM welcomes developers for academic research and commercial applications.

GitHub stars: 869

lawyer-llama

Lawyer LLaMA is a large language model that has been specifically trained on legal data, including Chinese laws, regulations, and case documents. It has been fine-tuned on a large dataset of legal questions and answers, enabling it to understand and respond to legal inquiries in a comprehensive and informative manner. Lawyer LLaMA is designed to assist legal professionals and individuals with a variety of law-related tasks, including:

Legal research: Quickly and efficiently search through vast amounts of legal information to find relevant laws, regulations, and case precedents.
Legal analysis: Analyze legal issues, identify potential legal risks, and provide insights on how to proceed.
Document drafting: Draft legal documents, such as contracts, pleadings, and legal opinions, with accuracy and precision.
Legal advice: Provide general legal advice and guidance on a wide range of legal matters, helping users understand their rights and options.

Lawyer LLaMA is a powerful tool that can significantly enhance the efficiency and effectiveness of legal research, analysis, and decision-making. It is an invaluable resource for lawyers, paralegals, law students, and anyone else who needs to navigate the complexities of the legal system.

GitHub stars: 751

LotteryMaster

LotteryMaster is a tool designed to fetch lottery data, save it to Excel files, and provide analysis reports including number prediction, number recommendation, and number trends. It supports multiple platforms for access such as Web and mobile App. The tool integrates AI models like Qwen API and DeepSeek for generating analysis reports and trend analysis charts. Users can configure API parameters for controlling randomness, diversity, presence penalty, and maximum tokens. The tool also includes a frontend project based on uniapp + Vue3 + TypeScript for multi-platform applications. It provides a backend service running on Fastify with Node.js, Cheerio.js for web scraping, Pino for logging, xlsx for Excel file handling, and Jest for testing. The project is still in development and some features may not be fully implemented. The analysis reports are for reference only and do not constitute investment advice. Users are advised to use the tool responsibly and avoid addiction to gambling.

GitHub stars: 99

uDesktopMascot

uDesktopMascot is an open-source project for a desktop mascot application with a theme of 'freedom of creation'. It allows users to load and display VRM or GLB/FBX model files on the desktop, customize GUI colors and background images, and access various features through a menu screen. The application supports Windows 10/11 and macOS platforms.

GitHub stars: 265

HivisionIDPhotos

HivisionIDPhotos is a practical algorithm for intelligent ID photo creation. It uses a complete model workflow to recognize, cut out, and generate ID photos across a variety of user photo scenarios. The tool offers lightweight cutout, standard ID photo generation for different size specifications, six-inch layout photo generation, beauty enhancement (planned), and intelligent outfit swapping (planned). It aims to solve the problem of needing an ID photo in a hurry.

GitHub stars: 10.3k

For similar tasks

M.I.L.E.S

M.I.L.E.S. (Machine Intelligent Language Enabled System) is a voice assistant powered by GPT-4 Turbo, offering a range of capabilities beyond existing assistants. With its advanced language understanding, M.I.L.E.S. provides accurate and efficient responses to user queries. It seamlessly integrates with smart home devices, Spotify, and offers real-time weather information. Additionally, M.I.L.E.S. possesses persistent memory, a built-in calculator, and multi-tasking abilities. Its realistic voice, accurate wake word detection, and internet browsing capabilities enhance the user experience. M.I.L.E.S. prioritizes user privacy by processing data locally, encrypting sensitive information, and adhering to strict data retention policies.

GitHub stars: 125

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

GitHub stars: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

GitHub stars: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

GitHub stars: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

GitHub stars: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

GitHub stars: 27.7k

BricksLLM

BricksLLM is a cloud-native AI gateway written in Go. It currently provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Some use cases for BricksLLM:

Set LLM usage limits for users on different pricing tiers
Track LLM usage on a per-user and per-organization basis
Block or redact requests containing PII
Improve LLM reliability with failovers, retries, and caching
Distribute API keys with rate limits and cost limits for internal development/production use cases
Distribute API keys with rate limits and cost limits for students

GitHub stars: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

GitHub stars: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

GitHub stars: 2.2k