bailing
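Bailing (百聆) is a GPT-4o-like voice conversation bot built on ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs even on low-spec machines such as Macs, and supports interruption.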
Stars: 893
Bailing is an open-source voice assistant designed for natural conversations with users. It combines Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), Large Language Model (LLM), and Text-to-Speech (TTS) technologies to provide a high-quality voice interaction experience similar to GPT-4o. Bailing aims to achieve GPT-4o-like conversation effects without the need for GPU, making it suitable for various edge devices and low-resource environments. The project features efficient open-source models, modular design allowing for module replacement and upgrades, support for memory function, tool integration for information retrieval and task execution via voice commands, and efficient task management with progress tracking and reminders.
README:
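Bailing is an open-source voice assistant designed to hold natural spoken conversations with users. It combines speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and speech synthesis (TTS) into a GPT-4o-like voice chatbot built as an ASR + LLM + TTS pipeline, delivering high-quality voice conversations with an end-to-end latency of 800 ms. Bailing aims to reach GPT-4o-like conversation quality without a GPU, which makes it suitable for a wide range of edge devices and low-resource environments.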
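- 🚀 Smooth conversations: low latency and no stuttering, almost as natural as talking to a person. Bailing combines several open-source models to keep voice conversations efficient and reliable.
- 🖥 Lightweight deployment: no high-end hardware required, not even a GPU. Thanks to optimization it can be deployed locally while still delivering GPT-4-like performance.
- 🔧 Modular design: the ASR, VAD, LLM, and TTS modules are independent of one another and can be replaced or upgraded as needed.
- 🧠 Memory: learns continuously, remembering user preferences and conversation history to personalize interactions.
- 🛠 Tool calling: integrates external tools flexibly, so users can request information or perform actions directly by voice, making the assistant more useful.
- 📅 Task management: manages user tasks, tracks progress, sets reminders, and provides dynamic updates so users never miss anything important.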
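Reminders such as the `schedule_task` example in the tool table further down ("remind me to drink water at 8 a.m. every day" → time '08:00') come down to computing when a daily job should next fire. The sketch below illustrates that under stated assumptions; it is not Bailing's scheduler, `parse_hhmm` and `next_daily_run` are hypothetical names, and it uses naive local datetimes with no time-zone handling.

```python
from datetime import datetime, time, timedelta


def parse_hhmm(value: str) -> time:
    """Parse an 'HH:MM' string such as '08:00' into a datetime.time."""
    hour, minute = (int(part) for part in value.split(":"))
    return time(hour=hour, minute=minute)


def next_daily_run(now: datetime, at: str) -> datetime:
    """Return the next moment a daily task scheduled for `at` should fire.

    If today's slot has already passed (or is exactly now), the task
    rolls over to the same time tomorrow.
    """
    target = datetime.combine(now.date(), parse_hhmm(at))
    if target <= now:
        target += timedelta(days=1)
    return target


if __name__ == "__main__":
    now = datetime(2025, 3, 1, 9, 30)
    # "Remind me to drink water at 8 a.m. every day" -> time: '08:00'
    print(next_daily_run(now, "08:00"))  # 2025-03-02 08:00:00
```

A real scheduler would sleep (or poll) until the returned time, fire the reminder, then call `next_daily_run` again with the firing time; because of the `<=` comparison, that call yields the same slot on the following day.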
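Bailing would not exist without the generous contributions of the open-source community.
Thanks to DeepSeek, FunASR, Silero-VAD, ChatTTS, OpenManus, and other excellent open-source projects, which made it possible to build a voice AI assistant that is truly open, powerful, and accessible!
If you share the goal of putting AI within everyone's reach, you are welcome to contribute code and improve the models, helping make Bailing stronger and smarter and turning it into a real JARVIS!
📢 Stars & PRs welcome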
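Bailing implements voice conversation with the following components:
- 🎙 ASR: FunASR performs automatic speech recognition, converting the user's speech into text.
- 🎚 VAD: silero-vad performs voice activity detection so that only valid speech segments are processed.
- 🧠 LLM: deepseek serves as the large language model that processes user input and generates responses, at very good cost-effectiveness.
- 🔊 TTS: edge-tts, Kokoro-82M, ChatTTS, or macOS say converts the generated text response into natural, fluent speech.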
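Because the modules are independent, swapping one (for example edge-tts for macOS say) can be reduced to picking an implementation of a shared interface by name. The sketch below shows that idea with `typing.Protocol`; the class names, the `synthesize` method, and the backend keys are hypothetical and do not mirror Bailing's actual classes or config keys, and the stubs return fake bytes instead of calling a real speech engine.

```python
from typing import Callable, Protocol


class TextToSpeech(Protocol):
    """Interface every TTS backend satisfies: text in, audio bytes out."""

    def synthesize(self, text: str) -> bytes: ...


class EdgeTTSStub:
    """Placeholder for an edge-tts backend; returns fake audio, no network."""

    def synthesize(self, text: str) -> bytes:
        return f"edge:{text}".encode("utf-8")


class MacSayStub:
    """Placeholder for a macOS `say` backend; returns fake audio."""

    def synthesize(self, text: str) -> bytes:
        return f"say:{text}".encode("utf-8")


# Maps a backend name (as a config file might spell it) to a factory.
TTS_BACKENDS: dict[str, Callable[[], TextToSpeech]] = {
    "edge-tts": EdgeTTSStub,
    "macos-say": MacSayStub,
}


def build_tts(name: str) -> TextToSpeech:
    """Instantiate the TTS backend selected by configuration."""
    try:
        factory = TTS_BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
    return factory()


if __name__ == "__main__":
    tts = build_tts("macos-say")
    print(tts.synthesize("hello"))  # b'say:hello'
```

The same pattern applies to the ASR, VAD, and LLM slots: callers depend only on the protocol, so upgrading a module means registering a new class rather than editing the pipeline.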
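The Robot handles task management and memory management, deals intelligently with user interruptions, and coordinates all modules seamlessly to keep the interaction smooth.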
| Player state | User speaking | Meaning |
|---|---|---|
| Playing | No | Normal |
| Playing | Yes | Interruption |
| Not playing | No | Normal |
| Not playing | Yes | VAD detection, then ASR recognition |
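The table reduces to a pure function of two booleans. A minimal sketch, assuming the coordinator already knows whether the player is active and whether VAD currently reports speech; `Action` and `decide` are hypothetical names, not Bailing's API, and what "stop playback" actually involves (cancelling queued TTS, flushing audio) is left out.

```python
from enum import Enum


class Action(Enum):
    """What the coordinator does for each row of the player/speech table."""

    IDLE = "normal: nothing playing, nobody speaking"
    KEEP_PLAYING = "normal: keep playing the reply"
    INTERRUPT = "interruption: stop playback and listen"
    RECOGNIZE = "run VAD, then ASR on the user's speech"


def decide(player_playing: bool, user_speaking: bool) -> Action:
    """Map (player state, is the user speaking?) to an action."""
    if player_playing and user_speaking:
        return Action.INTERRUPT
    if player_playing:
        return Action.KEEP_PLAYING
    if user_speaking:
        return Action.RECOGNIZE
    return Action.IDLE


if __name__ == "__main__":
    for playing in (True, False):
        for speaking in (False, True):
            print(playing, speaking, decide(playing, speaking).name)
```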
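- Speech input: accurate speech recognition with FunASR.
- Voice activity detection: silero-vad filters out invalid audio to make recognition more efficient.
- Dialogue generation: relies on deepseek's strong language understanding to generate natural text replies, at very good cost-effectiveness.
- Speech output: edge-tts and Kokoro-82M convert text into speech, giving users realistic audio feedback.
- Interruption: a configurable interruption strategy recognizes both keyword and speech interruptions, giving users immediate feedback and control during the conversation and keeping the interaction fluid.
- Memory: learns continuously, remembering user preferences and conversation history to personalize interactions.
- Tool calling: integrates external tools flexibly, so users can request information or perform actions directly by voice, making the assistant more useful.
- Task management: manages user tasks, tracks progress, sets reminders, and provides dynamic updates so users never miss anything important.
- High-quality voice conversation: combines strong ASR, LLM, and TTS technology to keep voice conversations fluent and accurate.
- Lightweight design: runs without high-performance hardware and suits resource-constrained environments.
- Fully open source: Bailing is completely open source and encourages community contributions and derivative work.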
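The interruption strategy described above combines two triggers: a keyword in the recognized text, or speech that lasts long enough to count as a real turn. A minimal sketch of such a policy follows; the keyword list, the 600 ms threshold, and the `InterruptPolicy` class are assumptions for illustration, not Bailing's configuration.

```python
from dataclasses import dataclass


@dataclass
class InterruptPolicy:
    """Decide whether user speech heard during playback should cut it off."""

    # Hypothetical stop words; a Chinese deployment would list its own phrases.
    keywords: tuple[str, ...] = ("stop", "wait", "pause")
    # Hypothetical threshold: shorter bursts (coughs, "uh-huh") do not interrupt.
    min_speech_ms: int = 600

    def should_interrupt(self, transcript: str, speech_ms: int) -> bool:
        text = transcript.strip().lower()
        # Keyword interruption: any configured keyword anywhere in the text.
        if any(keyword in text for keyword in self.keywords):
            return True
        # Speech interruption: sustained speech counts even without a keyword.
        return speech_ms >= self.min_speech_ms


if __name__ == "__main__":
    policy = InterruptPolicy()
    print(policy.should_interrupt("Stop!", 150))                 # True
    print(policy.should_interrupt("uh-huh", 300))                # False
    print(policy.should_interrupt("what about tomorrow", 1200))  # True
```

Substring matching keeps the sketch simple but also fires on words like "nonstop"; word-boundary matching would fix that for English, though not for Chinese, which has no spaces between words.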
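Make sure the following tools and libraries are installed in your development environment:
- Python 3.11 or later
- The pip package manager
- The dependencies required by FunASR, silero-vad, deepseek, edge-tts, and Kokoro-82M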
Clone the repository:
git clone https://github.com/wwbin2017/bailing.git
cd bailing

Install the dependencies:
pip install -r requirements.txt
pip install -r third_party/OpenManus/requirements.txt
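Configure the environment:
- Open config/config.yaml and configure ASR, LLM, and the other settings.
- Download SenseVoiceSmall into the models/SenseVoiceSmall directory (see the SenseVoiceSmall download page).
- Get an api_key from the DeepSeek website and set it in the configuration (see DeepSeek's api_key page). You can also configure OpenAI, Qwen, Gemini, 01yi, or other models instead.
- The general-purpose AIGC configuration is still being tested. If it does not work for you, use the v0.0.1 or v0.0.2 tag instead.
- In third_party/OpenManus/config/config.toml, set model, base_url, and api_key.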
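Run the project:
cd server
python server.py  # start the backend service (optional; you can skip this step)

Then, from the project root, start the assistant:
python main.py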
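- After the application starts, the system waits for voice input.
- silero-vad performs voice activity detection so that only valid speech is processed.
- FunASR converts the user's speech into text.
- deepseek processes the text input and generates a reply.
- edge-tts, Kokoro-82M, ChatTTS, or macOS say converts the generated text into speech and plays it back to the user.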
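Put together, one turn runs VAD, ASR, LLM, and TTS in sequence, with VAD gating what reaches ASR. The sketch below wires the stages as plain callables so any backend can be plugged in; the `run_turn` name, its signature, and the use of a bounded `deque` as short-term memory are assumptions for illustration, and the demo's lambdas stand in for the real models.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def run_turn(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],   # VAD, e.g. silero-vad
    transcribe: Callable[[bytes], str],   # ASR, e.g. FunASR
    reply: Callable[[str, list], str],    # LLM, e.g. deepseek
    speak: Callable[[str], None],         # TTS + playback
    history: deque,                       # recent (user, assistant) turns
) -> Optional[str]:
    """Run one conversational turn; return the reply, or None if no speech."""
    voiced = b"".join(frame for frame in frames if is_speech(frame))
    if not voiced:
        return None  # VAD found nothing worth sending to ASR
    text = transcribe(voiced)
    answer = reply(text, list(history))  # the LLM sees the recent context
    history.append((text, answer))  # deque(maxlen=N) silently drops old turns
    speak(answer)
    return answer


if __name__ == "__main__":
    memory: deque = deque(maxlen=5)
    run_turn(
        frames=[b"..", b"hello", b".."],
        is_speech=lambda f: f != b"..",
        transcribe=lambda audio: audio.decode(),
        reply=lambda text, hist: f"You said {text!r}",
        speak=print,
        history=memory,
    )
```

A bounded deque is only the simplest form of memory; persisting user preferences across sessions, as the memory feature describes, would need storage beyond a single process.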
- [x] Basic voice conversation
- [x] Plugin calls
- [x] Task management
- [x] RAG & Agent
- [x] Memory
- [ ] Wake-word activation
- [ ] Stronger web search
- [ ] WebRTC support
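In the future, Bailing will grow into a JARVIS-like personal assistant: a thoughtful advisor with exceptional memory and forward-looking task management. Built on cutting-edge RAG and Agent technology, it will keep precise track of your affairs and knowledge and make complex things simple. Just say something like "find the latest news for me" or "summarize the latest progress in large models," and Bailing will respond quickly, analyze intelligently, track developments in real time, and present the results clearly. Imagine having not just an assistant, but an intelligent partner who understands your needs, stays with you through every important moment, and helps you see the whole picture and make the right call.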
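| Function | Description | Behavior | Example |
|---|---|---|---|
| get_weather | Gets weather information for a location | Given a place name, returns the weather there | User: "What's the weather in Hangzhou?" → zhejiang/hangzhou |
| ielts_speaking_practice | IELTS speaking practice | Generates IELTS speaking questions and dialogues to help the user practice | - |
| get_day_of_week | Gets the current day of the week or date | When the user asks for the current time, date, or day of the week, returns that information | User: "What day is it today?" → returns the current day of the week |
| schedule_task | Creates a scheduled task | The user specifies when the task runs and what it is, and is reminded at that time | User: "Remind me to drink water at 8 a.m. every day." → time: '08:00', content: 'remind me to drink water' |
| open_application | Opens the specified application on a Mac | The user names an application and the script launches it on the Mac | User: "Open Safari." → application_name: 'Safari' |
| web_search | Searches the web for the given keywords | Returns search results for the user's query | User: "Search for the latest tech news." → query: 'latest tech news' |
| aigc_manus | General-purpose AI that can do anything | Takes a description of the task to perform and returns the result of executing it | User: "Analyze the market trend of a specific stock" → query: 'analyze the market trend of a specific stock' |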
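Each row of the table is a function the LLM can ask to call by name with structured arguments. A minimal dispatch sketch follows; the registry, the `tool` decorator, and the JSON call shape (`name` plus `arguments`, which OpenAI-style APIs deliver as a JSON string) are assumptions, and only two of the listed tools are stubbed here rather than Bailing's real implementations.

```python
import json
from datetime import date
from typing import Any, Callable, Optional

# Registry of callable tools, keyed by the name the LLM will emit.
TOOLS: dict[str, Callable[..., Any]] = {}
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_day_of_week(today: Optional[date] = None) -> str:
    """Return the weekday name; a fixed table keeps it locale-independent."""
    today = today or date.today()
    return WEEKDAYS[today.weekday()]


@tool
def schedule_task(time: str, content: str) -> dict:
    """Record a reminder; a real implementation would hand it to a scheduler."""
    return {"time": time, "content": content}


def dispatch(call_json: str) -> Any:
    """Run a tool call shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    args = call.get("arguments") or {}
    if isinstance(args, str):  # some APIs send arguments as a JSON string
        args = json.loads(args)
    return fn(**args)


if __name__ == "__main__":
    print(dispatch('{"name": "schedule_task", '
                   '"arguments": {"time": "08:00", "content": "remind me to drink water"}}'))
```

Keeping the registry keyed by function name means the tool list sent to the LLM and the dispatcher cannot drift apart: adding a tool is one decorated function.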
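Contributions of any kind are welcome! If you have suggestions for improving Bailing or run into a problem, please report it via GitHub Issues or submit a pull request.
This project is open source under the MIT License. You are free to use, modify, and distribute it, but you must keep the original license notice.
For questions or suggestions, please reach out via:
- GitHub Issues: project issue tracker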
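Bailing (百聆) is an open-source project intended for personal learning and research. When using it, please note the following disclaimer:
- Personal use: this project is for personal learning and research only and is not intended for commercial use or production environments.
- Risk and liability: using Bailing may cause data loss, system failures, or other problems. We accept no liability for any loss, damage, or problem resulting from the use of this project.
- Support: this project comes with no technical support or warranty of any kind. You use it at your own risk.
Before using this project, make sure you understand and accept this disclaimer. If you do not agree with these terms, do not use the project.
Thank you for your understanding and support!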
Alternative AI tools for bailing
Similar Open Source Tools
ChuanhuChatGPT
Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.
vscode-antigravity-cockpit
VS Code extension for monitoring Google Antigravity AI model quotas. It provides a webview dashboard, QuickPick mode, quota grouping, automatic grouping, renaming, card view, drag-and-drop sorting, status bar monitoring, threshold notifications, and privacy mode. Users can monitor quota status, remaining percentage, countdown, reset time, progress bar, and model capabilities. The extension supports local and authorized quota monitoring, multiple account authorization, and model wake-up scheduling. It also offers settings customization, user profile display, notifications, and group functionalities. Users can install the extension from the Open VSX Marketplace or via VSIX file. The source code can be built using Node.js and npm. The project is open-source under the MIT license.
MaiBot
MaiBot is an interactive intelligent agent based on a large language model. It aims to be an 'entity' active in QQ group chats, focusing on human-like interactions. It features personification in language style, behavior planning, expression learning, plugin system for unlimited extensions, and emotion expression. The project's design philosophy emphasizes creating a 'life form' in group chats that feels real rather than perfect, with the goal of providing companionship through an AI that makes mistakes and has its own perceptions and thoughts. The code is open-source, but the runtime data of MaiBot is intended to remain closed to maintain its autonomy and conversational nature.
Speech-AI-Forge
Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.
HaE
HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.
chatless
Chatless is a modern AI chat desktop application built on Tauri and Next.js. It supports multiple AI providers, can connect to local Ollama models, supports document parsing and knowledge base functions. All data is stored locally to protect user privacy. The application is lightweight, simple, starts quickly, and consumes minimal resources.
Awesome-ChatTTS
Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.
XianyuAutoAgent
Xianyu AutoAgent is an AI customer service robot system specifically designed for the Xianyu platform, providing 24/7 automated customer service, supporting multi-expert collaborative decision-making, intelligent bargaining, and context-aware conversations. The system includes intelligent conversation engine with features like context awareness and expert routing, business function matrix with modules like core engine, bargaining system, technical support, and operation monitoring. It requires Python 3.8+ and NodeJS 18+ for installation and operation. Users can customize prompts for different experts and contribute to the project through issues or pull requests.
Unity-Skills
UnitySkills is an AI-driven Unity editor automation engine based on REST API. It allows AI to directly control Unity scenes through Skills. The tool offers extreme efficiency with Result Truncation and SKILL.md slimming, a versatile tool library with 282 Skills supporting Batch operations, ensuring transactional safety with automatic rollback, multiple instance support for controlling multiple Unity projects simultaneously, deep integration with Antigravity Slash Commands for interactive experience, compatibility with popular AI terminals like Claude Code, Antigravity, Gemini CLI, and support for Cinemachine 2.x/3.x dual versions with advanced camera control features like MixingCamera, ClearShot, TargetGroup, and Spline.
cockpit-tools
Cockpit Tools is a versatile AI IDE account management tool that supports Antigravity, Codex, GitHub Copilot, Windsurf, and Kiro. It allows efficient management of multiple AI IDE accounts with features like one-click switching, quota monitoring, automatic wake-up, and parallel running of multiple instances. The tool supports 16 languages and provides functionalities such as dashboard overview, account management for each supported platform, multiple instance management, quota monitoring, wake-up tasks, device fingerprinting, and plugin integration.
All-Model-Chat
All Model Chat is a feature-rich, highly customizable web chat application designed specifically for the Google Gemini API family. It integrates dynamic model selection, multimodal file input, streaming responses, comprehensive chat history management, and extensive customization options to provide an unparalleled AI interactive experience.
vocotype-cli
VocoType is a free desktop voice input method designed for professionals who value privacy and efficiency. All recognition is done locally, ensuring offline operation and no data upload. The CLI open-source version of the VocoType core engine on GitHub is mainly targeted at developers.
Con-Nav-Item
Con-Nav-Item is a modern personal navigation system designed for digital workers. It is not just a link bookmark but also an all-in-one workspace integrated with AI smart generation, multi-device synchronization, card-based management, and deep browser integration.
LLM-TPU
LLM-TPU project aims to deploy various open-source generative AI models on the BM1684X chip, with a focus on LLM. Models are converted to bmodel using TPU-MLIR compiler and deployed to PCIe or SoC environments using C++ code. The project has deployed various open-source models such as Baichuan2-7B, ChatGLM3-6B, CodeFuse-7B, DeepSeek-6.7B, Falcon-40B, Phi-3-mini-4k, Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-0.5B, Qwen1.5-1.8B, Llama2-7B, Llama2-13B, LWM-Text-Chat, Mistral-7B-Instruct, Stable Diffusion, Stable Diffusion XL, WizardCoder-15B, Yi-6B-chat, Yi-34B-chat. Detailed model deployment information can be found in the 'models' subdirectory of the project. For demonstrations, users can follow the 'Quick Start' section. For inquiries about the chip, users can contact SOPHGO via the official website.
AI_NovelGenerator
AI_NovelGenerator is a versatile novel generation tool based on large language models. It features a novel setting workshop for world-building, character development, and plot blueprinting, intelligent chapter generation for coherent storytelling, a status tracking system for character arcs and foreshadowing management, a semantic retrieval engine for maintaining long-range context consistency, integration with knowledge bases for local document references, an automatic proofreading mechanism for detecting plot contradictions and logic conflicts, and a visual workspace for GUI operations encompassing configuration, generation, and proofreading. The tool aims to assist users in efficiently creating logically rigorous and thematically consistent long-form stories.
For similar tasks
M.I.L.E.S
M.I.L.E.S. (Machine Intelligent Language Enabled System) is a voice assistant powered by GPT-4 Turbo, offering a range of capabilities beyond existing assistants. With its advanced language understanding, M.I.L.E.S. provides accurate and efficient responses to user queries. It seamlessly integrates with smart home devices, Spotify, and offers real-time weather information. Additionally, M.I.L.E.S. possesses persistent memory, a built-in calculator, and multi-tasking abilities. Its realistic voice, accurate wake word detection, and internet browsing capabilities enhance the user experience. M.I.L.E.S. prioritizes user privacy by processing data locally, encrypting sensitive information, and adhering to strict data retention policies.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per-user and per-organization basis
- Block or redact requests containing PII
- Improve LLM reliability with failovers, retries, and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.


