
py-xiaozhi
python版本的小智ai,主要帮助那些没有硬件却想体验小智功能的人
Stars: 606

README:
简体中文 | English
py-xiaozhi 是一个使用 Python 实现的小智语音客户端,旨在通过代码学习和在没有硬件条件下体验 AI 小智的语音功能。 本仓库是基于xiaozhi-esp32移植
- AI语音交互:支持语音输入与识别,实现智能人机交互,提供自然流畅的对话体验。
- 视觉多模态:支持图像识别和处理,提供多模态交互能力,理解图像内容。
- IoT 设备集成:支持智能家居设备控制,实现更多物联网功能,打造智能家居生态。
- 联网音乐播放:支持在线音乐搜索和播放,享受海量音乐资源。
- 语音唤醒:支持唤醒词激活交互,免去手动操作的烦恼(默认关闭需要手动开启)。
- 自动对话模式:实现连续对话体验,提升用户交互流畅度。
- 图形化界面:提供直观易用的 GUI,支持小智表情与文本显示,增强视觉体验。
- 命令行模式:支持 CLI 运行,适用于嵌入式设备或无 GUI 环境。
- 跨平台支持:兼容 Windows 10+、macOS 10.15+ 和 Linux 系统,随时随地使用。
- 音量控制:支持音量调节,适应不同环境需求,统一声音控制接口。
- 会话管理:有效管理多轮对话,保持交互的连续性。
- 加密音频传输:支持 WSS 协议,保障音频数据的安全性,防止信息泄露。
- 自动验证码处理:首次使用时,程序自动复制验证码并打开浏览器,简化用户操作。
- 自动获取 MAC 地址:避免 MAC 地址冲突,提高连接稳定性。
- 代码模块化:拆分代码并封装为类,职责分明,便于二次开发。
- 稳定性优化:修复多项问题,包括断线重连、跨平台兼容等。
- 3.9 >= Python版本 <= 3.12
- 支持的操作系统:Windows 10+、macOS 10.15+、Linux
- 麦克风和扬声器设备
- 仔细阅读/docs/使用文档.md 启动教程和文件说明都在里面了
- main是最新代码,每次更新都需要手动重新安装一次pip依赖防止我新增依赖后你们本地没有
+----------------+
| |
v |
+------+ 唤醒词/按钮 +------------+ | +------------+
| IDLE | -----------> | CONNECTING | --+-> | LISTENING |
+------+ +------------+ +------------+
^ |
| | 语音识别完成
| +------------+ v
+--------- | SPEAKING | <-----------------+
完成播放 +------------+
- [ ] 新 GUI(Electron):提供更现代、美观的用户界面,优化交互体验。
- 找不到音频设备:请检查麦克风和扬声器是否正常连接和启用。
-
唤醒词不响应:请检查
config.json
中的USE_WAKE_WORD
设置是否为true
,以及模型路径是否正确。 - 网络连接失败:请检查网络设置和防火墙配置,确保WebSocket或MQTT通信未被阻止。
-
打包失败:确保已安装PyInstaller (
pip install pyinstaller
),并且所有依赖项都已安装。然后重新执行python scripts/build.py
- main 主分支
- feature/v1 第一个版本
- feature/visual 视觉分支
├── .github # GitHub 相关配置
│ └── ISSUE_TEMPLATE # Issue 模板目录
│ ├── bug_report.md # Bug 报告模板
│ ├── code_improvement.md # 代码改进建议模板
│ ├── documentation_improvement.md # 文档改进建议模板
│ └── feature_request.md # 功能请求模板
├── config # 配置文件目录
│ ├── camera_VL_config.json # 摄像头与视觉识别配置
│ └── config.json # 应用程序配置文件
├── docs # 文档目录
│ ├── images # 文档图片资源
│ │ ├── 唤醒词.png # 唤醒词设置示例图
│ │ └── 群聊.jpg # 社区交流群图片
│ ├── 使用文档.md # 用户使用指南
│ └── 异常汇总.md # 常见错误及解决方案
├── hooks # PyInstaller钩子目录
│ ├── hook-opuslib.py # opuslib钩子
│ ├── hook-vosk.py # vosk钩子
│ └── runtime_hook.py # 运行时钩子
├── libs # 依赖库目录
│ └── windows # Windows 平台特定库
│ └── opus.dll # Opus 音频编解码库
├── resources # 资源文件目录
├── scripts # 实用脚本目录
│ ├── build.py # 打包构建脚本
│ ├── dir_tree.py # 生成目录树结构脚本
│ └── py_audio_scanner.py # 音频设备扫描工具
├── src # 源代码目录
│ ├── audio_codecs # 音频编解码模块
│ │ └── audio_codec.py # 音频编解码器实现
│ ├── audio_processing # 音频处理模块
│ │ ├── vad_detector.py # 语音活动检测实现(用于实时打断)
│ │ └── wake_word_detect.py # 语音唤醒词检测实现
│ ├── constants # 常量定义
│ │ └── constants.py # 应用程序常量(状态、事件类型等)
│ ├── display # 显示界面模块
│ │ ├── base_display.py # 显示界面基类
│ │ ├── cli_display.py # 命令行界面实现
│ │ └── gui_display.py # 图形用户界面实现
│ ├── iot # IoT设备相关模块
│ │ ├── things # 具体设备实现目录
│ │ │ ├── CameraVL # 摄像头与视觉识别模块
│ │ │ │ ├── Camera.py # 摄像头控制实现
│ │ │ │ └── VL.py # 视觉识别实现
│ │ │ ├── lamp.py # 智能灯具控制实现
│ │ │ ├── music_player.py # 音乐播放器实现
│ │ │ ├── query_bridge_rag.py # RAG查询桥接实现
│ │ │ └── speaker.py # 智能音箱控制实现
│ │ ├── thing.py # IoT设备基类定义
│ │ └── thing_manager.py # IoT设备管理器(统一管理各类设备)
│ ├── protocols # 通信协议模块
│ │ ├── mqtt_protocol.py # MQTT 协议实现(用于设备通信)
│ │ ├── protocol.py # 协议基类
│ │ └── websocket_protocol.py # WebSocket 协议实现
│ ├── utils # 工具类模块
│ │ ├── config_manager.py # 配置管理器(单例模式)
│ │ ├── logging_config.py # 日志配置
│ │ ├── system_info.py # 系统信息工具(处理 opus.dll 加载等)
│ │ └── volume_controller.py # 音量控制工具(跨平台音量调节)
│ └── application.py # 应用程序主类(核心业务逻辑)
├── .gitignore # Git 忽略文件配置
├── LICENSE # 项目许可证
├── README.md # 项目说明文档
├── main.py # 程序入口点
├── requirements.txt # Python 依赖包列表(通用)
├── requirements_mac.txt # macOS 特定依赖包列表
欢迎提交问题报告和代码贡献。请确保遵循以下规范:
- 代码风格符合PEP8规范
- 提交的PR包含适当的测试
- 更新相关文档
排名不分前后
Xiaoxia zhh827 四博智联-李洪刚 HonestQiao vonweller 孙卫公 isamu2025 Rain120 kejily 电波bilibili君
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for py-xiaozhi
Similar Open Source Tools

py-xiaozhi
py-xiaozhi is a Python-based XiaoZhi voice client designed for learning code and experiencing AI XiaoZhi's voice functions without hardware conditions. It features voice interaction, graphical interface, volume control, session management, encrypted audio transmission, CLI mode, and automatic copying of verification codes and opening browsers for first-time users. The project aims to optimize and add new features to zhh827's py-xiaozhi based on the original hardware project xiaozhi-esp32 and the Python implementation py-xiaozhi.

godoos
GodoOS is an efficient intranet office operating system that includes various office tools such as word/excel/ppt/pdf/internal chat/whiteboard/mind map, with native file storage support. The platform interface mimics the Windows style, making it easy to operate while maintaining low resource consumption and high performance. It automatically connects to intranet users without registration, enabling instant communication and file sharing. The flexible and highly configurable app store allows for unlimited expansion.

AHU-AI-Repository
This repository is dedicated to the learning and exchange of resources for the School of Artificial Intelligence at Anhui University. Notes will be published on this website first: https://www.aoaoaoao.cn and will be synchronized to the repository regularly. You can also contact me at [email protected].

JeecgBoot
JeecgBoot is a Java AI Low Code Platform for Enterprise web applications, based on BPM and code generator. It features a SpringBoot2.x/3.x backend, SpringCloud, Ant Design Vue3, Mybatis-plus, Shiro, JWT, supporting microservices, multi-tenancy, and AI capabilities like DeepSeek and ChatGPT. The powerful code generator allows for one-click generation of frontend and backend code without writing any code. JeecgBoot leads the way in AI low-code development mode, helping to solve 80% of repetitive work in Java projects and allowing developers to focus more on business logic.

MaiMBot
MaiMBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, utilizes LLM for conversation abilities, MongoDB for data persistence, and NapCat for QQ protocol support. The bot features keyword-triggered proactive responses, dynamic prompt construction, support for images and message forwarding, typo generation, multiple replies, emotion-based emoji responses, daily schedule generation, user relationship management, knowledge base, and group impressions. Work-in-progress features include personality, group atmosphere, image handling, humor, meme functions, and Minecraft interactions. The tool is in active development with plans for GIF compatibility, mini-program link parsing, bug fixes, documentation improvements, and logic enhancements for emoji sending.

vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

MaiBot
MaiBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, with LLM providing conversation abilities, MongoDB for data persistence support, and NapCat as the QQ protocol endpoint support. The project is in active development stage, with features like chat functionality, emoji functionality, schedule management, memory function, knowledge base function, and relationship function planned for future updates. The project aims to create a 'life form' active in QQ group chats, focusing on companionship and creating a more human-like presence rather than a perfect assistant. The application generates content from AI models, so users are advised to discern carefully and not use it for illegal purposes.

gez
Gez is a high-performance micro frontend framework based on ESM. It uses Rspack compilation and maps modules to URLs with strong caching and content-based hashing. Gez embraces modern micro frontend architecture by leveraging ESM and importmap for dependency management, providing reliable isolation with module scope, seamless integration with any modern frontend framework, intuitive development experience, and optimal performance with zero runtime overhead and reliable caching strategies.

KouriChat
KouriChat is a project that seamlessly integrates virtual and real interactions, providing eternal gentle bonds. It offers features like WeChat integration, immersive role-playing, intelligent conversation segmentation, emotion-based emojis, image generation, image recognition, voice messages, and more. The project is focused on technical research and learning exchanges, with a strong emphasis on ethical and legal guidelines. Users are required to take full responsibility for their actions, especially minors who should use the tool under supervision. The project architecture includes avatar configurations, data storage, handlers, AI service interfaces, a web UI, and utility libraries.

aimoneyhunter
AiMoneyHunter is a comprehensive collection of information on AI side hustle opportunities, covering various methods, technologies, tools, platforms, and channels for making money with AI. It aims to break information barriers in the AI era, enabling everyone to leverage AI intelligence for side hustles and earn extra income. The repository includes curated AI-related content sources, tips on starting a side hustle, and insights on using AI technologies for various money-making tasks.

llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.

ai_quant_trade
The ai_quant_trade repository is a comprehensive platform for stock AI trading, offering learning, simulation, and live trading capabilities. It includes features such as factor mining, traditional strategies, machine learning, deep learning, reinforcement learning, graph networks, and high-frequency trading. The repository provides tools for monitoring stocks, stock recommendations, and deployment tools for live trading. It also features new functionalities like sentiment analysis using StructBERT, reinforcement learning for multi-stock trading with a 53% annual return, automatic factor mining with 5000 factors, customized stock monitoring software, and local deep reinforcement learning strategies.

AI-Drug-Discovery-Design
AI-Drug-Discovery-Design is a repository focused on Artificial Intelligence-assisted Drug Discovery and Design. It explores the use of AI technology to accelerate and optimize the drug development process. The advantages of AI in drug design include speeding up research cycles, improving accuracy through data-driven models, reducing costs by minimizing experimental redundancies, and enabling personalized drug design for specific patients or disease characteristics.