py-xiaozhi

A Python version of the XiaoZhi AI client, mainly for people who want to try XiaoZhi's features but have no hardware.

Stars: 606

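Project Overview

py-xiaozhi is a XiaoZhi voice client implemented in Python. It is intended for learning from the code and for trying the AI XiaoZhi voice features without dedicated hardware. This repository is a port of xiaozhi-esp32.

Demo

Bilibili demo video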

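Features

AI voice interaction: supports voice input and recognition for natural, fluent human-computer conversation.
Visual multimodality: supports image recognition and processing, providing multimodal interaction and image content understanding.
IoT device integration: supports smart home device control and further IoT functionality.
Online music playback: supports online music search and playback.
Voice wake-up: supports starting an interaction with a wake word, so no manual operation is needed (disabled by default; must be enabled manually).
Automatic conversation mode: provides continuous conversation for smoother interaction.
Graphical interface: provides an intuitive GUI that displays XiaoZhi's expressions and text.
Command-line mode: supports running as a CLI, suitable for embedded devices or environments without a GUI.
Cross-platform support: compatible with Windows 10+, macOS 10.15+, and Linux.
Volume control: supports volume adjustment through a unified sound control interface.
Session management: manages multi-turn conversations and keeps the interaction continuous.
Encrypted audio transmission: supports the WSS protocol to protect audio data in transit.
Automatic verification code handling: on first use, the program copies the verification code and opens the browser automatically.
Automatic MAC address retrieval: avoids MAC address conflicts and improves connection stability.
Modular code: the code is split up and encapsulated in classes with clear responsibilities, which makes further development easier.
Stability improvements: fixes a number of issues, including reconnection after a dropped connection and cross-platform compatibility.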

System Requirements

Python version: 3.9 <= version <= 3.12
Supported operating systems: Windows 10+, macOS 10.15+, Linux
A microphone and a speaker
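
The version requirement above can be checked before launching the client. The following is a minimal sketch, not part of the project: the function name is_supported and the constants are illustrative assumptions, and only the 3.9 to 3.12 range comes from this README.

```python
import sys

# Supported range taken from the requirement above (inclusive on both ends).
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported(version=None):
    """Return True if the (major, minor) part of `version` is in the supported range.

    `version` defaults to the running interpreter's sys.version_info.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


# Report the status of the interpreter that runs this script.
print("supported" if is_supported() else "unsupported")
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.9".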

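Read This First!

Read /docs/使用文档.md carefully; the startup tutorial and the file descriptions are all in there.
main holds the latest code. After every update, reinstall the pip dependencies manually, in case newly added dependencies are missing from your local environment.

Using the XiaoZhi client from scratch (video tutorial)

State Transition Diagram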

                            +--------------+
                            |              |
                            v              |
+------+ wake word/button +------------+   |   +------------+
| IDLE | ---------------> | CONNECTING | --+-> | LISTENING  |
+------+                  +------------+       +------------+
   ^                                                |
   |                                                | speech recognition done
   |          +------------+                        v
   +--------- |  SPEAKING  | <----------------------+
playback done +------------+
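
The main cycle in the diagram can be read as a small lookup table from (state, event) to the next state. The sketch below is illustrative only and is not the project's code: the DeviceState enum, the event names, and next_state are assumptions. The "connected" event has no label in the diagram and is a guess, and the unlabeled loop back to CONNECTING is left out.

```python
from enum import Enum


class DeviceState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    LISTENING = "listening"
    SPEAKING = "speaking"


# (current state, event) -> next state, following the arrows in the diagram.
# Event names are hypothetical labels for the diagram's edge captions.
TRANSITIONS = {
    (DeviceState.IDLE, "wake"): DeviceState.CONNECTING,
    (DeviceState.CONNECTING, "connected"): DeviceState.LISTENING,
    (DeviceState.LISTENING, "recognition_done"): DeviceState.SPEAKING,
    (DeviceState.SPEAKING, "playback_done"): DeviceState.IDLE,
}


def next_state(state, event):
    """Return the next state, or stay in the current one for an unknown event."""
    return TRANSITIONS.get((state, event), state)


# Walk one full cycle, starting and ending in IDLE.
state = DeviceState.IDLE
for event in ("wake", "connected", "recognition_done", "playback_done"):
    state = next_state(state, event)
    print(event, "->", state.name)
```

Keeping the transitions in a table makes it easy to see at a glance which events each state accepts; events that do not apply in the current state are ignored.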

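Planned Features

[ ] New GUI (Electron): a more modern, better-looking user interface with an improved interaction experience.

FAQ

Audio device not found: check that the microphone and speaker are properly connected and enabled.
Wake word does not respond: check that USE_WAKE_WORD in config.json is set to true and that the model path is correct.
Network connection fails: check the network settings and firewall configuration, and make sure WebSocket or MQTT traffic is not blocked.
Packaging fails: make sure PyInstaller is installed (pip install pyinstaller) and that all dependencies are installed, then run python scripts/build.py again.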
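
For the wake word item above, the setting can be inspected with a few lines of Python. This is a rough sketch under assumptions: this README does not show the layout of config.json, so the helper searches nested objects for the USE_WAKE_WORD key wherever it sits. The names find_key and wake_word_enabled are hypothetical, and the default path follows the project structure listing.

```python
import json
from pathlib import Path


def find_key(node, key):
    """Depth-first search for `key` in nested dicts/lists; return None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def wake_word_enabled(config_path="config/config.json"):
    """Report whether USE_WAKE_WORD is set to true in the given config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return find_key(config, "USE_WAKE_WORD") is True
```

Calling wake_word_enabled() from the repository root returns False both when the key is missing and when it is set to false, which matches the "disabled by default" behavior described in the feature list.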

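Project Structure

├── .github                          # GitHub configuration
│   └── ISSUE_TEMPLATE               # Issue template directory
│       ├── bug_report.md            # Bug report template
│       ├── code_improvement.md      # Code improvement suggestion template
│       ├── documentation_improvement.md  # Documentation improvement suggestion template
│       └── feature_request.md       # Feature request template
├── config                           # Configuration directory
│   ├── camera_VL_config.json        # Camera and visual recognition configuration
│   └── config.json                  # Application configuration file
├── docs                             # Documentation directory
│   ├── images                       # Documentation images
│   │   ├── 唤醒词.png               # Example screenshot of the wake word settings
│   │   └── 群聊.jpg                 # Community chat group image
│   ├── 使用文档.md                  # User guide
│   └── 异常汇总.md                  # Common errors and solutions
├── hooks                            # PyInstaller hooks directory
│   ├── hook-opuslib.py              # opuslib hook
│   ├── hook-vosk.py                 # vosk hook
│   └── runtime_hook.py              # Runtime hook
├── libs                             # Dependency libraries directory
│   └── windows                      # Windows-specific libraries
│       └── opus.dll                 # Opus audio codec library
├── resources                        # Resource files directory
├── scripts                          # Utility scripts directory
│   ├── build.py                     # Packaging and build script
│   ├── dir_tree.py                  # Script that generates the directory tree
│   └── py_audio_scanner.py          # Audio device scanning tool
├── src                              # Source code directory
│   ├── audio_codecs                 # Audio codec module
│   │   └── audio_codec.py           # Audio codec implementation
│   ├── audio_processing             # Audio processing module
│   │   ├── vad_detector.py          # Voice activity detection (used for real-time interruption)
│   │   └── wake_word_detect.py      # Wake word detection
│   ├── constants                    # Constant definitions
│   │   └── constants.py             # Application constants (states, event types, etc.)
│   ├── display                      # Display interface module
│   │   ├── base_display.py          # Display base class
│   │   ├── cli_display.py           # Command-line interface implementation
│   │   └── gui_display.py           # Graphical user interface implementation
│   ├── iot                          # IoT device module
│   │   ├── things                   # Concrete device implementations
│   │   │   ├── CameraVL             # Camera and visual recognition module
│   │   │   │   ├── Camera.py        # Camera control implementation
│   │   │   │   └── VL.py            # Visual recognition implementation
│   │   │   ├── lamp.py              # Smart lamp control implementation
│   │   │   ├── music_player.py      # Music player implementation
│   │   │   ├── query_bridge_rag.py  # RAG query bridge implementation
│   │   │   └── speaker.py           # Smart speaker control implementation
│   │   ├── thing.py                 # IoT device base class
│   │   └── thing_manager.py         # IoT device manager (manages all devices in one place)
│   ├── protocols                    # Communication protocol module
│   │   ├── mqtt_protocol.py         # MQTT protocol implementation (for device communication)
│   │   ├── protocol.py              # Protocol base class
│   │   └── websocket_protocol.py    # WebSocket protocol implementation
│   ├── utils                        # Utility module
│   │   ├── config_manager.py        # Configuration manager (singleton pattern)
│   │   ├── logging_config.py        # Logging configuration
│   │   ├── system_info.py           # System information utilities (opus.dll loading, etc.)
│   │   └── volume_controller.py     # Volume control utility (cross-platform volume adjustment)
│   └── application.py               # Main application class (core business logic)
├── .gitignore                       # Git ignore configuration
├── LICENSE                          # Project license
├── README.md                        # Project documentation
├── main.py                          # Program entry point
├── requirements.txt                 # Python dependency list (general)
└── requirements_mac.txt             # macOS-specific dependency list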
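
The listing describes config_manager.py as a configuration manager that uses the singleton pattern. For readers unfamiliar with the pattern, here is a generic sketch of one common way to write it in Python. It is not taken from the project: the class name ConfigManager and its get/set methods are hypothetical, and the real module may be organized differently.

```python
import threading


class ConfigManager:
    """Hypothetical singleton: every call to ConfigManager() returns one shared instance."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock so two threads cannot both create it.
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._values = {}
                    cls._instance = instance
        return cls._instance

    def get(self, key, default=None):
        """Return the stored value for `key`, or `default` if it is not set."""
        return self._values.get(key, default)

    def set(self, key, value):
        """Store `value` under `key` in the shared instance."""
        self._values[key] = value


# Both names refer to the same object, so a value set through one is visible through the other.
first = ConfigManager()
second = ConfigManager()
first.set("volume", 70)
print(second.get("volume"))
```

A single shared instance means the GUI, the CLI, and the protocol code would all read the same settings without passing a config object around, which is the usual motivation for this pattern.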

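Contributing

Issue reports and code contributions are welcome. Please follow these guidelines:

Code style follows PEP 8
Submitted PRs include appropriate tests
Relevant documentation is updated

Community and Support

Thanks to the following open-source contributors

In no particular order

Xiaoxia zhh827 四博智联-李洪刚 HonestQiao vonweller 孙卫公 isamu2025 Rain120 kejily 电波bilibili君

Sponsorship

Thanks to all sponsors for their support ❤️

Whether it is API resources, device compatibility testing, or financial support, every bit of help makes the project better.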

License

MIT License
