
py-xiaozhi
A Python version of XiaoZhi AI, mainly for people who have no hardware but want to try XiaoZhi's features
Stars: 554

py-xiaozhi is a Python-based XiaoZhi voice client for learning from the code and for trying AI XiaoZhi's voice features without dedicated hardware. It offers voice interaction, a graphical interface, volume control, session management, encrypted audio transmission, a CLI mode, and, for first-time users, automatic copying of the verification code and opening of the browser. The project builds on the original hardware project xiaozhi-esp32 and on zhh827's Python implementation py-xiaozhi, optimizing it and adding new features.
README:
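py-xiaozhi is a XiaoZhi voice client implemented in Python, intended for learning from the code and for trying AI XiaoZhi's voice features without hardware. This repository is a port of xiaozhi-esp32.
- Read /docs/使用文档.md carefully; the startup tutorial and file descriptions are all in there.
- main holds the latest code. After every update, reinstall the pip dependencies manually, in case newly added dependencies are missing from your local environment.
- There are currently problems running on Raspberry Pi (b4, b5), Jetson, and similar devices.
- On macOS, M1, M2, and similar machines may fail to find opuslib; one user has suggested this is a Python version issue.
- Experienced contributors are very welcome to join!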
- Python 3.9.13+ (3.9.13 recommended); the highest supported version is 3.12
- Windows/Linux/macOS
- main: the main branch
- feature/v1: the first version
- feature/visual: the vision branch
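- Voice interaction: supports voice input and recognition for natural human-machine interaction.
- Graphical interface: provides an intuitive, easy-to-use GUI.
- Volume control: supports volume adjustment to suit different environments.
- Session management: manages multi-turn conversations and keeps the interaction continuous.
- Encrypted audio transmission: protects audio data and prevents information leaks.
- CLI mode: supports running from the command line, suitable for embedded devices or environments without a GUI.
- Automatic verification code handling: on first use, the program copies the verification code and opens the browser automatically.
- Wake word: supports voice wake-up, so no manual action is needed.
- Keyboard keys: listens for key presses, which can minimize the window.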
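The Python requirement listed above (3.9.13 or newer, up to 3.12) can be checked at startup. The following is a minimal sketch using only the standard library; the function and constant names are illustrative and are not taken from the project's code.

```python
import sys

MIN_VERSION = (3, 9, 13)  # minimum version from the requirements list
MAX_MINOR = (3, 12)       # highest supported minor release


def is_supported(version_info=sys.version_info):
    """Return True if the interpreter version is within the documented range."""
    version = tuple(version_info[:3])
    return MIN_VERSION <= version and version[:2] <= MAX_MINOR


if __name__ == "__main__":
    if not is_supported():
        print("Unsupported Python version:", sys.version.split()[0])
```

Comparing tuples rather than version strings avoids the pitfall where "3.10" sorts before "3.9" as text.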
                                   +----------+
                                   |          |
                                   v          |
+------+  wake word/button   +------------+   |   +-----------+
| IDLE | ------------------> | CONNECTING | --+-> | LISTENING |
+------+                     +------------+       +-----------+
   ^                                                    |
   |                                                    | speech recognition complete
   |                  +----------+                      v
   +----------------- | SPEAKING | <--------------------+
    playback complete +----------+
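The state diagram above can be read as a small transition table. The sketch below is one way to express it in Python, not the project's actual implementation: the four state names come from the diagram, while the event names (`wake`, `connected`, `recognition_done`, `playback_done`) are hypothetical labels for its arrows. The loop-back arrow near CONNECTING is left out because the diagram does not label its trigger.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    LISTENING = "listening"
    SPEAKING = "speaking"


# Event names are hypothetical; they stand in for the arrows in the diagram.
TRANSITIONS = {
    (State.IDLE, "wake"): State.CONNECTING,               # wake word or button
    (State.CONNECTING, "connected"): State.LISTENING,
    (State.LISTENING, "recognition_done"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


def next_state(state, event):
    """Return the next state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.IDLE
    for event in ["wake", "connected", "recognition_done", "playback_done"]:
        state = next_state(state, event)
        print(event, "->", state.name)
```

Keeping the transitions in a dictionary makes unexpected events harmless: an event that has no entry for the current state leaves the state unchanged.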
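├── .github  # GitHub configuration
│   └── ISSUE_TEMPLATE  # Issue templates
│       ├── bug_report.md  # Bug report template
│       ├── code_improvement.md  # Code improvement suggestion template
│       ├── documentation_improvement.md  # Documentation improvement suggestion template
│       └── feature_request.md  # Feature request template
├── config  # Configuration files
│   └── config.json  # Application configuration file
├── docs  # Documentation
│   ├── images  # Documentation images
│   │   ├── QQ音乐接口配置.png  # Example of QQ Music API configuration
│   │   ├── 唤醒词.png  # Example of wake word settings
│   │   └── 群聊.jpg  # Community chat group image
│   ├── 使用文档.md  # User guide
│   └── 异常汇总.md  # Common errors and solutions
├── libs  # Bundled libraries
│   └── windows  # Windows-specific libraries
│       └── opus.dll  # Opus audio codec library
├── scripts  # Utility scripts
│   ├── dir_tree.py  # Script that generates the directory tree
│   └── py_audio_scanner.py  # Audio device scanner
├── src  # Source code
│   ├── audio_codecs  # Audio codec module
│   │   └── audio_codec.py  # Audio codec implementation
│   ├── audio_processing  # Audio processing module
│   │   ├── vad_detector.py  # Voice activity detection (used for real-time interruption)
│   │   └── wake_word_detect.py  # Wake word detection
│   ├── constants  # Constant definitions
│   │   └── constants.py  # Application constants (states, event types, etc.)
│   ├── display  # Display module
│   │   ├── base_display.py  # Display base class
│   │   ├── cli_display.py  # Command-line interface implementation
│   │   └── gui_display.py  # Graphical user interface implementation
│   ├── iot  # IoT device modules
│   │   ├── things  # Concrete device implementations
│   │   │   ├── CameraVL  # Camera and visual recognition module
│   │   │   │   ├── Camera.py  # Camera control
│   │   │   │   └── VL.py  # Visual recognition
│   │   │   ├── lamp.py  # Smart lamp control
│   │   │   ├── music_player.py  # Music player
│   │   │   └── speaker.py  # Smart speaker control
│   │   ├── thing.py  # IoT device base class
│   │   └── thing_manager.py  # IoT device manager (manages all device types in one place)
│   ├── protocols  # Communication protocol module
│   │   ├── mqtt_protocol.py  # MQTT protocol implementation (for device communication)
│   │   ├── protocol.py  # Protocol base class
│   │   └── websocket_protocol.py  # WebSocket protocol implementation
│   ├── utils  # Utility modules
│   │   ├── config_manager.py  # Configuration manager (singleton pattern)
│   │   ├── logging_config.py  # Logging configuration
│   │   ├── system_info.py  # System information utilities (handles loading opus.dll, etc.)
│   │   └── volume_controller.py  # Volume control utility (cross-platform volume adjustment)
│   └── application.py  # Main application class (core business logic)
├── .gitignore  # Git ignore rules
├── LICENSE  # Project license
├── README.md  # Project README
├── main.py  # Program entry point
├── requirements.txt  # Python dependency list (general)
├── requirements_mac.txt  # macOS-specific dependency list
└── xiaozhi.spec  # PyInstaller packaging configuration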
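The tree above describes config_manager.py as a configuration manager using the singleton pattern and places the application configuration in config/config.json. The sketch below shows what such a singleton could look like. It is an illustration under assumptions, not the project's code: the class layout, the `get_instance` and `get` methods, and the dotted-key lookup are all hypothetical.

```python
import json
import threading
from pathlib import Path


class ConfigManager:
    """Hypothetical singleton wrapper around a JSON configuration file."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, path):
        self._path = Path(path)
        # Fall back to an empty configuration if the file does not exist.
        if self._path.exists():
            self._data = json.loads(self._path.read_text(encoding="utf-8"))
        else:
            self._data = {}

    @classmethod
    def get_instance(cls, path="config/config.json"):
        """Create the shared instance on first use and reuse it afterwards."""
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(path)
            return cls._instance

    def get(self, dotted_key, default=None):
        """Look up a nested value such as 'SECTION.KEY' (key names are illustrative)."""
        node = self._data
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

The lock only guards creation of the instance, so that two threads starting at the same time do not each load the file and end up with separate configuration objects.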
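- [x] Added a GUI, so there is no need to keep pressing the space bar in the console
- [x] Modularized the code: split it up and wrapped it in classes with clear responsibilities
- [x] Volume adjustment: the volume can be set manually
- [x] Automatic MAC address detection, avoiding MAC address conflicts
- [x] WSS protocol support, improving security and compatibility
- [x] GUI now shows XiaoZhi's expressions and text, improving the interaction experience
- [x] Added a command-line control mode for embedded Linux devices
- [x] Automatic conversation mode for more natural interaction
- [x] Voice wake-up: wake-word activation (off by default; must be enabled manually)
- [x] IoT device integration, enabling more IoT functionality
- [x] Online music playback
- [x] Added a Volume controller class to unify volume changes
- [x] Added visual multimodal support
- [x] Fixed being unable to reconnect after "goodbye"
- [x] Fixed runtime errors on macOS and Linux (caused by previously using pycaw for volume control)
- [x] Made the "hold to talk" button more prominent
- [x] Fixed the "Stream not open" error (no longer triggered on Windows; other systems still to be confirmed)
- [x] Fixed the prompt "No version information found for this device; please configure the OTA address correctly"
- [x] Fixed the missing update_volume in CLI mode
- [ ] New GUI (Electron) with a more modern user interface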
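The checklist above mentions obtaining the MAC address automatically. The source does not say how this is done; one standard-library way is `uuid.getnode()`, sketched below. The function names are illustrative.

```python
import uuid


def format_mac(node):
    """Format a 48-bit integer as lowercase colon-separated hex."""
    raw = f"{node:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def get_mac_address():
    """Return this machine's MAC address as a string.

    uuid.getnode() may return a random 48-bit number if no hardware
    address can be found, so the result is not guaranteed to be stable.
    """
    return format_mac(uuid.getnode())


if __name__ == "__main__":
    print(get_mac_address())
```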
Issues and Pull Requests are welcome!
Similar Open Source Tools

MaiMBot
MaiMBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, utilizes LLM for conversation abilities, MongoDB for data persistence, and NapCat for QQ protocol support. The bot features keyword-triggered proactive responses, dynamic prompt construction, support for images and message forwarding, typo generation, multiple replies, emotion-based emoji responses, daily schedule generation, user relationship management, knowledge base, and group impressions. Work-in-progress features include personality, group atmosphere, image handling, humor, meme functions, and Minecraft interactions. The tool is in active development with plans for GIF compatibility, mini-program link parsing, bug fixes, documentation improvements, and logic enhancements for emoji sending.

MaiBot
MaiBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, with LLM providing conversation abilities, MongoDB for data persistence support, and NapCat as the QQ protocol endpoint support. The project is in active development stage, with features like chat functionality, emoji functionality, schedule management, memory function, knowledge base function, and relationship function planned for future updates. The project aims to create a 'life form' active in QQ group chats, focusing on companionship and creating a more human-like presence rather than a perfect assistant. The application generates content from AI models, so users are advised to discern carefully and not use it for illegal purposes.

godoos
GodoOS is an efficient intranet office operating system that includes various office tools such as word/excel/ppt/pdf/internal chat/whiteboard/mind map, with native file storage support. The platform interface mimics the Windows style, making it easy to operate while maintaining low resource consumption and high performance. It automatically connects to intranet users without registration, enabling instant communication and file sharing. The flexible and highly configurable app store allows for unlimited expansion.

KouriChat
KouriChat is a project that seamlessly integrates virtual and real interactions, providing eternal gentle bonds. It offers features like WeChat integration, immersive role-playing, intelligent conversation segmentation, emotion-based emojis, image generation, image recognition, voice messages, and more. The project is focused on technical research and learning exchanges, with a strong emphasis on ethical and legal guidelines. Users are required to take full responsibility for their actions, especially minors who should use the tool under supervision. The project architecture includes avatar configurations, data storage, handlers, AI service interfaces, a web UI, and utility libraries.

AHU-AI-Repository
This repository is dedicated to the learning and exchange of resources for the School of Artificial Intelligence at Anhui University. Notes will be published on this website first: https://www.aoaoaoao.cn and will be synchronized to the repository regularly. You can also contact me at [email protected].

vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

gez
Gez is a high-performance micro frontend framework based on ESM. It uses Rspack compilation and maps modules to URLs with strong caching and content-based hashing. Gez embraces modern micro frontend architecture by leveraging ESM and importmap for dependency management, providing reliable isolation with module scope, seamless integration with any modern frontend framework, intuitive development experience, and optimal performance with zero runtime overhead and reliable caching strategies.

AI-Drug-Discovery-Design
AI-Drug-Discovery-Design is a repository focused on Artificial Intelligence-assisted Drug Discovery and Design. It explores the use of AI technology to accelerate and optimize the drug development process. The advantages of AI in drug design include speeding up research cycles, improving accuracy through data-driven models, reducing costs by minimizing experimental redundancies, and enabling personalized drug design for specific patients or disease characteristics.

MoneyPrinterPlus
MoneyPrinterPlus is a project designed to help users easily make money in the era of short videos. It leverages AI big model technology to batch generate various short videos, perform video editing, and automatically publish videos to popular platforms like Douyin, Kuaishou, Xiaohongshu, and Video Number. The tool covers a wide range of functionalities including integrating with major AI big model tools, supporting various voice types, offering video transition effects, enabling customization of subtitles, and more. It aims to simplify the process of creating and sharing videos to monetize traffic.

KubeDoor
KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

JeecgBoot
JeecgBoot is a Java AI Low Code Platform for Enterprise web applications, based on BPM and code generator. It features a SpringBoot2.x/3.x backend, SpringCloud, Ant Design Vue3, Mybatis-plus, Shiro, JWT, supporting microservices, multi-tenancy, and AI capabilities like DeepSeek and ChatGPT. The powerful code generator allows for one-click generation of frontend and backend code without writing any code. JeecgBoot leads the way in AI low-code development mode, helping to solve 80% of repetitive work in Java projects and allowing developers to focus more on business logic.
For similar tasks

Ai-Hoshino
Ai Hoshino - MD is a WhatsApp bot tool with features like voice and text interaction, group configuration, anti-delete, anti-link, personalized welcome messages, chatbot functionality, sticker creation, sub-bot integration, RPG game, YouTube music and video downloads, and more. The tool is actively maintained by Starlights Team and offers a range of functionalities for WhatsApp users.

aiohttp-session
aiohttp_session is a Python library that provides session management for aiohttp.web applications. It allows storing user-specific data in session objects with a dict-like interface. The library offers different session storage options, including SimpleCookieStorage for testing, EncryptedCookieStorage for secure data storage, and RedisStorage for storing data in Redis. Users can easily integrate session management into their aiohttp.web applications by registering the session middleware. The library is designed to simplify session handling and enhance the security of web applications.

chatgpt-wechat
ChatGPT-WeChat is a personal assistant application that can be safely used on WeChat through enterprise WeChat without the risk of being banned. The project is open source and free, with no paid sections or external traffic operations except for advertising on the author's public account '积木成楼'. It supports various features such as secure usage on WeChat, multi-channel customer service message integration, proxy support, session management, rapid message response, voice and image messaging, drawing capabilities, private data storage, plugin support, and more. Users can also develop their own capabilities following the rules provided. The project is currently in development with stable versions available for use.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per user and per organization basis
- Block or redact requests containing PIIs
- Improve LLM reliability with failovers, retries and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.