
py-xiaozhi
A Python version of XiaoZhi AI, mainly for people who have no hardware but still want to try XiaoZhi's features.
Stars: 554

py-xiaozhi is a Python-based XiaoZhi voice client for learning from the code and for trying AI XiaoZhi's voice features without dedicated hardware. It offers voice interaction, a graphical interface, volume control, session management, encrypted audio transmission, a CLI mode, and, for first-time users, automatic copying of the verification code and opening of the browser. The project builds on the original hardware project xiaozhi-esp32 and on zhh827's Python implementation py-xiaozhi, optimizing it and adding new features.
README:
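py-xiaozhi is a XiaoZhi voice client implemented in Python, intended for learning from the code and for trying AI XiaoZhi's voice features without hardware. This repository is a port of xiaozhi-esp32.
- Read /docs/使用文档.md carefully; the startup tutorial and the file descriptions are all in there.
- main holds the latest code. After every update, reinstall the pip dependencies manually, in case newly added dependencies are missing from your local environment.
- There are currently problems running on Raspberry Pi 4B, Raspberry Pi 5, Jetson, and similar devices.
- On macOS, M1, M2, and similar machines may fail to find opuslib; some users have suggested this is a Python version issue.
- Experienced contributors are very welcome to join!
- Python 3.9.13+ (3.9.13 recommended); the highest supported version is 3.12
- Windows/Linux/macOS
- main: main branch
- feature/v1: first version
- feature/visual: vision branch
- Voice interaction: supports voice input and recognition for intelligent human-computer interaction.
- Graphical interface: provides an intuitive, easy-to-use GUI.
- Volume control: supports volume adjustment to suit different environments.
- Session management: manages multi-turn conversations effectively and keeps the interaction continuous.
- Encrypted audio transmission: protects audio data and prevents information leaks.
- CLI mode: supports running from the command line, suitable for embedded devices or environments without a GUI.
- Automatic verification code handling: on first use, the program automatically copies the verification code and opens the browser, which simplifies setup.
- Wake word: supports voice wake-up, so no manual action is needed.
- Keyboard keys: key presses are listened for, so the window can be minimized.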
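The automatic verification code step from the feature list could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `handle_verification_code`, its parameters, and the example URL are hypothetical. Only `webbrowser.open` is a real standard-library call; clipboard access has no standard-library API, so the copy function is passed in by the caller.

```python
import webbrowser
from typing import Callable


def handle_verification_code(
    code: str,
    url: str,
    copy: Callable[[str], None],
    open_url: Callable[[str], bool] = webbrowser.open,
) -> bool:
    """Copy the code to the clipboard, then open the activation page.

    `copy` is a caller-supplied clipboard function (hypothetical here).
    Returns True if the browser reported that the page was opened.
    """
    code = code.strip()
    if not code:
        raise ValueError("empty verification code")
    copy(code)                  # the user can paste the code on the page
    return bool(open_url(url))  # webbrowser.open returns a bool


if __name__ == "__main__":
    # Demo with print standing in for a real clipboard function.
    handle_verification_code("123456", "https://example.invalid/activate", print)
```

Passing `copy` and `open_url` as arguments keeps the function testable on headless machines, which matters for the CLI mode mentioned above.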
                            +----------------+
                            |                |
                            v                |
+------+ wake word / button +------------+   |   +------------+
| IDLE | -----------------> | CONNECTING | --+-> | LISTENING  |
+------+                    +------------+       +------------+
   ^                                                    |
   |                                                    | recognition complete
   |                +------------+                      v
   +--------------- |  SPEAKING  | <--------------------+
  playback finished +------------+
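The state diagram above can be expressed as a small transition table. This is a minimal sketch, not the project's code: the state names come from the diagram, while the `Event` names and `next_state` are hypothetical. The diagram's unlabeled loop back into CONNECTING is modelled here as a `RETRY` event, which is an interpretation.

```python
from enum import Enum


class DeviceState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    LISTENING = "listening"
    SPEAKING = "speaking"


class Event(Enum):
    WAKE = "wake_word_or_button"           # wake word heard or button pressed
    RETRY = "retry"                        # assumed meaning of the loop arrow
    CONNECTED = "connected"
    RECOGNITION_DONE = "recognition_done"
    PLAYBACK_DONE = "playback_done"


# (current state, event) -> next state, following the arrows in the diagram.
TRANSITIONS = {
    (DeviceState.IDLE, Event.WAKE): DeviceState.CONNECTING,
    (DeviceState.CONNECTING, Event.RETRY): DeviceState.CONNECTING,
    (DeviceState.CONNECTING, Event.CONNECTED): DeviceState.LISTENING,
    (DeviceState.LISTENING, Event.RECOGNITION_DONE): DeviceState.SPEAKING,
    (DeviceState.SPEAKING, Event.PLAYBACK_DONE): DeviceState.IDLE,
}


def next_state(state: DeviceState, event: Event) -> DeviceState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event.name} is not valid in state {state.name}") from None
```

Keeping the transitions in one dictionary makes invalid combinations, such as playback finishing while IDLE, fail loudly instead of silently changing state.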
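├── .github                                   # GitHub configuration
│   └── ISSUE_TEMPLATE                        # Issue template directory
│       ├── bug_report.md                     # Bug report template
│       ├── code_improvement.md               # Code improvement suggestion template
│       ├── documentation_improvement.md      # Documentation improvement suggestion template
│       └── feature_request.md                # Feature request template
├── config                                    # Configuration directory
│   └── config.json                           # Application configuration file
├── docs                                      # Documentation directory
│   ├── images                                # Documentation images
│   │   ├── QQ音乐接口配置.png                  # Example of the QQ Music API configuration
│   │   ├── 唤醒词.png                          # Example of the wake-word settings
│   │   └── 群聊.jpg                            # Community group chat image
│   ├── 使用文档.md                             # User guide
│   └── 异常汇总.md                             # Common errors and solutions
├── libs                                      # Bundled libraries
│   └── windows                               # Windows-specific libraries
│       └── opus.dll                          # Opus audio codec library
├── scripts                                   # Utility scripts
│   ├── dir_tree.py                           # Script that generates the directory tree
│   └── py_audio_scanner.py                   # Audio device scanner
├── src                                       # Source code
│   ├── audio_codecs                          # Audio codec module
│   │   └── audio_codec.py                    # Audio codec implementation
│   ├── audio_processing                      # Audio processing module
│   │   ├── vad_detector.py                   # Voice activity detection (used for real-time interruption)
│   │   └── wake_word_detect.py               # Wake-word detection
│   ├── constants                             # Constant definitions
│   │   └── constants.py                      # Application constants (states, event types, etc.)
│   ├── display                               # Display module
│   │   ├── base_display.py                   # Display base class
│   │   ├── cli_display.py                    # Command-line interface implementation
│   │   └── gui_display.py                    # Graphical user interface implementation
│   ├── iot                                   # IoT device module
│   │   ├── things                            # Concrete device implementations
│   │   │   ├── CameraVL                      # Camera and visual recognition module
│   │   │   │   ├── Camera.py                 # Camera control
│   │   │   │   └── VL.py                     # Visual recognition
│   │   │   ├── lamp.py                       # Smart lamp control
│   │   │   ├── music_player.py               # Music player
│   │   │   └── speaker.py                    # Smart speaker control
│   │   ├── thing.py                          # IoT device base class
│   │   └── thing_manager.py                  # IoT device manager (manages all devices in one place)
│   ├── protocols                             # Communication protocol module
│   │   ├── mqtt_protocol.py                  # MQTT protocol implementation (for device communication)
│   │   ├── protocol.py                       # Protocol base class
│   │   └── websocket_protocol.py             # WebSocket protocol implementation
│   ├── utils                                 # Utility module
│   │   ├── config_manager.py                 # Configuration manager (singleton pattern)
│   │   ├── logging_config.py                 # Logging configuration
│   │   ├── system_info.py                    # System information helpers (opus.dll loading, etc.)
│   │   └── volume_controller.py              # Volume control helper (cross-platform volume adjustment)
│   └── application.py                        # Main application class (core business logic)
├── .gitignore                                # Git ignore rules
├── LICENSE                                   # Project license
├── README.md                                 # Project README
├── main.py                                   # Program entry point
├── requirements.txt                          # Python dependency list (general)
├── requirements_mac.txt                      # macOS-specific dependency list
└── xiaozhi.spec                              # PyInstaller packaging configuration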
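The tree describes `config_manager.py` as a singleton configuration manager that sits alongside `config/config.json`. The sketch below shows one common way to write such a class. It is an assumption about the general shape, not the repository's code: the method names `get_instance` and `get`, the dotted-key lookup, and the example key `audio.volume` are all hypothetical.

```python
import json
import threading
from pathlib import Path
from typing import Any, Optional


class ConfigManager:
    """Process-wide configuration holder (singleton)."""

    _instance: Optional["ConfigManager"] = None
    _lock = threading.Lock()

    def __init__(self, path: Path) -> None:
        self._path = path
        # A missing file yields an empty config instead of an exception.
        self._data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    @classmethod
    def get_instance(cls, path: str = "config/config.json") -> "ConfigManager":
        # The lock keeps two threads from building two instances.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(Path(path))
            return cls._instance

    def get(self, dotted_key: str, default: Any = None) -> Any:
        """Look up a nested value, e.g. get("audio.volume")."""
        node: Any = self._data
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node
```

Every caller that uses `ConfigManager.get_instance()` receives the same object, so the file is read once; the `path` argument only takes effect on the first call.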
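- [x] Added a GUI, so you no longer have to keep pressing the space bar in the console
- [x] Modularized the code: split it up and wrapped it in classes with clear responsibilities
- [x] Volume adjustment: the volume can be changed manually
- [x] Automatic MAC address retrieval, avoiding MAC address conflicts
- [x] WSS protocol support, improving security and compatibility
- [x] GUI now shows XiaoZhi's expressions and text, improving the interaction
- [x] Added a command-line control option, suitable for embedded Linux devices
- [x] Automatic conversation mode for more natural interaction
- [x] Voice wake-up with wake-word activation (off by default; must be enabled manually)
- [x] IoT device integration for more IoT features
- [x] Online music playback
- [x] Added a Volume control class that unifies volume changes
- [x] Added visual multimodal support
- [x] Fixed being unable to reconnect after "goodbye"
- [x] Fixed runtime errors on macOS and Linux (previously caused by using pycaw for volume handling)
- [x] Improved the "hold to talk" button to make it more visible
- [x] Fixed the Stream not open error (no longer triggered on Windows; other systems still to be confirmed)
- [x] Fixed the "No version information found for this device; please configure the OTA address correctly" prompt
- [x] Fixed the missing update_volume in CLI mode
- [ ] New GUI (Electron) with a more modern user interface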
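The checklist mentions automatic MAC address retrieval. One standard-library way to read a hardware address is `uuid.getnode()`, sketched below. This is not necessarily how the project does it, and the helper names `format_mac`, `is_random_fallback`, and `get_device_mac` are hypothetical.

```python
import uuid


def format_mac(node: int) -> str:
    """Format a 48-bit integer as a colon-separated, lowercase MAC string."""
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


def is_random_fallback(node: int) -> bool:
    """True if the multicast bit of the first octet is set.

    uuid.getnode() sets that bit when it cannot find a hardware address
    and returns a random 48-bit number instead.
    """
    return bool((node >> 40) & 0x01)


def get_device_mac() -> str:
    """Return this machine's MAC address as reported by uuid.getnode()."""
    return format_mac(uuid.getnode())
```

Checking `is_random_fallback(uuid.getnode())` is worthwhile before treating the value as a stable device identifier, because a random fallback can differ between interpreter runs.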
Issues and Pull Requests are welcome!
Alternative AI tools for py-xiaozhi
Similar Open Source Tools

py-xiaozhi
py-xiaozhi is a Python-based XiaoZhi voice client for learning from the code and for trying AI XiaoZhi's voice features without dedicated hardware. The repository is a port of xiaozhi-esp32. It supports AI voice interaction, visual multimodal capabilities, IoT device integration, online music playback, voice wake-up, automatic conversation mode, a graphical user interface, a command-line mode, cross-platform operation, volume control, session management, encrypted audio transmission, automatic verification code handling, and automatic MAC address retrieval. The code is modularized and has been optimized for stability.

Snap-Solver
Snap-Solver is a revolutionary AI tool for online exam solving, designed for students, test-takers, and self-learners. With just a keystroke, it automatically captures any question on the screen, analyzes it using AI, and provides detailed answers. Whether it's complex math formulas, physics problems, coding issues, or challenges from other disciplines, Snap-Solver offers clear, accurate, and structured solutions to help you better understand and master the subject matter.

MaiMBot
MaiMBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, utilizes LLM for conversation abilities, MongoDB for data persistence, and NapCat for QQ protocol support. The bot features keyword-triggered proactive responses, dynamic prompt construction, support for images and message forwarding, typo generation, multiple replies, emotion-based emoji responses, daily schedule generation, user relationship management, knowledge base, and group impressions. Work-in-progress features include personality, group atmosphere, image handling, humor, meme functions, and Minecraft interactions. The tool is in active development with plans for GIF compatibility, mini-program link parsing, bug fixes, documentation improvements, and logic enhancements for emoji sending.

godoos
GodoOS is an efficient intranet office operating system that includes various office tools such as word/excel/ppt/pdf/internal chat/whiteboard/mind map, with native file storage support. The platform interface mimics the Windows style, making it easy to operate while maintaining low resource consumption and high performance. It automatically connects to intranet users without registration, enabling instant communication and file sharing. The flexible and highly configurable app store allows for unlimited expansion.

AHU-AI-Repository
This repository is dedicated to the learning and exchange of resources for the School of Artificial Intelligence at Anhui University. Notes will be published on this website first: https://www.aoaoaoao.cn and will be synchronized to the repository regularly.

LLMAI-writer
LLMAI-writer is a powerful AI tool for assisting in novel writing, utilizing state-of-the-art large language models to help writers brainstorm, plan, and create novels. Whether you are an experienced writer or a beginner, LLMAI-writer can help you efficiently complete the writing process.

all-in-rag
All-in-RAG is a comprehensive repository for all things related to Randomized Algorithms and Graphs. It provides a wide range of resources, including implementations of various randomized algorithms, graph data structures, and visualization tools. The repository aims to serve as a one-stop solution for researchers, students, and enthusiasts interested in exploring the intersection of randomized algorithms and graph theory. Whether you are looking to study theoretical concepts, implement algorithms in practice, or visualize graph structures, All-in-RAG has got you covered.

DocTranslator
DocTranslator is a document translation tool that supports various file formats, compatible with OpenAI format API, and offers batch operations and multi-threading support. Whether for individual users or enterprise teams, DocTranslator helps efficiently complete document translation tasks. It supports formats like txt, markdown, word, csv, excel, pdf (non-scanned), and ppt for AI translation. The tool is deployed using Docker for easy setup and usage.

vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

xiaoniu
Xiaoniu AI Video Translation is a video AI translation tool that can translate speech or subtitles in videos into multiple languages such as Chinese, English, Japanese, French, and Korean. It enables easy creation of multilingual versions and enhances global dissemination. It utilizes AI technology to generate new translated videos, automatically retaining background sound effects and replacing them with new translated voices, achieving precise synchronization of sound and mouth movements. Whether for creating short films or promoting videos on platforms like Douyin, TikTok, and YouTube, Xiaoniu AI Video Translation helps users easily overcome language barriers and broaden the reach of videos globally.

gez
Gez is a high-performance micro frontend framework based on ESM. It uses Rspack compilation and maps modules to URLs with strong caching and content-based hashing. Gez embraces modern micro frontend architecture by leveraging ESM and importmap for dependency management, providing reliable isolation with module scope, seamless integration with any modern frontend framework, intuitive development experience, and optimal performance with zero runtime overhead and reliable caching strategies.

prism-insight
PRISM-INSIGHT is a comprehensive stock analysis and trading simulation system based on AI agents. It automatically captures daily surging stocks via Telegram channel, generates expert-level analyst reports, and performs trading simulations. The system utilizes OpenAI GPT-4.1 for in-depth stock analysis and GPT-5 for investment strategy simulation. It also interacts with users via Anthropic Claude for Telegram conversations. The system architecture includes AI analysis agents, stock tracking, PDF conversion, and Telegram bot functionalities. Users can customize criteria for identifying surging stocks, modify AI prompts, and adjust chart styles. The project is open-source under the MIT license, and all investment decisions based on the analysis are the responsibility of the user.

AI-Drug-Discovery-Design
AI-Drug-Discovery-Design is a repository focused on Artificial Intelligence-assisted Drug Discovery and Design. It explores the use of AI technology to accelerate and optimize the drug development process. The advantages of AI in drug design include speeding up research cycles, improving accuracy through data-driven models, reducing costs by minimizing experimental redundancies, and enabling personalized drug design for specific patients or disease characteristics.

MoneyPrinterPlus
MoneyPrinterPlus is a project designed to help users easily make money in the era of short videos. It leverages AI big model technology to batch generate various short videos, perform video editing, and automatically publish videos to popular platforms like Douyin, Kuaishou, Xiaohongshu, and Video Number. The tool covers a wide range of functionalities including integrating with major AI big model tools, supporting various voice types, offering video transition effects, enabling customization of subtitles, and more. It aims to simplify the process of creating and sharing videos to monetize traffic.
For similar tasks

Ai-Hoshino
Ai Hoshino - MD is a WhatsApp bot tool with features like voice and text interaction, group configuration, anti-delete, anti-link, personalized welcome messages, chatbot functionality, sticker creation, sub-bot integration, RPG game, YouTube music and video downloads, and more. The tool is actively maintained by Starlights Team and offers a range of functionalities for WhatsApp users.

aiohttp-session
aiohttp_session is a Python library that provides session management for aiohttp.web applications. It allows storing user-specific data in session objects with a dict-like interface. The library offers different session storage options, including SimpleCookieStorage for testing, EncryptedCookieStorage for secure data storage, and RedisStorage for storing data in Redis. Users can easily integrate session management into their aiohttp.web applications by registering the session middleware. The library is designed to simplify session handling and enhance the security of web applications.

chatgpt-wechat
ChatGPT-WeChat is a personal assistant application that can be safely used on WeChat through enterprise WeChat without the risk of being banned. The project is open source and free, with no paid sections or external traffic operations except for advertising on the author's public account '积木成楼'. It supports various features such as secure usage on WeChat, multi-channel customer service message integration, proxy support, session management, rapid message response, voice and image messaging, drawing capabilities, private data storage, plugin support, and more. Users can also develop their own capabilities following the rules provided. The project is currently in development with stable versions available for use.

crush
Crush is a versatile tool designed to enhance coding workflows in your terminal. It offers support for multiple LLMs, allows for flexible switching between models, and enables session-based work management. Crush is extensible through MCPs and works across various operating systems. It can be installed using package managers like Homebrew and NPM, or downloaded directly. Crush supports various APIs like Anthropic, OpenAI, Groq, and Google Gemini, and allows for customization through environment variables. The tool can be configured locally or globally, and supports LSPs for additional context. Crush also provides options for ignoring files, allowing tools, and configuring local models. It respects `.gitignore` files and offers logging capabilities for troubleshooting and debugging.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per user and per organization basis
- Block or redact requests containing PIIs
- Improve LLM reliability with failovers, retries and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.