py-xiaozhi
python版本的小智ai,主要帮助那些没有硬件却想体验小智功能的人
Stars: 554
py-xiaozhi is a Python-based XiaoZhi voice client designed for learning code and experiencing AI XiaoZhi's voice functions without hardware conditions. It features voice interaction, graphical interface, volume control, session management, encrypted audio transmission, CLI mode, and automatic copying of verification codes and opening browsers for first-time users. The project aims to optimize and add new features to zhh827's py-xiaozhi based on the original hardware project xiaozhi-esp32 and the Python implementation py-xiaozhi.
README:
py-xiaozhi 是一个使用 Python 实现的小智语音客户端,旨在通过代码学习和在没有硬件条件下体验 AI 小智的语音功能。 本仓库是基于xiaozhi-esp32移植
- 仔细阅读/docs/使用文档.md 启动教程和文件说明都在里面了
- main是最新代码,每次更新都需要手动重新安装一次pip依赖防止我新增依赖后你们本地没有
- 目前在树莓派b4、b5、jetson等设备运行有问题
- mac的话 m1、m2等会出现找不到opuslib,目前有人提示说是python版本问题
- 跪求大佬加入!!!
- Python 3.9.13+(推荐 3.9.13)最大支持版本3.12
- Windows/Linux/macOS
- main 主分支
- feature/v1 第一个版本
- feature/visual 视觉分支
- 语音交互:支持语音输入与识别,实现智能人机交互。
- 图形化界面:提供直观易用的 GUI,方便用户操作。
- 音量控制:支持音量调节,适应不同环境需求。
- 会话管理:有效管理多轮对话,保持交互的连续性。
- 加密音频传输:保障音频数据的安全性,防止信息泄露。
- CLI 模式:支持命令行运行,适用于嵌入式设备或无 GUI 环境。
- 自动验证码处理:首次使用时,程序自动复制验证码并打开浏览器,简化用户操作。
- 唤醒词:支持语音唤醒,免去手动操作的烦恼。
- 键盘按键:监听可以最小化视口
+----------------+
| |
v |
+------+ 唤醒词/按钮 +------------+ | +------------+
| IDLE | -----------> | CONNECTING | --+-> | LISTENING |
+------+ +------------+ +------------+
^ |
| | 语音识别完成
| +------------+ v
+--------- | SPEAKING | <-----------------+
完成播放 +------------+
├── .github # GitHub 相关配置
│ └── ISSUE_TEMPLATE # Issue 模板目录
│ ├── bug_report.md # Bug 报告模板
│ ├── code_improvement.md # 代码改进建议模板
│ ├── documentation_improvement.md # 文档改进建议模板
│ └── feature_request.md # 功能请求模板
├── config # 配置文件目录
│ └── config.json # 应用程序配置文件
├── docs # 文档目录
│ ├── images # 文档图片资源
│ │ ├── QQ音乐接口配置.png # QQ音乐接口配置示例图
│ │ ├── 唤醒词.png # 唤醒词设置示例图
│ │ └── 群聊.jpg # 社区交流群图片
│ ├── 使用文档.md # 用户使用指南
│ └── 异常汇总.md # 常见错误及解决方案
├── libs # 依赖库目录
│ └── windows # Windows 平台特定库
│ └── opus.dll # Opus 音频编解码库
├── scripts # 实用脚本目录
│ ├── dir_tree.py # 生成目录树结构脚本
│ └── py_audio_scanner.py # 音频设备扫描工具
├── src # 源代码目录
│ ├── audio_codecs # 音频编解码模块
│ │ └── audio_codec.py # 音频编解码器实现
│ ├── audio_processing # 音频处理模块
│ │ ├── vad_detector.py # 语音活动检测实现(用于实时打断)
│ │ └── wake_word_detect.py # 语音唤醒词检测实现
│ ├── constants # 常量定义
│ │ └── constants.py # 应用程序常量(状态、事件类型等)
│ ├── display # 显示界面模块
│ │ ├── base_display.py # 显示界面基类
│ │ ├── cli_display.py # 命令行界面实现
│ │ └── gui_display.py # 图形用户界面实现
│ ├── iot # IoT设备相关模块
│ │ ├── things # 具体设备实现目录
│ │ │ ├── CameraVL # 摄像头与视觉识别模块
│ │ │ │ ├── Camera.py # 摄像头控制实现
│ │ │ │ └── VL.py # 视觉识别实现
│ │ │ ├── lamp.py # 智能灯具控制实现
│ │ │ ├── music_player.py # 音乐播放器实现
│ │ │ └── speaker.py # 智能音箱控制实现
│ │ ├── thing.py # IoT设备基类定义
│ │ └── thing_manager.py # IoT设备管理器(统一管理各类设备)
│ ├── protocols # 通信协议模块
│ │ ├── mqtt_protocol.py # MQTT 协议实现(用于设备通信)
│ │ ├── protocol.py # 协议基类
│ │ └── websocket_protocol.py # WebSocket 协议实现
│ ├── utils # 工具类模块
│ │ ├── config_manager.py # 配置管理器(单例模式)
│ │ ├── logging_config.py # 日志配置
│ │ ├── system_info.py # 系统信息工具(处理 opus.dll 加载等)
│ │ └── volume_controller.py # 音量控制工具(跨平台音量调节)
│ └── application.py # 应用程序主类(核心业务逻辑)
├── .gitignore # Git 忽略文件配置
├── LICENSE # 项目许可证
├── README.md # 项目说明文档
├── main.py # 程序入口点
├── requirements.txt # Python 依赖包列表(通用)
├── requirements_mac.txt # macOS 特定依赖包列表
└── xiaozhi.spec # PyInstaller 打包配置文件
- [x] 新增 GUI 页面,无需在控制台一直按空格
- [x] 代码模块化,拆分代码并封装为类,职责分明
- [x] 音量调节,可手动调整音量大小
- [x] 自动获取 MAC 地址,避免 MAC 地址冲突
- [x] 支持 WSS 协议,提升安全性和兼容性
- [x] GUI 新增小智表情与文本显示,增强交互体验
- [x] 新增命令行操控方案,适用于 Linux 嵌入式设备
- [x] 自动对话模式,实现更自然的交互
- [x] 语音唤醒,支持唤醒词激活交互 (默认关闭需要手动开启)
- [x] IoT 设备集成,实现更多物联网功能
- [x] 联网音乐播放
- [x] 新增 Volume控制类统一声音改变
- [x] 新增 视觉多模态
- [x] 修复 goodbye 后无法重连 的问题
- [x] 解决 macOS 和 Linux 运行异常(原先使用 pycaw 处理音量导致)
- [x] 优化“按住说话”按钮,使其更明显
- [x] 修复 Stream not open 错误(目前 Windows 不再触发,其他系统待确认)
- [x] 修复 没有找到该设备的版本信息,请正确配置 OTA 地址提示
- [x] 修复 cli模式update_volume缺失问题
- [ ] 新 GUI(Electron),提供更现代的用户界面
欢迎提交 Issues 和 Pull Requests!
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for py-xiaozhi
Similar Open Source Tools
py-xiaozhi
py-xiaozhi is a Python-based XiaoZhi voice client designed for learning code and experiencing AI XiaoZhi's voice functions without hardware conditions. It features voice interaction, graphical interface, volume control, session management, encrypted audio transmission, CLI mode, and automatic copying of verification codes and opening browsers for first-time users. The project aims to optimize and add new features to zhh827's py-xiaozhi based on the original hardware project xiaozhi-esp32 and the Python implementation py-xiaozhi.
JeecgBoot
JeecgBoot is a Java AI Low Code Platform for Enterprise web applications, based on BPM and code generator. It features a SpringBoot2.x/3.x backend, SpringCloud, Ant Design Vue3, Mybatis-plus, Shiro, JWT, supporting microservices, multi-tenancy, and AI capabilities like DeepSeek and ChatGPT. The powerful code generator allows for one-click generation of frontend and backend code without writing any code. JeecgBoot leads the way in AI low-code development mode, helping to solve 80% of repetitive work in Java projects and allowing developers to focus more on business logic.
private-llm-qa-bot
This is a production-grade knowledge Q&A chatbot implementation based on AWS services and the LangChain framework, with optimizations at various stages. It supports flexible configuration and plugging of vector models and large language models. The front and back ends are separated, making it easy to integrate with IM tools (such as Feishu).
AI-CloudOps
AI+CloudOps is a cloud-native operations management platform designed for enterprises. It aims to integrate artificial intelligence technology with cloud-native practices to significantly improve the efficiency and level of operations work. The platform offers features such as AIOps for monitoring data analysis and alerts, multi-dimensional permission management, visual CMDB for resource management, efficient ticketing system, deep integration with Prometheus for real-time monitoring, and unified Kubernetes management for cluster optimization.
PaiAgent
PaiAgent is an enterprise-level AI workflow visualization orchestration platform that simplifies the combination and scheduling of AI capabilities. It allows developers and business users to quickly build complex AI processing flows through an intuitive drag-and-drop interface, without the need to write code, enabling collaboration of various large models.
gin-vue-admin
Gin-vue-admin is a full-stack development platform based on Vue and Gin, integrating features like JWT authentication, dynamic routing, dynamic menus, Casbin authorization, form generator, code generator, etc. It provides various example files to help users focus more on business development. The project offers detailed documentation, video tutorials for setup and deployment, and a community for support and contributions. Users need a certain level of knowledge in Golang and Vue to work with this project. It is recommended to follow the Apache2.0 license if using the project for commercial purposes.
z.ai2api_python
Z.AI2API Python is a lightweight OpenAI API proxy service that integrates seamlessly with existing applications. It supports the full functionality of GLM-4.5 series models and features high-performance streaming responses, enhanced tool invocation, support for thinking mode, integration with search models, Docker deployment, session isolation for privacy protection, flexible configuration via environment variables, and intelligent upstream model routing.
py-xiaozhi
py-xiaozhi is a Python-based XiaoZhi voice client designed for learning through code and experiencing AI XiaoZhi's voice functions without hardware conditions. The repository is based on the xiaozhi-esp32 port. It supports AI voice interaction, visual multimodal capabilities, IoT device integration, online music playback, voice wake-up, automatic conversation mode, graphical user interface, command-line mode, cross-platform support, volume control, session management, encrypted audio transmission, automatic captcha handling, automatic MAC address retrieval, code modularization, and stability optimization.
adnify
Adnify is an advanced code editor with ultimate visual experience and deep integration of AI Agent. It goes beyond traditional IDEs, featuring Cyberpunk glass morphism design style and a powerful AI Agent supporting full automation from code generation to file operations.
AgentX
AgentX is a next-generation open-source AI agent development framework and runtime platform. It provides an event-driven runtime with a simple framework and minimal UI. The platform is ready-to-use and offers features like multi-user support, session persistence, real-time streaming, and Docker readiness. Users can build AI Agent applications with event-driven architecture using TypeScript for server-side (Node.js) and client-side (Browser/React) development. AgentX also includes comprehensive documentation, core concepts, guides, API references, and various packages for different functionalities. The architecture follows an event-driven design with layered components for server-side and client-side interactions.
boxlite
BoxLite is an embedded, lightweight micro-VM runtime designed for AI agents running OCI containers with hardware-level isolation. It is built for high concurrency with no daemon required, offering features like lightweight VMs, high concurrency, hardware isolation, embeddability, and OCI compatibility. Users can spin up 'Boxes' to run containers for AI agent sandboxes and multi-tenant code execution scenarios where Docker alone is insufficient and full VM infrastructure is too heavy. BoxLite supports Python, Node.js, and Rust with quick start guides for each, along with features like CPU/memory limits, storage options, networking capabilities, security layers, and image registry configuration. The tool provides SDKs for Python and Node.js, with Go support coming soon. It offers detailed documentation, examples, and architecture insights for users to understand how BoxLite works under the hood.
vibium
Vibium is a browser automation infrastructure designed for AI agents, providing a single binary that manages browser lifecycle, WebDriver BiDi protocol, and an MCP server. It offers zero configuration, AI-native capabilities, and is lightweight with no runtime dependencies. It is suitable for AI agents, test automation, and any tasks requiring browser interaction.
topsha
LocalTopSH is an AI Agent Framework designed for companies and developers who require 100% on-premise AI agents with data privacy. It supports various OpenAI-compatible LLM backends and offers production-ready security features. The framework allows simple deployment using Docker compose and ensures that data stays within the user's network, providing full control and compliance. With cost-effective scaling options and compatibility in regions with restrictions, LocalTopSH is a versatile solution for deploying AI agents on self-hosted infrastructure.
resume-design
Resume-design is an open-source and free resume design and template download website, built with Vue3 + TypeScript + Vite + Element-plus + pinia. It provides two design tools for creating beautiful resumes and a complete backend management system. The project has released two frontend versions and will integrate with a backend system in the future. Users can learn frontend by downloading the released versions or learn design tools by pulling the latest frontend code.
Whimbox
Whimbox is a game AI agent based on large language models and image recognition technology, providing users with a new gaming experience. It automates daily tasks such as mining, material collection, and wish checking, as well as features like route recording, image recognition, and AI dialogue. The tool does not modify game files or memory, only captures screenshots and simulates mouse and keyboard actions. It is designed for games running in a 1920x1080 windowed mode on mid to high-end PCs, with plans for future cloud gaming support. Whimbox is grateful to open-source projects like GIA and BetterGI, as well as AI models and programming tools like chatgpt and cursor. Developers interested in contributing to the project can join the development community and explore various functionalities that need development and adaptation.
For similar tasks
py-xiaozhi
py-xiaozhi is a Python-based XiaoZhi voice client designed for learning code and experiencing AI XiaoZhi's voice functions without hardware conditions. It features voice interaction, graphical interface, volume control, session management, encrypted audio transmission, CLI mode, and automatic copying of verification codes and opening browsers for first-time users. The project aims to optimize and add new features to zhh827's py-xiaozhi based on the original hardware project xiaozhi-esp32 and the Python implementation py-xiaozhi.
HMusic
HMusic is an intelligent music player that supports Xiaomi AI speakers. It offers dual modes: xiaomusic server mode and Xiaomi IoT direct connection mode. Users can choose between these modes based on their preferences and needs. The direct connection mode is suitable for regular users who want a plug-and-play experience, while the xiaomusic mode is ideal for users with NAS/servers who require advanced features like a local music library, playlists, and progress control. The tool provides functionalities such as music search, playback, and volume control, catering to a wide range of music enthusiasts.
Ai-Hoshino
Ai Hoshino - MD is a WhatsApp bot tool with features like voice and text interaction, group configuration, anti-delete, anti-link, personalized welcome messages, chatbot functionality, sticker creation, sub-bot integration, RPG game, YouTube music and video downloads, and more. The tool is actively maintained by Starlights Team and offers a range of functionalities for WhatsApp users.
aiohttp-session
aiohttp_session is a Python library that provides session management for aiohttp.web applications. It allows storing user-specific data in session objects with a dict-like interface. The library offers different session storage options, including SimpleCookieStorage for testing, EncryptedCookieStorage for secure data storage, and RedisStorage for storing data in Redis. Users can easily integrate session management into their aiohttp.web applications by registering the session middleware. The library is designed to simplify session handling and enhance the security of web applications.
chatgpt-wechat
ChatGPT-WeChat is a personal assistant application that can be safely used on WeChat through enterprise WeChat without the risk of being banned. The project is open source and free, with no paid sections or external traffic operations except for advertising on the author's public account '积木成楼'. It supports various features such as secure usage on WeChat, multi-channel customer service message integration, proxy support, session management, rapid message response, voice and image messaging, drawing capabilities, private data storage, plugin support, and more. Users can also develop their own capabilities following the rules provided. The project is currently in development with stable versions available for use.
crush
Crush is a versatile tool designed to enhance coding workflows in your terminal. It offers support for multiple LLMs, allows for flexible switching between models, and enables session-based work management. Crush is extensible through MCPs and works across various operating systems. It can be installed using package managers like Homebrew and NPM, or downloaded directly. Crush supports various APIs like Anthropic, OpenAI, Groq, and Google Gemini, and allows for customization through environment variables. The tool can be configured locally or globally, and supports LSPs for additional context. Crush also provides options for ignoring files, allowing tools, and configuring local models. It respects `.gitignore` files and offers logging capabilities for troubleshooting and debugging.
SWE-ReX
SWE-ReX is a runtime interface for interacting with sandboxed shell environments, allowing AI agents to run any command on any environment. It enables agents to interact with running shell sessions, use interactive command line tools, and manage multiple shell sessions in parallel. SWE-ReX simplifies agent development and evaluation by abstracting infrastructure concerns, supporting fast parallel runs on various platforms, and disentangling agent logic from infrastructure.
sandbox-agent
Sandbox Agent is a server that runs inside sandboxes to control coding agents remotely over HTTP. It provides a universal API to interact with coding agents like Claude Code, Codex, OpenCode, and Amp, allowing users to manage sessions, handle permissions, and stream events. The tool solves the challenges of executing coding agents remotely, standardizes APIs for different agents, and persists session data for later replay and auditing. It offers features like streaming events, human-in-the-loop interactions, automatic agent installation, and runs inside any sandbox environment.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.