py-xiaozhi

python版本的小智ai，主要帮助那些没有硬件却想体验小智功能的人

Stars: 554

Visit

py-xiaozhi is a Python-based XiaoZhi voice client designed for learning code and experiencing AI XiaoZhi's voice functions without hardware conditions. It features voice interaction, graphical interface, volume control, session management, encrypted audio transmission, CLI mode, and automatic copying of verification codes and opening browsers for first-time users. The project aims to optimize and add new features to zhh827's py-xiaozhi based on the original hardware project xiaozhi-esp32 and the Python implementation py-xiaozhi.

README:

py-xiaozhi

项目简介

py-xiaozhi 是一个使用 Python 实现的小智语音客户端，旨在通过代码学习和在没有硬件条件下体验 AI 小智的语音功能。本仓库是基于xiaozhi-esp32移植

请先看这里！

仔细阅读/docs/使用文档.md 启动教程和文件说明都在里面了
main是最新代码，每次更新都需要手动重新安装一次pip依赖防止我新增依赖后你们本地没有

有没有大佬可以加入的

目前在树莓派b4、b5、jetson等设备运行有问题
mac的话 m1、m2等会出现找不到opuslib，目前有人提示说是python版本问题
跪求大佬加入！！！

从零开始使用小智客户端（视频教程）

环境要求

Python 3.9.13+（推荐 3.9.13）最大支持版本3.12
Windows/Linux/macOS

演示

Bilibili 演示视频

功能特点

语音交互：支持语音输入与识别，实现智能人机交互。
图形化界面：提供直观易用的 GUI，方便用户操作。
音量控制：支持音量调节，适应不同环境需求。
会话管理：有效管理多轮对话，保持交互的连续性。
加密音频传输：保障音频数据的安全性，防止信息泄露。
CLI 模式：支持命令行运行，适用于嵌入式设备或无 GUI 环境。
自动验证码处理：首次使用时，程序自动复制验证码并打开浏览器，简化用户操作。
唤醒词：支持语音唤醒，免去手动操作的烦恼。
键盘按键：监听可以最小化视口

状态流转图

                        +----------------+
                        |                |
                        v                |
+------+  唤醒词/按钮  +------------+   |   +------------+
| IDLE | -----------> | CONNECTING | --+-> | LISTENING  |
+------+              +------------+       +------------+
   ^                                            |
   |                                            | 语音识别完成
   |          +------------+                    v
   +--------- |  SPEAKING  | <-----------------+
     完成播放 +------------+

项目结构

├── .github                          # GitHub 相关配置
│   └── ISSUE_TEMPLATE               # Issue 模板目录
│       ├── bug_report.md            # Bug 报告模板
│       ├── code_improvement.md      # 代码改进建议模板
│       ├── documentation_improvement.md  # 文档改进建议模板
│       └── feature_request.md       # 功能请求模板
├── config                           # 配置文件目录
│   └── config.json                  # 应用程序配置文件
├── docs                             # 文档目录
│   ├── images                       # 文档图片资源
│   │   ├── QQ音乐接口配置.png       # QQ音乐接口配置示例图
│   │   ├── 唤醒词.png               # 唤醒词设置示例图
│   │   └── 群聊.jpg                 # 社区交流群图片
│   ├── 使用文档.md                  # 用户使用指南
│   └── 异常汇总.md                  # 常见错误及解决方案
├── libs                             # 依赖库目录
│   └── windows                      # Windows 平台特定库
│       └── opus.dll                 # Opus 音频编解码库
├── scripts                          # 实用脚本目录
│   ├── dir_tree.py                  # 生成目录树结构脚本
│   └── py_audio_scanner.py          # 音频设备扫描工具
├── src                              # 源代码目录
│   ├── audio_codecs                 # 音频编解码模块
│   │   └── audio_codec.py           # 音频编解码器实现
│   ├── audio_processing             # 音频处理模块
│   │   ├── vad_detector.py          # 语音活动检测实现（用于实时打断）
│   │   └── wake_word_detect.py      # 语音唤醒词检测实现
│   ├── constants                    # 常量定义
│   │   └── constants.py             # 应用程序常量（状态、事件类型等）
│   ├── display                      # 显示界面模块
│   │   ├── base_display.py          # 显示界面基类
│   │   ├── cli_display.py           # 命令行界面实现
│   │   └── gui_display.py           # 图形用户界面实现
│   ├── iot                          # IoT设备相关模块
│   │   ├── things                   # 具体设备实现目录
│   │   │   ├── CameraVL             # 摄像头与视觉识别模块
│   │   │   │   ├── Camera.py        # 摄像头控制实现
│   │   │   │   └── VL.py            # 视觉识别实现
│   │   │   ├── lamp.py              # 智能灯具控制实现
│   │   │   ├── music_player.py      # 音乐播放器实现
│   │   │   └── speaker.py           # 智能音箱控制实现
│   │   ├── thing.py                 # IoT设备基类定义
│   │   └── thing_manager.py         # IoT设备管理器（统一管理各类设备）
│   ├── protocols                    # 通信协议模块
│   │   ├── mqtt_protocol.py         # MQTT 协议实现（用于设备通信）
│   │   ├── protocol.py              # 协议基类
│   │   └── websocket_protocol.py    # WebSocket 协议实现
│   ├── utils                        # 工具类模块
│   │   ├── config_manager.py        # 配置管理器（单例模式）
│   │   ├── logging_config.py        # 日志配置
│   │   ├── system_info.py           # 系统信息工具（处理 opus.dll 加载等）
│   │   └── volume_controller.py     # 音量控制工具（跨平台音量调节）
│   └── application.py               # 应用程序主类（核心业务逻辑）
├── .gitignore                       # Git 忽略文件配置
├── LICENSE                          # 项目许可证
├── README.md                        # 项目说明文档
├── main.py                          # 程序入口点
├── requirements.txt                 # Python 依赖包列表（通用）
├── requirements_mac.txt             # macOS 特定依赖包列表
└── xiaozhi.spec                     # PyInstaller 打包配置文件

已实现功能

[x] 新增 GUI 页面，无需在控制台一直按空格
[x] 代码模块化，拆分代码并封装为类，职责分明
[x] 音量调节，可手动调整音量大小
[x] 自动获取 MAC 地址，避免 MAC 地址冲突
[x] 支持 WSS 协议，提升安全性和兼容性
[x] GUI 新增小智表情与文本显示，增强交互体验
[x] 新增命令行操控方案，适用于 Linux 嵌入式设备
[x] 自动对话模式，实现更自然的交互
[x] 语音唤醒，支持唤醒词激活交互 (默认关闭需要手动开启)
[x] IoT 设备集成，实现更多物联网功能
[x] 联网音乐播放
[x] 新增 Volume控制类统一声音改变
[x] 新增视觉多模态

优化

[x] 修复 goodbye 后无法重连 的问题
[x] 解决 macOS 和 Linux 运行异常（原先使用 pycaw 处理音量导致）
[x] 优化“按住说话”按钮，使其更明显
[x] 修复 Stream not open 错误（目前 Windows 不再触发，其他系统待确认）
[x] 修复 没有找到该设备的版本信息，请正确配置 OTA 地址提示
[x] 修复 cli模式update_volume缺失问题

待实现功能

[ ] 新 GUI（Electron），提供更现代的用户界面

贡献

欢迎提交 Issues 和 Pull Requests！

感谢以下开源人员-排名不分前后

Star History

For Tasks:

Click tags to check more tools for each tasks

interact with voice control volume manage sessions transmit encrypted audio automate dialogue

For Jobs:

python developer ai engineer voice technology specialist software engineer machine learning engineer

Alternative AI tools for py-xiaozhi

Similar Open Source Tools

py-xiaozhi

github

: 554

py-xiaozhi

github

: 606

MaiMBot

MaiMBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, utilizes LLM for conversation abilities, MongoDB for data persistence, and NapCat for QQ protocol support. The bot features keyword-triggered proactive responses, dynamic prompt construction, support for images and message forwarding, typo generation, multiple replies, emotion-based emoji responses, daily schedule generation, user relationship management, knowledge base, and group impressions. Work-in-progress features include personality, group atmosphere, image handling, humor, meme functions, and Minecraft interactions. The tool is in active development with plans for GIF compatibility, mini-program link parsing, bug fixes, documentation improvements, and logic enhancements for emoji sending.

github

: 1.1k

Snap-Solver

github

: 74

godoos

GodoOS is an efficient intranet office operating system that includes various office tools such as word/excel/ppt/pdf/internal chat/whiteboard/mind map, with native file storage support. The platform interface mimics the Windows style, making it easy to operate while maintaining low resource consumption and high performance. It automatically connects to intranet users without registration, enabling instant communication and file sharing. The flexible and highly configurable app store allows for unlimited expansion.

github

: 151

AHU-AI-Repository

This repository is dedicated to the learning and exchange of resources for the School of Artificial Intelligence at Anhui University. Notes will be published on this website first: https://www.aoaoaoao.cn and will be synchronized to the repository regularly. You can also contact me at [email protected].

github

: 197

LLMAI-writer

github

: 65

DocTranslator

github

: 60

AI-Sphere-Butler

github

: 68

vpnfast.github.io

VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

github

: 80

KouriChat

KouriChat is a project that seamlessly integrates virtual and real interactions, providing eternal gentle bonds. It offers features like WeChat integration, immersive role-playing, intelligent conversation segmentation, emotion-based emojis, image generation, image recognition, voice messages, and more. The project is focused on technical research and learning exchanges, with a strong emphasis on ethical and legal guidelines. Users are required to take full responsibility for their actions, especially minors who should use the tool under supervision. The project architecture includes avatar configurations, data storage, handlers, AI service interfaces, a web UI, and utility libraries.

github

: 1.8k

gez

Gez is a high-performance micro frontend framework based on ESM. It uses Rspack compilation and maps modules to URLs with strong caching and content-based hashing. Gez embraces modern micro frontend architecture by leveraging ESM and importmap for dependency management, providing reliable isolation with module scope, seamless integration with any modern frontend framework, intuitive development experience, and optimal performance with zero runtime overhead and reliable caching strategies.

github

: 584

AI-Drug-Discovery-Design

AI-Drug-Discovery-Design is a repository focused on Artificial Intelligence-assisted Drug Discovery and Design. It explores the use of AI technology to accelerate and optimize the drug development process. The advantages of AI in drug design include speeding up research cycles, improving accuracy through data-driven models, reducing costs by minimizing experimental redundancies, and enabling personalized drug design for specific patients or disease characteristics.

github

: 77

MoneyPrinterPlus

MoneyPrinterPlus is a project designed to help users easily make money in the era of short videos. It leverages AI big model technology to batch generate various short videos, perform video editing, and automatically publish videos to popular platforms like Douyin, Kuaishou, Xiaohongshu, and Video Number. The tool covers a wide range of functionalities including integrating with major AI big model tools, supporting various voice types, offering video transition effects, enabling customization of subtitles, and more. It aims to simplify the process of creating and sharing videos to monetize traffic.

github

: 1.8k

KubeDoor

KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

github

: 272

kcores-llm-arena

github

: 344

For similar tasks

py-xiaozhi

github

: 554

Ai-Hoshino

Ai Hoshino - MD is a WhatsApp bot tool with features like voice and text interaction, group configuration, anti-delete, anti-link, personalized welcome messages, chatbot functionality, sticker creation, sub-bot integration, RPG game, YouTube music and video downloads, and more. The tool is actively maintained by Starlights Team and offers a range of functionalities for WhatsApp users.

github

: 67

aiohttp-session

aiohttp_session is a Python library that provides session management for aiohttp.web applications. It allows storing user-specific data in session objects with a dict-like interface. The library offers different session storage options, including SimpleCookieStorage for testing, EncryptedCookieStorage for secure data storage, and RedisStorage for storing data in Redis. Users can easily integrate session management into their aiohttp.web applications by registering the session middleware. The library is designed to simplify session handling and enhance the security of web applications.

github

: 237

chatgpt-wechat

ChatGPT-WeChat is a personal assistant application that can be safely used on WeChat through enterprise WeChat without the risk of being banned. The project is open source and free, with no paid sections or external traffic operations except for advertising on the author's public account '积木成楼'. It supports various features such as secure usage on WeChat, multi-channel customer service message integration, proxy support, session management, rapid message response, voice and image messaging, drawing capabilities, private data storage, plugin support, and more. Users can also develop their own capabilities following the rules provided. The project is currently in development with stable versions available for use.

github

: 996

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k

py-xiaozhi

README:

py-xiaozhi

项目简介

请先看这里！

有没有大佬可以加入的

环境要求

相关分支

相关第三方开源项目

演示

功能特点

状态流转图

项目结构

已实现功能

优化

待实现功能

贡献

感谢以下开源人员-排名不分前后

Star History

For Tasks:

For Jobs:

Alternative AI tools for py-xiaozhi

Similar Open Source Tools

py-xiaozhi

py-xiaozhi

MaiMBot

Snap-Solver

godoos

AHU-AI-Repository

LLMAI-writer

DocTranslator

AI-Sphere-Butler

vpnfast.github.io

KouriChat

gez

AI-Drug-Discovery-Design

MoneyPrinterPlus

KubeDoor

kcores-llm-arena

For similar tasks

py-xiaozhi

Ai-Hoshino

aiohttp-session

chatgpt-wechat

For similar jobs

sweep

teams-ai

ai-guide

classifai

chatbot-ui

BricksLLM

uAgents

griptape