OmniSteward

🐼基于LLM Agent的全能管家，通过语音或文字交互，调用工具控制智能家居(HA/米家)和电脑。超高拓展性，无限可能。

Stars: 66

Visit

OmniSteward is an AI-powered steward system based on large language models that can interact with users through voice or text to help control smart home devices and computer programs. It supports multi-turn dialogue, tool calling for complex tasks, multiple LLM models, voice recognition, smart home control, computer program management, online information retrieval, command line operations, and file management. The system is highly extensible, allowing users to customize and share their own tools.

README:

中文文档

Note: This project is still under active development, and some features may be unstable, please use with caution

The English README is automatically generated, please refer to the Chinese version for the most accurate information

This is an AI-powered steward system based on large language models that can interact with users through voice or text to help control smart home devices and computer programs.

News

2024-12-18: Added support for HomeAssistant, can now control HomeAssistant/Mi Home devices, check omni-ha for more details

Highlights

Supports multi-turn dialogue for continuous user interaction
Supports tool calling to execute complex tasks on your computer
Supports multiple LLM models that can be switched as needed
Highly extensible - you can easily customize and share your own tools

Main Features

🎤 Voice recognition and interaction
🏠 Smart home control (HomeAssistant/Bemfa devices/Mi Home devices)
💻 Computer program management (start/stop programs)
🔍 Online information retrieval (via Stepfun Web Search or Kimi AI)
⌨️ Command line operations
📂 File management (file search/read/write/compress/list directory)

Demo Video

We prepared a series of demo videos, please watch demo videos to understand the main features and usage of the system.

System Requirements

Python 3.8+
Chrome browser (for Kimi AI functionality)
Windows OS (some features only support Windows, Linux and Mac untested)

Installation

Clone repository

git clone https://github.com/OmniSteward/OmniSteward.git
cd OmniSteward

Install dependencies

pip install -r requirements.txt

Environment Variables Configuration

See examples/env.cmd file

OPENAI_API_BASE=your_api_base # OpenAI format API base URL
OPENAI_API_KEY=your_api_key   # OpenAI format API key
SILICON_FLOW_API_KEY=your_api_key   # Silicon Flow API key for ASR, Rerank, see [LLM Platforms](docs/PLATFORM.md)
BEMFA_UID=your_bemfa_uid            # Bemfa platform UID (optional, for smart home control)
BEMFA_TOPIC=your_bemfa_topic        # Bemfa platform Topic (optional, for smart home control)
KIMI_PROFILE_PATH=path_to_chrome_profile    # Chrome user data directory (optional, for Kimi AI, uses default path if not set)
LOCATION=your_location                     # Your geographic location (optional, for system prompts)
LLM_MODEL=your_llm_model                   # LLM model to use, optional, defaults to Qwen2.5-7B-Instruct

For obtaining OpenAI format API key and base URL, see LLM Platforms

Reference links:

Launch

This project supports two usage modes:

Command Line Interface (CLI): Interact through command line, direct usage.
Web Mode: Requires frontend project, interact through WebUI, can be used remotely on phone, tablet, computer to manage smart home devices

Command Line Mode (CLI)

Please first configure environment variables in examples/env.cmd file (see Environment Variables Configuration)

Microphone Voice Input Mode

First start the VAD service:

python -m servers.vad_rpc

Then open a new command prompt window and run:

call examples\env.cmd # Apply environment variables
python -m core.cli --config configs/cli.py # Run CLI

See examples/cli_voice.cmd for more details

Text Input Mode

call examples\env.cmd # Apply environment variables
python -m core.cli --query "open NetEase Music" --config configs/cli.py

Adding Simple Custom Tools

call examples\env.cmd # Apply environment variables
python -m core.cli --query "print hello" --config configs/cli_custom_tool.py

This example adds a simple print tool in configs/cli_custom_tool.py that can print any string. Check this file to learn how to easily add custom tools

Web Mode

Requires frontend WebUI, called OmniSteward-Frontend
Environment variables must be configured, especially Silicon Flow API key
Frontend WebUI should run on http://localhost:3000, backend will forward requests to frontend when started
Backend service should run on http://localhost:8000

Start Backend Service

Please first configure environment variables in examples/env.cmd file (see Environment Variables Configuration), then run in project root:

call examples\env.cmd # Apply environment variables
python -m servers.steward --config configs/backend.py

Start Frontend Service

See OmniSteward-Frontend project.

Usage

Use Chrome/Edge browser, open http://localhost:8000 to start using.

Note: For external network access, since Chrome/Edge blocks microphone under HTTP by default, we need to set chrome://flags/#unsafely-treat-insecure-origin-as-secure to http://ip:port, otherwise it cannot be used. See tutorial for reference.

Mobile phones can also use Chrome or Edge browser, open http://ip:port to start using, requires same settings as above.

Available Tools and Customization

See TOOL_LIST.md

Notes

Some features require specific API keys and environment configuration
Command line tools require user confirmation before execution
Smart home control features require corresponding hardware support

Contributing

Currently this project is maintained by ElliottZheng, welcome to submit issues and pull requests!

Thanks

Thanks to Stepfun Stars Program for supporting this project.

License

MIT License

More Custom Tool Examples

See steward-utils project for more custom tool examples.

For Tasks:

Click tags to check more tools for each tasks

control devices manage programs retrieve information interact with users customize tools

For Jobs:

smart home technician ai assistant developer computer programmer voice interaction designer information retrieval specialist

Alternative AI tools for OmniSteward

Similar Open Source Tools

OmniSteward

github

: 66

SecureAI-Tools

SecureAI Tools is a private and secure AI tool that allows users to chat with AI models, chat with documents (PDFs), and run AI models locally. It comes with built-in authentication and user management, making it suitable for family members or coworkers. The tool is self-hosting optimized and provides necessary scripts and docker-compose files for easy setup in under 5 minutes. Users can customize the tool by editing the .env file and enabling GPU support for faster inference. SecureAI Tools also supports remote OpenAI-compatible APIs, with lower hardware requirements for using remote APIs only. The tool's features wishlist includes chat sharing, mobile-friendly UI, and support for more file types and markdown rendering.

github

: 1.4k

aiaio

aiaio (AI-AI-O) is a lightweight, privacy-focused web UI for interacting with AI models. It supports both local and remote LLM deployments through OpenAI-compatible APIs. The tool provides features such as dark/light mode support, local SQLite database for conversation storage, file upload and processing, configurable model parameters through UI, privacy-focused design, responsive design for mobile/desktop, syntax highlighting for code blocks, real-time conversation updates, automatic conversation summarization, customizable system prompts, WebSocket support for real-time updates, Docker support for deployment, multiple API endpoint support, and multiple system prompt support. Users can configure model parameters and API settings through the UI, handle file uploads, manage conversations, and use keyboard shortcuts for efficient interaction. The tool uses SQLite for storage with tables for conversations, messages, attachments, and settings. Contributions to the project are welcome under the Apache License 2.0.

github

: 282

minimal-chat

MinimalChat is a minimal and lightweight open-source chat application with full mobile PWA support that allows users to interact with various language models, including GPT-4 Omni, Claude Opus, and various Local/Custom Model Endpoints. It focuses on simplicity in setup and usage while being fully featured and highly responsive. The application supports features like fully voiced conversational interactions, multiple language models, markdown support, code syntax highlighting, DALL-E 3 integration, conversation importing/exporting, and responsive layout for mobile use.

github

: 171

ai-shifu

AI-Shifu is an AI-led chat flow tool powered by LLM that provides an interactive and immersive experience for users. It allows users to follow a preset chat flow while being able to ask questions and affect the conversation. The tool can make personalized outputs based on user identity, interests, and preferences, making users feel like they are receiving one-on-one service. It is suitable for education, storytelling, product guides, surveys, and game NPC scenarios.

github

: 183

openroleplay.ai

Open Roleplay is an open-source alternative to Character.ai. It allows users to create their own AI characters, customize them, and generate images and voices for them. Open Roleplay also supports group chat and automatic translation. The tool is built with Next.js, React.js, Tailwind CSS, Vercel, Convex, and Clerk.

github

: 146

Biomni

Biomni is a general-purpose biomedical AI agent designed to autonomously execute a wide range of research tasks across diverse biomedical subfields. By integrating cutting-edge large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, Biomni helps scientists dramatically enhance research productivity and generate testable hypotheses.

github

: 2.1k

gemini-2-live-api-demo

A lightweight vanilla JavaScript implementation of the Gemini 2.0 Flash Multimodal Live API client, providing real-time interaction with Gemini's API through text, audio, video, and screen sharing capabilities. Built with vanilla JavaScript, it offers features like real-time text chat, audio input/output with visualization, motion-detected video streaming, and screen sharing. Users can connect to the API, send text messages, toggle microphone for audio input, enable webcam for video streaming, share screen, and monitor real-time feedback in the logs panel. Custom tools can be added for extending functionality.

github

: 186

E2B

E2B Sandbox is a secure sandboxed cloud environment made for AI agents and AI apps. Sandboxes allow AI agents and apps to have long running cloud secure environments. In these environments, large language models can use the same tools as humans do. For example: * Cloud browsers * GitHub repositories and CLIs * Coding tools like linters, autocomplete, "go-to defintion" * Running LLM generated code * Audio & video editing The E2B sandbox can be connected to any LLM and any AI agent or app.

github

: 9.6k

mattermost-plugin-ai

The Mattermost AI Copilot Plugin is an extension that adds functionality for local and third-party LLMs within Mattermost v9.6 and above. It is currently experimental and allows users to interact with AI models seamlessly. The plugin enhances the user experience by providing AI-powered assistance and features for communication and collaboration within the Mattermost platform.

github

: 157

The-Creator-AI

The Creator AI is a VS Code extension that integrates a coding assistant allowing users to choose files/folders through UI and describe code changes for AI-generated implementation plans. It requires an API key for Gemini or OpenAI. The extension follows VS Code guidelines and best practices, providing functionalities like basic chat, change plan, and file explorer. Users can edit the README using Visual Studio Code with useful keyboard shortcuts. Enjoy enhanced coding experience with The Creator AI.

github

: 132

next-ai-draw-io

Next AI Draw.io is a next.js web application that integrates AI capabilities with draw.io diagrams. It allows users to create, modify, and enhance diagrams through natural language commands and AI-assisted visualization. Features include LLM-Powered Diagram Creation, Image-Based Diagram Replication, Diagram History, Interactive Chat Interface, and Smart Editing. The application uses Next.js for frontend framework, @ai-sdk/react for chat interface and AI interactions, and react-drawio for diagram representation and manipulation. Diagrams are represented as XML that can be rendered in draw.io, with AI processing commands to generate or modify the XML accordingly.

github

: 93

cossistant

Cossistant is an open source chat support widget tailored for the React ecosystem. It offers headless components for building customizable chat interfaces, real-time messaging with WebSocket technology, and tools for managing customer conversations. The tool is API-first, self-hosted, developer-friendly with TypeScript support, and provides complete integration flexibility. It uses technologies like Next.js, TailwindCSS, and WebSockets, and supports databases like PlanetScale for production and DBgin for local development. Cossistant is ideal for developers seeking a versatile chat solution that can be easily integrated into their applications.

github

: 53

ChatterUI

ChatterUI is a mobile app that allows users to manage chat files and character cards, and to interact with Large Language Models (LLMs). It supports multiple backends, including local, koboldcpp, text-generation-webui, Generic Text Completions, AI Horde, Mancer, Open Router, and OpenAI. ChatterUI provides a mobile-friendly interface for interacting with LLMs, making it easy to use them for a variety of tasks, such as generating text, translating languages, writing code, and answering questions.

github

: 1.8k

ninja

Ninja is a project that serves as a reverse engineered proxy for ChatGPT. It allows users to acquire API keys, authenticate using email/password, proxy ChatGPT-API/OpenAI-API, access ChatGPT WebUI, utilize IP proxy pool, and solve FunCaptcha with Capsolver.com. The project has a very small memory footprint and is designed for ease of use. Please note that the project has ended.

github

: 1.7k

KeyboardGPT

Keyboard GPT is an LSPosed Module that integrates Generative AI like ChatGPT into your keyboard, allowing for real-time AI responses, custom prompts, and web search capabilities. It works in all apps and supports popular keyboards like Gboard, Swiftkey, Fleksy, and Samsung Keyboard. Users can easily configure API providers, submit prompts, and perform web searches directly from their keyboard. The tool also supports multiple Generative AI APIs such as ChatGPT, Gemini, and Groq. It offers an easy installation process for both rooted and non-rooted devices, making it a versatile and powerful tool for enhancing text input experiences on mobile devices.

github

: 142

For similar tasks

aioesphomeapi

aioesphomeapi allows you to interact with devices flashed with ESPHome. ESPHome is an open-source firmware that allows you to control your devices over Wi-Fi or Ethernet. With aioesphomeapi, you can connect to your ESPHome devices, retrieve their status, and control them from your Python code.

github

: 170

Fay

Fay is an open-source digital human framework that offers different versions for various purposes. The '带货完整版' is suitable for online and offline salespersons. The '助理完整版' serves as a human-machine interactive digital assistant that can also control devices upon command. The 'agent版' is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.

github

: 10.7k

aiohomekit

aiohomekit is a Python library that implements the HomeKit protocol for controlling HomeKit accessories using asyncio. It is primarily used with Home Assistant, targeting the same versions of Python and following their code standards. The library is still under development and does not offer API guarantees yet. It aims to match the behavior of real HAP controllers, even when not strictly specified, and works around issues like JSON formatting, boolean encoding, header sensitivity, and TCP packet splitting. aiohomekit is primarily tested with Phillips Hue and Eve Extend bridges via Home Assistant, but is known to work with many more devices. It does not support BLE accessories and is intended for client-side use only.

github

: 64

OmniSteward

github

: 66

Jarvis

Jarvis is a powerful virtual AI assistant designed to simplify daily tasks through voice command integration. It features automation, device management, and personalized interactions, transforming technology engagement. Built using Python and AI models, it serves personal and administrative needs efficiently, making processes seamless and productive.

github

: 67

TuyaOpen

TuyaOpen is an open source AI+IoT development framework supporting cross-chip platforms and operating systems. It provides core functionalities for AI+IoT development, including pairing, activation, control, and upgrading. The SDK offers robust security and compliance capabilities, meeting data compliance requirements globally. TuyaOpen enables the development of AI+IoT products that can leverage the Tuya APP ecosystem and cloud services. It continues to expand with more cloud platform integration features and capabilities like voice, video, and facial recognition.

github

: 889

aiohomematic

AIO Homematic (hahomematic) is a lightweight Python 3 library for controlling and monitoring HomeMatic and HomematicIP devices, with support for third-party devices/gateways. It automatically creates entities for device parameters, offers custom entity classes for complex behavior, and includes features like caching paramsets for faster restarts. Designed to integrate with Home Assistant, it requires specific firmware versions for HomematicIP devices. The public API is defined in modules like central, client, model, exceptions, and const, with example usage provided. Useful links include changelog, data point definitions, troubleshooting, and developer resources for architecture, data flow, model extension, and Home Assistant lifecycle.

github

: 157

vision-agent

AskUI Vision Agent is a powerful automation framework that enables you and AI agents to control your desktop, mobile, and HMI devices and automate tasks. It supports multiple AI models, multi-platform compatibility, and enterprise-ready features. The tool provides support for Windows, Linux, MacOS, Android, and iOS device automation, single-step UI automation commands, in-background automation on Windows machines, flexible model use, and secure deployment of agents in enterprise environments.

github

: 381

For similar jobs

zep

Zep is a long-term memory service for AI Assistant apps. With Zep, you can provide AI assistants with the ability to recall past conversations, no matter how distant, while also reducing hallucinations, latency, and cost. Zep persists and recalls chat histories, and automatically generates summaries and other artifacts from these chat histories. It also embeds messages and summaries, enabling you to search Zep for relevant context from past conversations. Zep does all of this asyncronously, ensuring these operations don't impact your user's chat experience. Data is persisted to database, allowing you to scale out when growth demands. Zep also provides a simple, easy to use abstraction for document vector search called Document Collections. This is designed to complement Zep's core memory features, but is not designed to be a general purpose vector database. Zep allows you to be more intentional about constructing your prompt: 1. automatically adding a few recent messages, with the number customized for your app; 2. a summary of recent conversations prior to the messages above; 3. and/or contextually relevant summaries or messages surfaced from the entire chat session. 4. and/or relevant Business data from Zep Document Collections.

github

: 2.4k

doc2plan

doc2plan is a browser-based application that helps users create personalized learning plans by extracting content from documents. It features a Creator for manual or AI-assisted plan construction and a Viewer for interactive plan navigation. Users can extract chapters, key topics, generate quizzes, and track progress. The application includes AI-driven content extraction, quiz generation, progress tracking, plan import/export, assistant management, customizable settings, viewer chat with text-to-speech and speech-to-text support, and integration with various Retrieval-Augmented Generation (RAG) models. It aims to simplify the creation of comprehensive learning modules tailored to individual needs.

github

: 138

whatsapp-chatgpt

This repository contains a WhatsApp bot that utilizes OpenAI's GPT and DALL-E 2 to respond to user inputs. Users can interact with the bot through voice messages, which are transcribed and responded to. The bot requires Node.js, npm, an OpenAI API key, and a WhatsApp account. It uses Puppeteer to run a real instance of Whatsapp Web to avoid being blocked. However, there is a risk of being blocked by WhatsApp as it does not allow bots or unofficial clients on its platform. The bot is not free to use, and users will be charged by OpenAI for each request made.

github

: 3.4k

OmniSteward

github

: 66

chatgpt-wechat

ChatGPT-WeChat is a personal assistant application that can be safely used on WeChat through enterprise WeChat without the risk of being banned. The project is open source and free, with no paid sections or external traffic operations except for advertising on the author's public account '积木成楼'. It supports various features such as secure usage on WeChat, multi-channel customer service message integration, proxy support, session management, rapid message response, voice and image messaging, drawing capabilities, private data storage, plugin support, and more. Users can also develop their own capabilities following the rules provided. The project is currently in development with stable versions available for use.

github

: 996

mcp-agent

mcp-agent is a simple, composable framework designed to build agents using the Model Context Protocol. It handles the lifecycle of MCP server connections and implements patterns for building production-ready AI agents in a composable way. The framework also includes OpenAI's Swarm pattern for multi-agent orchestration in a model-agnostic manner, making it the simplest way to build robust agent applications. It is purpose-built for the shared protocol MCP, lightweight, and closer to an agent pattern library than a framework. mcp-agent allows developers to focus on the core business logic of their AI applications by handling mechanics such as server connections, working with LLMs, and supporting external signals like human input.

github

: 7.4k

Gmail-MCP-Server

Gmail AutoAuth MCP Server is a Model Context Protocol (MCP) server designed for Gmail integration in Claude Desktop. It supports auto authentication and enables AI assistants to manage Gmail through natural language interactions. The server provides comprehensive features for sending emails, reading messages, managing labels, searching emails, and batch operations. It offers full support for international characters, email attachments, and Gmail API integration. Users can install and authenticate the server via Smithery or manually with Google Cloud Project credentials. The server supports both Desktop and Web application credentials, with global credential storage for convenience. It also includes Docker support and instructions for cloud server authentication.

github

: 126

Operit

Operit AI is a fully functional AI assistant application for mobile devices, running independently on Android devices with powerful tool invocation capabilities. It offers over 40 built-in tools for file system operations, HTTP requests, system operations, UI automation, and media processing. The app combines these tools with rich plugins to enable a wide range of tasks, from simple to complex, providing a comprehensive experience of a smartphone AI assistant.

github

: 1.7k