OmniSteward
🐼 An all-purpose steward based on an LLM Agent: through voice or text interaction it calls tools to control smart home devices (HomeAssistant/Mi Home) and your computer. Extremely extensible, unlimited possibilities.
Stars: 66
OmniSteward is an AI-powered steward system based on large language models that can interact with users through voice or text to help control smart home devices and computer programs. It supports multi-turn dialogue, tool calling for complex tasks, multiple LLM models, voice recognition, smart home control, computer program management, online information retrieval, command line operations, and file management. The system is highly extensible, allowing users to customize and share their own tools.
README:
Note: This project is still under active development and some features may be unstable; please use with caution.
The English README is automatically generated; please refer to the Chinese version for the most accurate information.
This is an AI-powered steward system based on large language models that can interact with users through voice or text to help control smart home devices and computer programs.
- 2024-12-18: Added support for HomeAssistant; the system can now control HomeAssistant/Mi Home devices. Check omni-ha for more details.
- Supports multi-turn dialogue for continuous user interaction
- Supports tool calling to execute complex tasks on your computer
- Supports multiple LLM models that can be switched as needed
- Highly extensible - you can easily customize and share your own tools
- 🎤 Voice recognition and interaction
- 🏠 Smart home control (HomeAssistant/Bemfa devices/Mi Home devices)
- 💻 Computer program management (start/stop programs)
- 🔍 Online information retrieval (via Stepfun Web Search or Kimi AI)
- ⌨️ Command line operations
- 📂 File management (file search/read/write/compress/list directory)
We have prepared a series of demo videos; please watch them to understand the main features and usage of the system.
- Python 3.8+
- Chrome browser (for Kimi AI functionality)
- Windows OS (some features only support Windows; Linux and macOS are untested)
- Clone repository
git clone https://github.com/OmniSteward/OmniSteward.git
cd OmniSteward
- Install dependencies
pip install -r requirements.txt
See the examples/env.cmd file:
OPENAI_API_BASE=your_api_base # OpenAI format API base URL
OPENAI_API_KEY=your_api_key # OpenAI format API key
SILICON_FLOW_API_KEY=your_api_key # Silicon Flow API key for ASR, Rerank, see [LLM Platforms](docs/PLATFORM.md)
BEMFA_UID=your_bemfa_uid # Bemfa platform UID (optional, for smart home control)
BEMFA_TOPIC=your_bemfa_topic # Bemfa platform Topic (optional, for smart home control)
KIMI_PROFILE_PATH=path_to_chrome_profile # Chrome user data directory (optional, for Kimi AI, uses default path if not set)
LOCATION=your_location # Your geographic location (optional, for system prompts)
LLM_MODEL=your_llm_model # LLM model to use, optional, defaults to Qwen2.5-7B-Instruct
To obtain an OpenAI-format API key and base URL, see LLM Platforms (docs/PLATFORM.md).
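The variable names above come from examples/env.cmd; below is a minimal sketch (assuming, as is typical for Python projects, that the components read these values via os.environ) to verify your configuration before launching. Which variables are strictly required depends on the features you enable; the split below is an assumption.

import os

# Variable names taken from the examples/env.cmd list above; the required/optional
# split is an assumption based on the comments in that list.
required = ["OPENAI_API_BASE", "OPENAI_API_KEY", "SILICON_FLOW_API_KEY"]
optional = ["BEMFA_UID", "BEMFA_TOPIC", "KIMI_PROFILE_PATH", "LOCATION", "LLM_MODEL"]

missing = [name for name in required if not os.environ.get(name)]
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required variables are set.")

for name in optional:
    print(f"{name} = {os.environ.get(name, '<not set, default will be used>')}")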
This project supports two usage modes:
- Command Line Interface (CLI): Interact through the command line; can be used directly.
- Web Mode: Requires the frontend project; you interact through a WebUI and can manage smart home devices remotely from a phone, tablet, or computer.
Please first configure the environment variables in the examples/env.cmd file (see Environment Variables Configuration).
First start the VAD service:
python -m servers.vad_rpc
Then open a new command prompt window and run:
call examples\env.cmd # Apply environment variables
python -m core.cli --config configs/cli.py # Run CLI
See examples/cli_voice.cmd for more details
call examples\env.cmd # Apply environment variables
python -m core.cli --query "open NetEase Music" --config configs/cli.py
call examples\env.cmd # Apply environment variables
python -m core.cli --query "print hello" --config configs/cli_custom_tool.py
This example adds a simple print tool in configs/cli_custom_tool.py that can print any string. Check this file to learn how to easily add custom tools
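The snippet below is only a hypothetical sketch of what such a tool could look like; the class name, attributes, and registration mechanism are illustrative assumptions, not OmniSteward's actual tool API. Check configs/cli_custom_tool.py and the steward-utils project for the real interface.

# Hypothetical example of a simple "print" tool; names and structure are assumptions,
# not the project's real API -- configs/cli_custom_tool.py shows the actual way to do this.
class PrintTool:
    name = "print"
    description = "Print any string to the console"

    def __call__(self, text: str) -> str:
        print(text)
        return f"printed: {text}"

# A config file would then expose the tool to the agent, for example via a tool list:
custom_tools = [PrintTool()]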
- Requires the frontend WebUI, called OmniSteward-Frontend
- Environment variables must be configured, especially the Silicon Flow API key
- Frontend WebUI should run on http://localhost:3000; the backend will forward requests to the frontend when started
- Backend service should run on http://localhost:8000
Please first configure the environment variables in the examples/env.cmd file (see Environment Variables Configuration), then run in the project root:
call examples\env.cmd # Apply environment variables
python -m servers.steward --config configs/backend.py
See OmniSteward-Frontend project.
Use a Chrome/Edge browser and open http://localhost:8000 to start using.
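If the page does not load, you can quickly check that the backend is reachable. Only the URL is taken from the docs above; the check itself is a generic sketch using the Python standard library.

import urllib.request

# Probe the default OmniSteward backend address; any HTTP response means the service is up.
try:
    with urllib.request.urlopen("http://localhost:8000", timeout=5) as resp:
        print("Backend reachable, HTTP status:", resp.status)
except Exception as err:
    print("Backend not reachable:", err)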
Note: For external network access, Chrome/Edge blocks the microphone over HTTP by default, so you need to add http://ip:port to chrome://flags/#unsafely-treat-insecure-origin-as-secure; otherwise the microphone cannot be used. See the tutorial for reference.
Mobile phones can also use the Chrome or Edge browser: open http://ip:port to start using; the same settings as above are required.
See TOOL_LIST.md
- Some features require specific API keys and environment configuration
- Command line tools require user confirmation before execution
- Smart home control features require corresponding hardware support
This project is currently maintained by ElliottZheng; issues and pull requests are welcome!
Thanks to Stepfun Stars Program for supporting this project.
Copyright (c) 2024-present ElliottZheng
See steward-utils project for more custom tool examples.
Similar Open Source Tools
SecureAI-Tools
SecureAI Tools is a private and secure AI tool that allows users to chat with AI models, chat with documents (PDFs), and run AI models locally. It comes with built-in authentication and user management, making it suitable for family members or coworkers. The tool is self-hosting optimized and provides necessary scripts and docker-compose files for easy setup in under 5 minutes. Users can customize the tool by editing the .env file and enabling GPU support for faster inference. SecureAI Tools also supports remote OpenAI-compatible APIs, with lower hardware requirements for using remote APIs only. The tool's features wishlist includes chat sharing, mobile-friendly UI, and support for more file types and markdown rendering.
openroleplay.ai
Open Roleplay is an open-source alternative to Character.ai. It allows users to create their own AI characters, customize them, and generate images and voices for them. Open Roleplay also supports group chat and automatic translation. The tool is built with Next.js, React.js, Tailwind CSS, Vercel, Convex, and Clerk.
ai-flow
AI Flow is an open-source, user-friendly UI application that empowers you to seamlessly connect multiple AI models together, specifically leveraging the capabilities of multiple AI APIs such as OpenAI, StabilityAI and Replicate. In a nutshell, AI Flow provides a visual platform for crafting and managing AI-driven workflows, thereby facilitating diverse and dynamic AI interactions.
E2B
E2B Sandbox is a secure sandboxed cloud environment made for AI agents and AI apps. Sandboxes allow AI agents and apps to have long-running, secure cloud environments. In these environments, large language models can use the same tools as humans do. For example: * Cloud browsers * GitHub repositories and CLIs * Coding tools like linters, autocomplete, "go-to definition" * Running LLM generated code * Audio & video editing The E2B sandbox can be connected to any LLM and any AI agent or app.
file-organizer-2000
AI File Organizer 2000 is an Obsidian Plugin that uses AI to transcribe audio, annotate images, and automatically organize files by moving them to the most likely folders. It supports text, audio, and images, with upcoming local-first LLM support. Users can simply place unorganized files into the 'Inbox' folder for automatic organization. The tool renames and moves files quickly, providing a seamless file organization experience. Self-hosting is also possible by running the server and enabling the 'Self-hosted' option in the plugin settings. Join the community Discord server for more information and use the provided iOS shortcut for easy access on mobile devices.
ninja
Ninja is a project that serves as a reverse engineered proxy for ChatGPT. It allows users to acquire API keys, authenticate using email/password, proxy ChatGPT-API/OpenAI-API, access ChatGPT WebUI, utilize IP proxy pool, and solve FunCaptcha with Capsolver.com. The project has a very small memory footprint and is designed for ease of use. Please note that the project has ended.
tlm
tlm is a local CLI copilot tool powered by CodeLLaMa, providing efficient command line suggestions without the need for an API key or internet connection. It works on macOS, Linux, and Windows, with automatic shell detection for Powershell, Bash, and Zsh. The tool offers one-liner generation and command explanation, and can be installed via an installation script or using Go Install. Ollama is required to download necessary models, and the tool can be easily deployed and configured. Contributors are welcome to enhance the tool's functionality.
tgpt
tgpt is a cross-platform command-line interface (CLI) tool that allows users to interact with AI chatbots in the Terminal without needing API keys. It supports various AI providers such as KoboldAI, Phind, Llama2, Blackbox AI, and OpenAI. Users can generate text, code, and images using different flags and options. The tool can be installed on GNU/Linux, MacOS, FreeBSD, and Windows systems. It also supports proxy configurations and provides options for updating and uninstalling the tool.
Auto-Gmail-Creator
Auto-Gmail-Creator is an open-source automation script designed for Python enthusiasts to learn automation basics and for marketers to create multiple Google accounts efficiently. The script automates the process of creating Gmail accounts using sms-activate.org API for phone verification. It handles the download of Chromedriver or Geckodriver automatically and can be customized to prevent blocking. The tool is useful for projects related to automation, scraping, and machine learning.
kubeai
KubeAI is a highly scalable AI platform that runs on Kubernetes, serving as a drop-in replacement for OpenAI with API compatibility. It can operate OSS model servers like vLLM and Ollama, with zero dependencies and additional OSS addons included. Users can configure models via Kubernetes Custom Resources and interact with models through a chat UI. KubeAI supports serving various models like Llama v3.1, Gemma2, and Qwen2, and has plans for model caching, LoRA finetuning, and image generation.
Chital
Chital is a native macOS app designed for chatting with Ollama models. It offers low memory usage and fast app launch times, supports multiple chat threads, allows users to switch between different models, provides Markdown support, and automatically summarizes chat thread titles. The app requires macOS 14 Sonoma or above, the installation of Ollama, and at least one downloaded LLM model. Chital is a user-friendly tool that simplifies the process of engaging with Ollama models through chat threads on macOS systems.
job-llm
ResumeFlow is an automated system utilizing Large Language Models (LLMs) to streamline the job application process. It aims to reduce human effort in various steps of job hunting by integrating LLM technology. Users can access ResumeFlow as a web tool, install it as a Python package, or download the source code. The project focuses on leveraging LLMs to automate tasks such as resume generation and refinement, making job applications smoother and more efficient.
inspector-laravel
Inspector is a code execution monitoring tool specifically designed for Laravel applications. It provides simple and efficient monitoring capabilities to track and analyze the performance of your Laravel code. With Inspector, you can easily monitor web requests, test the functionality of your application, and explore data through a user-friendly dashboard. The tool requires PHP version 7.2.0 or higher and Laravel version 5.5 or above. By configuring the ingestion key and attaching the middleware, users can seamlessly integrate Inspector into their Laravel projects. The official documentation provides detailed instructions on installation, configuration, and usage of Inspector. Contributions to the tool are welcome, and users are encouraged to follow the Contribution Guidelines to participate in the development of Inspector.
OpenAdapt
OpenAdapt is an open-source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web Graphical User Interfaces (GUIs). It aims to automate repetitive GUI workflows by leveraging the power of LMMs. OpenAdapt records user input and screenshots, converts them into tokenized format, and generates synthetic input via transformer model completions. It also analyzes recordings to generate task trees and replay synthetic input to complete tasks. OpenAdapt is model agnostic and generates prompts automatically by learning from human demonstration, ensuring that agents are grounded in existing processes and mitigating hallucinations. It works with all types of desktop GUIs, including virtualized and web, and is open source under the MIT license.
Protofy
Protofy is a full-stack, batteries-included, low-code enabled web/app and IoT system with an API system and real-time messaging. It is based on Protofy (protoflow + visualui + protolib + protodevices) + Expo + Next.js + Tamagui + Solito + Express + Aedes + Redbird + many other amazing packages. Protofy can be used to quickly prototype apps, websites, IoT systems, automations, or APIs. It is an ultra-extensible CMS with supercharged capabilities, mobile support, and IoT support (esp32 thanks to esphome).
For similar tasks
aioesphomeapi
aioesphomeapi allows you to interact with devices flashed with ESPHome. ESPHome is an open-source firmware that allows you to control your devices over Wi-Fi or Ethernet. With aioesphomeapi, you can connect to your ESPHome devices, retrieve their status, and control them from your Python code.
Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The '带货完整版' (full sales edition) is suitable for online and offline salespersons. The '助理完整版' (full assistant edition) serves as a human-machine interactive digital assistant that can also control devices upon command. The 'agent版' (agent edition) is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.
aiohomekit
aiohomekit is a Python library that implements the HomeKit protocol for controlling HomeKit accessories using asyncio. It is primarily used with Home Assistant, targeting the same versions of Python and following their code standards. The library is still under development and does not offer API guarantees yet. It aims to match the behavior of real HAP controllers, even when not strictly specified, and works around issues like JSON formatting, boolean encoding, header sensitivity, and TCP packet splitting. aiohomekit is primarily tested with Philips Hue and Eve Extend bridges via Home Assistant, but is known to work with many more devices. It does not support BLE accessories and is intended for client-side use only.
discollama
Discollama is a Discord bot powered by a local large language model backed by Ollama. It allows users to interact with the bot in Discord by mentioning it in a message to start a new conversation or in a reply to a previous response to continue an ongoing conversation. The bot requires Docker and Docker Compose to run, and users need to set up a Discord Bot and environment variable DISCORD_TOKEN before using discollama.py. Additionally, an Ollama server is needed, and users can customize the bot's personality by creating a custom model using Modelfile and running 'ollama create'.
ChatGPT-OpenAI-Smart-Speaker
ChatGPT Smart Speaker is a project that enables speech recognition and text-to-speech functionalities using OpenAI and Google Speech Recognition. It provides scripts for running on PC/Mac and Raspberry Pi, allowing users to interact with a smart speaker setup. The project includes detailed instructions for setting up the required hardware and software dependencies, along with customization options for the OpenAI model engine, language settings, and response randomness control. The Raspberry Pi setup involves utilizing the ReSpeaker hardware for voice feedback and light shows. The project aims to offer an advanced smart speaker experience with features like wake word detection and response generation using AI models.
bot-on-anything
The 'bot-on-anything' repository allows developers to integrate various AI models into messaging applications, enabling the creation of intelligent chatbots. By configuring the connections between models and applications, developers can easily switch between multiple channels within a project. The architecture is highly scalable, allowing the reuse of algorithmic capabilities for each new application and model integration. Supported models include ChatGPT, GPT-3.0, New Bing, and Google Bard, while supported applications range from terminals and web platforms to messaging apps like WeChat, Telegram, QQ, and more. The repository provides detailed instructions for setting up the environment, configuring the models and channels, and running the chatbot for various tasks across different messaging platforms.
agentlang
AgentLang is an open-source programming language and framework designed for solving complex tasks with the help of AI agents. It allows users to build business applications rapidly from high-level specifications, making it more efficient than traditional programming languages. The language is data-oriented and declarative, with a syntax that is intuitive and closer to natural languages. AgentLang introduces innovative concepts such as first-class AI agents, graph-based hierarchical data model, zero-trust programming, declarative dataflow, resolvers, interceptors, and entity-graph-database mapping.
For similar jobs
zep
Zep is a long-term memory service for AI Assistant apps. With Zep, you can provide AI assistants with the ability to recall past conversations, no matter how distant, while also reducing hallucinations, latency, and cost. Zep persists and recalls chat histories, and automatically generates summaries and other artifacts from these chat histories. It also embeds messages and summaries, enabling you to search Zep for relevant context from past conversations. Zep does all of this asynchronously, ensuring these operations don't impact your user's chat experience. Data is persisted to a database, allowing you to scale out when growth demands. Zep also provides a simple, easy-to-use abstraction for document vector search called Document Collections. This is designed to complement Zep's core memory features, but is not designed to be a general purpose vector database. Zep allows you to be more intentional about constructing your prompt: 1. automatically adding a few recent messages, with the number customized for your app; 2. a summary of recent conversations prior to the messages above; 3. and/or contextually relevant summaries or messages surfaced from the entire chat session; 4. and/or relevant business data from Zep Document Collections.
doc2plan
doc2plan is a browser-based application that helps users create personalized learning plans by extracting content from documents. It features a Creator for manual or AI-assisted plan construction and a Viewer for interactive plan navigation. Users can extract chapters, key topics, generate quizzes, and track progress. The application includes AI-driven content extraction, quiz generation, progress tracking, plan import/export, assistant management, customizable settings, viewer chat with text-to-speech and speech-to-text support, and integration with various Retrieval-Augmented Generation (RAG) models. It aims to simplify the creation of comprehensive learning modules tailored to individual needs.
whatsapp-chatgpt
This repository contains a WhatsApp bot that utilizes OpenAI's GPT and DALL-E 2 to respond to user inputs. Users can interact with the bot through voice messages, which are transcribed and responded to. The bot requires Node.js, npm, an OpenAI API key, and a WhatsApp account. It uses Puppeteer to run a real instance of WhatsApp Web to avoid being blocked. However, there is a risk of being blocked by WhatsApp as it does not allow bots or unofficial clients on its platform. The bot is not free to use, and users will be charged by OpenAI for each request made.
yet-another-applied-llm-benchmark
Yet Another Applied LLM Benchmark is a collection of diverse tests designed to evaluate the capabilities of language models in performing real-world tasks. The benchmark includes tests such as converting code, decompiling bytecode, explaining minified JavaScript, identifying encoding formats, writing parsers, and generating SQL queries. It features a dataflow domain-specific language for easily adding new tests and has nearly 100 tests based on actual scenarios encountered when working with language models. The benchmark aims to assess whether models can effectively handle tasks that users genuinely care about.
BadukMegapack
BadukMegapack is an installer for various AI Baduk (Go) programs, designed for baduk players who want to easily access and use a variety of baduk AI programs without complex installations. The megapack includes popular programs like Lizzie, KaTrain, Sabaki, KataGo, LeelaZero, and more, along with weight files for different AI models. Users can update their graphics card drivers before installation for optimal performance.
Halite-III
Halite III is an AI programming competition hosted by Two Sigma. Contestants write bots to play a turn-based strategy game on a square grid. Bots navigate the sea collecting halite in this resource management game. The competition offers players the opportunity to develop bots with various strategies and in multiple programming languages.
chroma
Chroma is an open-source embedding database that provides a simple, scalable, and feature-rich way to build Python or JavaScript LLM apps with memory. It offers a fully-typed, fully-tested, and fully-documented API that makes it easy to get started and scale your applications. Chroma also integrates with popular tools like LangChain and LlamaIndex, and supports a variety of embedding models, including Sentence Transformers, OpenAI embeddings, and Cohere embeddings. With Chroma, you can easily add documents to your database, query relevant documents with natural language, and compose documents into the context window of an LLM like GPT3 for additional summarization or analysis.