airunner
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
Stars: 1241
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
README:
Report Bug | Request Feature | Report Vulnerability | Wiki
Show your support for this project by choosing one of the following options for donations.
- Crypto: 0x02030569e866e22C9991f55Db0445eeAd2d646c8
- Github Sponsors: https://github.com/sponsors/w4ffl35
- Patreon: https://www.patreon.com/c/w4ffl35
Get notified when the packaged version releases
| Key Features |
|---|
| Real-time conversations |
| - Three speech engines: espeak, SpeechT5, OpenVoice - Auto language detection (OpenVoice) - Real-time voice-chat with LLMs |
| Customizable AI Agents |
| - Custom agent names, moods, personalities - Retrieval-Augmented Generation (RAG) - Create AI personalities and moods |
| Enhanced Knowledge Retrieval |
| - RAG for documents/websites - Use local data to enrich chat |
| Image Generation & Manipulation |
| - Text-to-Image (Stable Diffusion 1.5, SDXL, Turbo) - Drawing tools & ControlNet - LoRA & Embeddings - Inpainting, outpainting, filters |
| Multi-lingual Capabilities |
| - Partial multi-lingual TTS/STT/interface - English & Japanese GUI |
| Privacy and Security |
| - Runs locally, no external API (default) - Customizable LLM guardrails & image safety - Disables HuggingFace telemetry - Restricts network access |
| Performance & Utility |
| - Fast generation (~2s on RTX 2080s) - Docker-based setup & GPU acceleration - Theming (Light/Dark/System) - NSFW toggles - Extension API - Python library & API support |
| Language | TTS | LLM | STT | GUI |
|---|---|---|---|---|
| English | β | β | β | β |
| Japanese | β | β | β | β |
| Spanish | β | β | β | β |
| French | β | β | β | β |
| Chinese | β | β | β | β |
| Korean | β | β | β | β |
AI Runner is a powerful tool designed for local, private use. However, its capabilities mean that users must be aware of their responsibilities under emerging AI regulations. This section provides information regarding the Colorado AI Act.
As the developer of AI Runner, we have a duty of care to inform our users about how this law may apply to them.
- Your Role as a User: If you use AI Runner to make, or as a substantial factor in making, an important decision that has a legal or similarly significant effect on someone's life, you may be considered a "deployer" of a "high-risk AI system" under Colorado law.
- What is a "High-Risk" Use Case? Examples of high-risk decisions include using AI to screen job applicants, evaluate eligibility for loans, housing, insurance, or other essential services.
- User Responsibility: Given AI Runner's customizable nature (e.g., using RAG with personal or business documents), it is possible to configure it for such high-risk purposes. If you do so, you are responsible for complying with the obligations of a "deployer," which include performing impact assessments and preventing algorithmic discrimination.
- Our Commitment: We are committed to developing AI Runner responsibly. The built-in privacy features, local-first design, and configurable guardrails are intended to provide you with the tools to use AI safely. We strongly encourage you to understand the capabilities and limitations of the AI models you choose to use and to consider the ethical implications of your specific application.
For more information, we recommend reviewing the text of the Colorado AI Act.
| Specification | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 22.04, Windows 10 | Ubuntu 22.04 (Wayland) |
| CPU | Ryzen 2700K or Intel Core i7-8700K | Ryzen 5800X or Intel Core i7-11700K |
| Memory | 16 GB RAM | 32 GB RAM |
| GPU | NVIDIA RTX 3060 or better | NVIDIA RTX 4090 or better |
| Network | Broadband (used to download models) | Broadband (used to download models) |
| Storage | 22 GB (with models), 6 GB (without models) | 100 GB or higher |
- Install system requirements

      sudo apt update && sudo apt upgrade -y
      sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git nvidia-cuda-toolkit pipewire libportaudio2 libxcb-cursor0 gnupg gpg-agent pinentry-curses espeak xclip cmake qt6-qpa-plugins qt6-wayland qt6-gtk-platformtheme mecab libmecab-dev mecab-ipadic-utf8 libxslt-dev mkcert
      sudo apt install espeak
      sudo apt install espeak-ng-espeak

- Create the `airunner` directory

      sudo mkdir ~/.local/share/airunner
      sudo chown $USER:$USER ~/.local/share/airunner

- Install AI Runner - Python 3.13+ required. `pyenv` and `venv` are recommended (see wiki for more info)

      pip install "typing-extensions==4.13.2"
      pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
      pip install airunner[all_dev]

- Run AI Runner

      airunner

For more options, including Docker, see the Installation Wiki.
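After installing, it can help to confirm that the interpreter and PyTorch match the requirements listed above. The following is a minimal sketch, not part of AI Runner: the function name `check_environment` is hypothetical, and it only assumes that `torch` may or may not be installed yet.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 13)  # minimum version stated in the install step above


def check_environment(version_info=sys.version_info):
    """Return a dict describing whether the install prerequisites look satisfied."""
    report = {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None when torch cannot be imported
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also works before torch is installed
            report["cuda_available"] = torch.cuda.is_available()
        except Exception:
            pass  # a broken torch install is reported as "unknown"
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If `cuda_available` prints `False`, the CUDA wheel index used in the `pip install torch` step is a reasonable first thing to re-check.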
- Run AI Runner: `airunner`
- Run the downloader: `airunner-setup`
- Build templates: `airunner-build-ui`
These are the sizes of the optional models that power AI Runner.
AI Runner uses the following stack
By default, AI Runner installs essential TTS/STT and minimal LLM components, but AI art models must be supplied by the user. Organize them under your local AI Runner data directory:
- The chatbot's mood and conversation summary system is always enabled by default. The bot's mood and emoji are shown with each bot message.
- When the LLM is updating the bot's mood or summarizing the conversation, a loading spinner and status message are shown in the chat prompt widget. The indicator disappears as soon as a new message arrives.
- This system is automatic and requires no user configuration.
- For more details, see the LLM Chat Prompt Widget README.
- The mood and summary engines are now fully integrated into the agent runtime. When the agent updates mood or summarizes the conversation, it emits a signal to the UI with a customizable loading message. The chat prompt widget displays this message as a loading indicator.
- See `src/airunner/handlers/llm/agent/agents/base.py` for integration details and `src/airunner/api/chatbot_services.py` for the API function.
AI Runner includes an Aggregated Search Tool for querying multiple online services from a unified interface. This tool is available as a NodeGraphQt node, an LLM agent tool, and as a Python API.
Supported Search Services:
- DuckDuckGo (no API key required)
- Wikipedia (no API key required)
- arXiv (no API key required)
- Google Custom Search (requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`)
- Bing Web Search (requires `BING_SUBSCRIPTION_KEY`)
- NewsAPI (requires `NEWSAPI_KEY`)
- StackExchange (optional `STACKEXCHANGE_KEY` for higher quota)
- GitHub Repositories (optional `GITHUB_TOKEN` for higher rate limits)
- OpenLibrary (no API key required)
API Key Setup:
- Set the required API keys as environment variables before running AI Runner. Only services with valid keys will be queried.
- Example:

      export GOOGLE_API_KEY=your_google_api_key
      export GOOGLE_CSE_ID=your_google_cse_id
      export BING_SUBSCRIPTION_KEY=your_bing_key
      export NEWSAPI_KEY=your_newsapi_key
      export STACKEXCHANGE_KEY=your_stackexchange_key
      export GITHUB_TOKEN=your_github_token

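To see which services your current environment would enable, a small check like the one below may be useful. It is a sketch under stated assumptions: the `REQUIRED_KEYS` mapping and the `enabled_services` function are hypothetical and simply mirror the list above; it tests only whether a variable is set, not whether the key is valid, and it is not how AI Runner itself selects services.

```python
import os

# Environment variables each service needs, taken from the list above.
# An empty tuple means the service works without any key.
REQUIRED_KEYS = {
    "DuckDuckGo": (),
    "Wikipedia": (),
    "arXiv": (),
    "OpenLibrary": (),
    "StackExchange": (),        # STACKEXCHANGE_KEY is optional
    "GitHub Repositories": (),  # GITHUB_TOKEN is optional
    "Google Custom Search": ("GOOGLE_API_KEY", "GOOGLE_CSE_ID"),
    "Bing Web Search": ("BING_SUBSCRIPTION_KEY",),
    "NewsAPI": ("NEWSAPI_KEY",),
}


def enabled_services(env=None):
    """Return the sorted names of services whose required variables are all non-empty."""
    env = os.environ if env is None else env
    return sorted(
        name
        for name, keys in REQUIRED_KEYS.items()
        if all(env.get(key) for key in keys)
    )


if __name__ == "__main__":
    print("\n".join(enabled_services()))
```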
Usage:
- Use the Aggregated Search node in NodeGraphQt for visual workflows.
- Call the tool from LLM agents or Python code:

      # `await` requires an async context (for example, inside an `async def`)
      from airunner.components.tools import AggregatedSearchTool
      results = await AggregatedSearchTool.aggregated_search("python", category="web")

- See `src/airunner/tools/README.md` for more details.
Note:
- DuckDuckGo, Wikipedia, arXiv, and OpenLibrary do not require API keys and can be used out-of-the-box.
- For best results and full service coverage, configure all relevant API keys.
AI Runner's local server enforces HTTPS-only operation for all local resources. HTTP is never used or allowed for local static assets or API endpoints. At startup, the server logs explicit details about HTTPS mode and the certificate/key in use. Security headers are set and only GET/HEAD methods are allowed for further hardening.
- Automatic Certificate Generation (Recommended):
  - By default, AI Runner will auto-generate a self-signed certificate in `~/.local/share/airunner/certs/` if one does not exist. No manual steps are required for most users.
  - If you want to provide your own certificate, place `cert.pem` and `key.pem` in the `certs` directory under your AI Runner base path.
- Manual Certificate Generation (Optional):
  - You can manually generate a self-signed certificate with:

        airunner-generate-cert

  - This will create `cert.pem` and `key.pem` in your current directory. Move them to your AI Runner certs directory if you want to use them.
- Configure AI Runner to Use SSL:
  - The app will automatically use the certificates in the certs directory. If you want to override, set the environment variables:

        export AIRUNNER_SSL_CERT=~/path/to/cert.pem
        export AIRUNNER_SSL_KEY=~/path/to/key.pem
        airunner

  - The server will use HTTPS if both files are provided.
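The override behaviour described above can be sketched as a small lookup. This is an illustration only: `resolve_ssl_paths` is a hypothetical helper, and treating the override as active only when both variables are set is an assumption based on the "both files" wording, not a confirmed detail of AI Runner's implementation.

```python
import os
from pathlib import Path

# Default certificate directory named earlier in this section.
DEFAULT_CERT_DIR = Path("~/.local/share/airunner/certs").expanduser()


def resolve_ssl_paths(env=None, cert_dir=DEFAULT_CERT_DIR):
    """Return (cert_path, key_path), preferring the environment overrides."""
    env = os.environ if env is None else env
    cert_override = env.get("AIRUNNER_SSL_CERT")
    key_override = env.get("AIRUNNER_SSL_KEY")
    # Assumption: overrides only apply when both variables are provided.
    if cert_override and key_override:
        return Path(cert_override).expanduser(), Path(key_override).expanduser()
    return Path(cert_dir) / "cert.pem", Path(cert_dir) / "key.pem"


if __name__ == "__main__":
    cert, key = resolve_ssl_paths()
    print(f"cert: {cert} (exists: {cert.exists()})")
    print(f"key:  {key} (exists: {key.exists()})")
```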
- Access the App via `https://localhost:<port>`
  - The default port is 5005 (configurable in `src/airunner/settings.py`).
  - Your browser may warn about the self-signed certificate; you can safely bypass this for local development.

- For production or remote access, use a certificate from a trusted CA.
- Never share your private key (`key.pem`).
- The server only binds to `127.0.0.1` by default for safety.
- For additional hardening, see the Security guide and the code comments in `local_http_server.py`.
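For readers who want to see how these properties fit together (HTTPS only, loopback binding, GET/HEAD only, security headers), here is a minimal standard-library sketch. It is not the code in `local_http_server.py`: the names `ReadOnlyHandler` and `make_server` are hypothetical, and the two headers shown are examples, since this README does not list the headers AI Runner actually sets.

```python
import ssl
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ReadOnlyHandler(BaseHTTPRequestHandler):
    """Serves a fixed body for GET/HEAD; other methods have no handler."""

    BODY = b"ok\n"

    def _send(self, include_body):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        # Example hardening headers (assumed, not taken from AI Runner).
        self.send_header("X-Content-Type-Options", "nosniff")
        self.send_header("Strict-Transport-Security", "max-age=31536000")
        self.end_headers()
        if include_body:
            self.wfile.write(self.BODY)

    def do_GET(self):
        self._send(include_body=True)

    def do_HEAD(self):
        self._send(include_body=False)

    # BaseHTTPRequestHandler answers any method without a do_* handler
    # with "501 Unsupported method", so POST/PUT/DELETE are rejected.

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(cert_file, key_file, port=5005):
    """Build a TLS-wrapped server bound to loopback only."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    server = ThreadingHTTPServer(("127.0.0.1", port), ReadOnlyHandler)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server
```

Calling `make_server("cert.pem", "key.pem").serve_forever()` would start the sketch on `https://localhost:5005`, given a valid certificate and key pair at those paths.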
You can generate a self-signed SSL certificate for local HTTPS with a single command:

    airunner-generate-cert

This will create `cert.pem` and `key.pem` in your current directory. Use these files with the local HTTP server as described above.
See the SSL/TLS section for full details.
- For a browser-trusted local HTTPS experience (no warnings), install mkcert:

      # On Ubuntu/Debian:
      sudo apt install libnss3-tools
      brew install mkcert  # (on macOS, or use your package manager)
      mkcert -install

- If `mkcert` is not installed, AI Runner will fall back to OpenSSL self-signed certificates, which will show browser warnings.
- See the SSL/TLS section for details.
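The mkcert-or-OpenSSL fallback can be illustrated with a helper that only builds the command line and does not run it. This is a sketch, not AI Runner's implementation: `certificate_command` is a hypothetical name, and the key size, validity period, and hostnames are assumed values chosen for the example.

```python
import shutil


def certificate_command(cert_file="cert.pem", key_file="key.pem", which=shutil.which):
    """Return the argv list for mkcert if it is on PATH, else for openssl."""
    if which("mkcert"):
        # mkcert issues a certificate trusted by the local CA it installed.
        return [
            "mkcert",
            "-cert-file", cert_file,
            "-key-file", key_file,
            "localhost", "127.0.0.1",
        ]
    # Fallback: a plain self-signed certificate (browsers will warn).
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048", "-nodes",
        "-keyout", key_file,
        "-out", cert_file,
        "-days", "365",
        "-subj", "/CN=localhost",
    ]


if __name__ == "__main__":
    print(" ".join(certificate_command()))
```

Passing the result to `subprocess.run(..., check=True)` would generate the files; the `which` parameter exists only so the choice can be exercised without either tool installed.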
AI Runner provides several CLI commands for development, testing, and maintenance. Below is a summary of all available commands:
| Command | Description |
|---|---|
| `airunner` | Launch the AI Runner application GUI. |
| `airunner-setup` | Download and set up required models and data. |
| `airunner-build-ui` | Regenerate Python UI files from `.ui` templates. Run after editing any `.ui` file. |
| `airunner-compile-translations` | Compile translation files for internationalization. |
| `airunner-tests` | Run the full test suite using pytest. |
| `airunner-test-coverage-report` | Generate a test coverage report. |
| `airunner-docker` | Run Docker-related build and management commands for AI Runner. |
| `airunner-generate-migration` | Generate a new Alembic database migration. |
| `airunner-generate-cert` | Generate a self-signed SSL certificate for local HTTPS. |
| `airunner-mypy <filename>` | Run mypy type checking on a file with project-recommended flags. |
Usage Examples:
# Launch the app
airunner
# Download models and set up data
airunner-setup
# Build UI Python files from .ui templates
airunner-build-ui
# Compile translation files
airunner-compile-translations
# Run all tests
airunner-tests
# Generate a test coverage report
airunner-test-coverage-report
# Run Docker build or management tasks
airunner-docker
# Generate a new Alembic migration
airunner-generate-migration
# Generate a self-signed SSL certificate
airunner-generate-cert
# Run mypy type checking on a file
airunner-mypy src/airunner/components/document_editor/gui/widgets/document_editor_widget.py

For more details on each command, see the Wiki or run the command with `--help` if supported.
AI Runner supports a set of powerful chat slash commands, known as Slash Tools, that let you quickly trigger special actions, tools, or workflows directly from the chat prompt. These commands start with a / and can be used in any chat conversation.
- Type `/` in the chat prompt to see available commands (autocomplete is supported in the UI).
- Each slash command maps to a specific tool, agent action, or workflow.
- The set of available commands is extensible and may include custom or extension-provided tools.
| Slash | Command | Action Type | Description |
|---|---|---|---|
| `/a` | Image | GENERATE_IMAGE | Generate an image from a prompt |
| `/c` | Code | CODE | Run or generate code (if supported) |
| `/s` | Search | SEARCH | Search the web or knowledge base |
| `/w` | Workflow | WORKFLOW | Run a custom workflow (if supported) |
Note:
- Some slash tools (like `/a` for image) return an immediate confirmation message (e.g., "Ok, I've navigated to ...", "Ok, generating your image...").
- Others (like `/s` for search or `/w` for workflow) do not return a direct message, but instead show a loading indicator until the result is ready.
- The set of available slash commands is defined in `SLASH_COMMANDS` in `src/airunner/settings.py` and may be extended in the future.
For a full list of supported slash commands, type /help in the chat prompt or see the copilot-instructions.md.
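The mapping from a slash prefix to an action type can be pictured with a short parser. This is a sketch only: the dictionary below copies the table above and is not the real `SLASH_COMMANDS` structure from `src/airunner/settings.py`, and `parse_slash_command` is a hypothetical function name.

```python
# Hypothetical mirror of the slash command table above.
SLASH_COMMANDS = {
    "a": "GENERATE_IMAGE",
    "c": "CODE",
    "s": "SEARCH",
    "w": "WORKFLOW",
}


def parse_slash_command(message):
    """Split a chat message into (action_type, remaining_text).

    Returns (None, message) when the message is ordinary chat or uses
    an unknown slash command.
    """
    if not message.startswith("/"):
        return None, message
    head, _, rest = message[1:].partition(" ")
    action = SLASH_COMMANDS.get(head)
    if action is None:
        return None, message
    return action, rest.strip()


if __name__ == "__main__":
    print(parse_slash_command("/a a red fox in the snow"))
    print(parse_slash_command("hello there"))
```

Returning the untouched message for unknown commands is one possible design; an application could equally surface an error to the user instead.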
We welcome pull requests for new features, bug fixes, or documentation improvements. You can also build and share extensions to expand AI Runner's functionality. For details, see the Extensions Wiki.
Take a look at the Contributing document and the Development wiki page for detailed instructions.
AI Runner uses pytest for all automated testing. Test coverage is a priority, especially for utility modules.
- Headless-safe tests:
  - Located in `src/airunner/utils/tests/`
  - Can be run in any environment (including CI, headless servers, and developer machines)
  - Run with:

        pytest src/airunner/utils/tests/

- Display-required (Qt/Xvfb) tests:
  - Located in `src/airunner/utils/tests/xvfb_required/`
  - Require a real Qt display environment (cannot be run headlessly or with `pytest-qt`)
  - Typical for low-level Qt worker/signal/slot logic
  - Run with:

        xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/
        # Or for a single file:
        xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/test_background_worker.py

  - See the README in xvfb_required/ for details.

- By default, only headless-safe tests are run in CI.
- Display-required tests are intended for manual or special-case runs (e.g., when working on Qt threading or background worker code).
- (Optional) You may automate this split in CI by adding a separate job/step for xvfb tests.
- All new utility code must be accompanied by tests.
- Use `pytest`, `pytest-qt` (for GUI), and `unittest.mock` for mocking dependencies.
- For more details on writing and organizing tests, see the project coding guidelines and the `src/airunner/utils/tests/` folder.
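As an illustration of the headless-safe style described above, a test might mock its dependencies with `unittest.mock` and avoid any Qt display. The utility `notify_when_done` below is hypothetical and exists only to give the test something to exercise; it is not part of AI Runner.

```python
from unittest.mock import MagicMock


def notify_when_done(worker, callback):
    """Hypothetical utility: run a worker and pass its result to a callback."""
    result = worker.run()
    callback(result)
    return result


def test_notify_when_done_passes_result_to_callback():
    # Both collaborators are mocks, so no display or real worker is needed.
    worker = MagicMock()
    worker.run.return_value = 42
    callback = MagicMock()

    assert notify_when_done(worker, callback) == 42
    worker.run.assert_called_once_with()
    callback.assert_called_once_with(42)
```

Saved as a `test_*.py` file, pytest would discover and run the test function by its name.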
- Follow the copilot-instructions.md for all development, testing, and contribution guidelines.
- Always use the `airunner` command in the terminal to run the application.
- Always run tests in the terminal (not in the workspace test runner).
- Use `pytest` and `pytest-cov` for running tests and checking coverage.
- UI changes must be made in `.ui` files and rebuilt with `airunner-build-ui`.
- See the Wiki for architecture, usage, and advanced topics.
- API Service Layer
- Main Window Model Load Balancer
- Facehugger Shield Suite
- NodeGraphQt Vendor Module
- Xvfb-Required Tests
- ORM Models
For additional details, see the Wiki.
If you find this project useful, please consider sponsoring its development. Your support helps cover the costs of infrastructure, development, and maintenance.
You can sponsor the project on GitHub Sponsors.
Thank you for your support!
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for airunner
Similar Open Source Tools
airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
seline
Seline is a local-first AI desktop application that integrates conversational AI, visual generation tools, vector search, and multi-channel connectivity. It allows users to connect WhatsApp, Telegram, or Slack to create always-on bots with full context and background task delivery. The application supports multi-channel connectivity, deep research mode, local web browsing with Puppeteer, local knowledge and privacy features, visual and creative tools, automation and agents, developer experience enhancements, and more. Seline is actively developed with a focus on improving user experience and functionality.
spaCy
spaCy is an industrial-strength Natural Language Processing (NLP) library in Python and Cython. It incorporates the latest research and is designed for real-world applications. The library offers pretrained pipelines supporting 70+ languages, with advanced neural network models for tasks such as tagging, parsing, named entity recognition, and text classification. It also facilitates multi-task learning with pretrained transformers like BERT, along with a production-ready training system and streamlined model packaging, deployment, and workflow management. spaCy is commercial open-source software released under the MIT license.
ClaudeBar
ClaudeBar is a macOS menu bar application that monitors AI coding assistant usage quotas. It allows users to keep track of their usage of Claude, Codex, Gemini, GitHub Copilot, Antigravity, and Z.ai at a glance. The application offers multi-provider support, real-time quota tracking, multiple themes, visual status indicators, system notifications, auto-refresh feature, and keyboard shortcuts for quick access. Users can customize monitoring by toggling individual providers on/off and receive alerts when quota status changes. The tool requires macOS 15+, Swift 6.2+, and CLI tools installed for the providers to be monitored.
zeptoclaw
ZeptoClaw is an ultra-lightweight personal AI assistant that offers a compact Rust binary with 29 tools, 8 channels, 9 providers, and container isolation. It focuses on integrations, security, and size discipline without compromising on performance. With features like container isolation, prompt injection detection, secret leak scanner, policy engine, input validator, and more, ZeptoClaw ensures secure AI agent execution. It supports migration from OpenClaw, deployment on various platforms, and configuration of LLM providers. ZeptoClaw is designed for efficient AI assistance with minimal resource consumption and maximum security.
pocketpaw
PocketPaw is a lightweight and user-friendly tool designed for managing and organizing your digital assets. It provides a simple interface for users to easily categorize, tag, and search for files across different platforms. With PocketPaw, you can efficiently organize your photos, documents, and other files in a centralized location, making it easier to access and share them. Whether you are a student looking to organize your study materials, a professional managing project files, or a casual user wanting to declutter your digital space, PocketPaw is the perfect solution for all your file management needs.
stenoai
StenoAI is an AI-powered meeting intelligence tool that allows users to record, transcribe, summarize, and query meetings using local AI models. It prioritizes privacy by processing data entirely on the user's device. The tool offers multiple AI models optimized for different use cases, making it ideal for healthcare, legal, and finance professionals with confidential data needs. StenoAI also features a macOS desktop app with a user-friendly interface, making it convenient for users to access its functionalities. The project is open-source and not affiliated with any specific company, emphasizing its focus on meeting-notes productivity and community collaboration.
agentsys
AgentSys is a modular runtime and orchestration system for AI agents, with 14 plugins, 43 agents, and 30 skills that compose into structured pipelines for software development. Each agent has a single responsibility, a specific model assignment, and defined inputs/outputs. The system runs on Claude Code, OpenCode, and Codex CLI, and plugins are fetched automatically from their repos. AgentSys orchestrates agents to handle tasks like task selection, branch management, code review, artifact cleanup, CI, PR comments, and deployment.
OSA
OSA (Open-Source-Advisor) is a tool designed to improve the quality of scientific open source projects by automating the generation of README files, documentation, CI/CD scripts, and providing advice and recommendations for repositories. It supports various LLMs accessible via API, local servers, or osa_bot hosted on ITMO servers. OSA is currently under development with features like README file generation, documentation generation, automatic implementation of changes, LLM integration, and GitHub Action Workflow generation. It requires Python 3.10 or higher and tokens for GitHub/GitLab/Gitverse and LLM API key. Users can install OSA using PyPi or build from source, and run it using CLI commands or Docker containers.
agentsys
AgentSys is a modular runtime and orchestration system for AI agents, with 13 plugins, 42 agents, and 28 skills that compose into structured pipelines for software development. It handles task selection, branch management, code review, artifact cleanup, CI, PR comments, and deployment. The system runs on Claude Code, OpenCode, and Codex CLI, providing a functional software suite and runtime for AI agent orchestration.
vibeship-spark-intelligence
Spark Intelligence is a self-evolving AI companion that runs 100% on your machine as a local AI companion. It captures, distills, transforms, and delivers advisory context to help you act with better context. It is designed to convert experience into adaptive operational behavior, not just stored memory. The tool is beyond a learning loop, continuously learning and growing smarter through use. It provides a distillation pipeline, transformation layer, advisory delivery, EIDOS loop, domain chips, observability surfaces, and a CLI for easy interaction. The architecture involves event capture, queue, bridge worker, pipeline, quality gate, cognitive learner, and more. The Obsidian Observatory integration allows users to browse, search, and query insights, decisions, and quality verdicts in a human-readable vault.
aegra
Aegra is a self-hosted AI agent backend platform that provides LangGraph power without vendor lock-in. Built with FastAPI + PostgreSQL, it offers complete control over agent orchestration for teams looking to escape vendor lock-in, meet data sovereignty requirements, enable custom deployments, and optimize costs. Aegra is Agent Protocol compliant and perfect for teams seeking a free, self-hosted alternative to LangGraph Platform with zero lock-in, full control, and compatibility with existing LangGraph Client SDK.
postgresai
PostgresAI is an AI-native PostgreSQL observability tool designed for monitoring, health checks, and root cause analysis. It provides structured reports and metrics for AI consumption, tracks problems from detection to resolution, offers over 45 health checks including bloat, indexes, queries, settings, and security, and features Active Session History similar to Oracle ASH. PostgresAI is part of the Self-Driving Postgres initiative, aiming to make Postgres autonomous. It includes expert dashboards following the Four Golden Signals methodology and is battle-tested with companies like GitLab, Miro, Chewy, and more.
mcp-rubber-duck
MCP Rubber Duck is a Model Context Protocol server that acts as a bridge to query multiple LLMs, including OpenAI-compatible HTTP APIs and CLI coding agents. Users can explain their problems to various AI 'ducks' to get different perspectives. The tool offers features like universal OpenAI compatibility, CLI agent support, conversation management, multi-duck querying, consensus voting, LLM-as-Judge evaluation, structured debates, health monitoring, usage tracking, and more. It supports various HTTP providers like OpenAI, Google Gemini, Anthropic, Groq, Together AI, Perplexity, and CLI providers like Claude Code, Codex, Gemini CLI, Grok, Aider, and custom agents. Users can install the tool globally, configure it using environment variables, and access interactive UIs for comparing ducks, voting, debating, and usage statistics. The tool provides multiple tools for asking questions, chatting, clearing conversations, listing ducks, comparing responses, voting, judging, iterating, debating, and more. It also offers prompt templates for different analysis purposes and extensive documentation for setup, configuration, tools, prompts, CLI providers, MCP Bridge, guardrails, Docker deployment, troubleshooting, contributing, license, acknowledgments, changelog, registry & directory, and support.
EvoAgentX
EvoAgentX is an open-source framework for building, evaluating, and evolving LLM-based agents or agentic workflows in an automated, modular, and goal-driven manner. It enables developers and researchers to move beyond static prompt chaining or manual workflow orchestration by introducing a self-evolving agent ecosystem. The framework includes features such as agent workflow autoconstruction, built-in evaluation, self-evolution engine, plug-and-play compatibility, comprehensive built-in tools, memory module support, and human-in-the-loop interactions.
awesome-slash
Automate the entire development workflow beyond coding. awesome-slash provides production-ready skills, agents, and commands for managing tasks, branches, reviews, CI, and deployments. It automates the entire workflow, including task exploration, planning, implementation, review, and shipping. The tool includes 11 plugins, 40 agents, 26 skills, and 26k lines of lib code, with 3,357 tests and support for 3 platforms. It works with Claude Code, OpenCode, and Codex CLI, offering specialized capabilities through skills and agents.
For similar tasks
wunjo.wladradchenko.ru
Wunjo AI is a comprehensive tool that empowers users to explore the realm of speech synthesis, deepfake animations, video-to-video transformations, and more. Its user-friendly interface and privacy-first approach make it accessible to both beginners and professionals alike. With Wunjo AI, you can effortlessly convert text into human-like speech, clone voices from audio files, create multi-dialogues with distinct voice profiles, and perform real-time speech recognition. Additionally, you can animate faces using just one photo combined with audio, swap faces in videos, GIFs, and photos, and even remove unwanted objects or enhance the quality of your deepfakes using the AI Retouch Tool. Wunjo AI is an all-in-one solution for your voice and visual AI needs, offering endless possibilities for creativity and expression.
airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
Wechat-AI-Assistant
Wechat AI Assistant is a project that enables multi-modal interaction with ChatGPT AI assistant within WeChat. It allows users to engage in conversations, role-playing, respond to voice messages, analyze images and videos, summarize articles and web links, and search the internet. The project utilizes the WeChatFerry library to control the Windows PC desktop WeChat client and leverages the OpenAI Assistant API for intelligent multi-modal message processing. Users can interact with ChatGPT AI in WeChat through text or voice, access various tools like bing_search, browse_link, image_to_text, text_to_image, text_to_speech, video_analysis, and more. The AI autonomously determines which code interpreter and external tools to use to complete tasks. Future developments include file uploads for AI to reference content, integration with other APIs, and login support for enterprise WeChat and WeChat official accounts.
Generative-AI-Pharmacist
Generative AI Pharmacist is a project showcasing the use of generative AI tools to create an animated avatar named Macy, who delivers medication counseling in a realistic and professional manner. The project utilizes tools like Midjourney for image generation, ChatGPT for text generation, ElevenLabs for text-to-speech conversion, and D-ID for creating a photorealistic talking avatar video. The demo video featuring Macy discussing commonly-prescribed medications demonstrates the potential of generative AI in healthcare communication.
AnyGPT
AnyGPT is a unified multimodal language model that utilizes discrete representations for processing various modalities like speech, text, images, and music. It aligns the modalities for intermodal conversions and text processing. AnyInstruct dataset is constructed for generative models. The model proposes a generative training scheme using Next Token Prediction task for training on a Large Language Model (LLM). It aims to compress vast multimodal data on the internet into a single model for emerging capabilities. The tool supports tasks like text-to-image, image captioning, ASR, TTS, text-to-music, and music captioning.
Pallaidium
Pallaidium is a generative AI movie studio integrated into the Blender video editor. It allows users to AI-generate video, image, and audio from text prompts or existing media files. The tool provides various features such as text to video, text to audio, text to speech, text to image, image to image, image to video, video to video, image to text, and more. It requires a Windows system with a CUDA-supported Nvidia card and at least 6 GB VRAM. Pallaidium offers batch processing capabilities, text to audio conversion using Bark, and various performance optimization tips. Users can install the tool by downloading the add-on and following the installation instructions provided. The tool comes with a set of restrictions on usage, prohibiting the generation of harmful, pornographic, violent, or false content.
ElevenLabs-DotNet
ElevenLabs-DotNet is a non-official Eleven Labs voice synthesis RESTful client that allows users to convert text to speech. The library targets .NET 8.0 and above, working across various platforms like console apps, winforms, wpf, and asp.net, and across Windows, Linux, and Mac. Users can authenticate using API keys directly, from a configuration file, or system environment variables. The tool provides functionalities for text to speech conversion, streaming text to speech, accessing voices, dubbing audio or video files, generating sound effects, managing history of synthesized audio clips, and accessing user information and subscription status.
omniai
OmniAI provides a unified Ruby API for integrating with multiple AI providers, streamlining AI development by offering a consistent interface for features such as chat, text-to-speech, speech-to-text, and embeddings. It ensures seamless interoperability across platforms and effortless switching between providers, making integrations more flexible and reliable.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API that provides access to over 100 different AI models, ranging from image models to sound models.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI red teaming tasks so that operators can focus on more complicated and time-consuming work, and it can identify security harms such as misuse (e.g., malware generation, jailbreaking) and privacy harms (e.g., identity theft). The goal is to give researchers a baseline of how well their model and entire inference pipeline perform against different harm categories, and to let them compare that baseline to future iterations of the model. This gives them empirical data on how well the model is doing today and lets them detect any degradation in performance as the model is changed.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. Its key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a Cloud IDE); and it supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

