
airunner
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
Stars: 1227

AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
README:
Report Bug | Request Feature | Report Vulnerability | Wiki
Support this project by donating through one of the following options:
- Crypto: 0x02030569e866e22C9991f55Db0445eeAd2d646c8
- Github Sponsors: https://github.com/sponsors/w4ffl35
- Patreon: https://www.patreon.com/c/w4ffl35
Get notified when the packaged version releases.
Key Features | Details |
---|---|
Real-time conversations | Three speech engines: espeak, SpeechT5, OpenVoice; auto language detection (OpenVoice); real-time voice chat with LLMs |
Customizable AI Agents | Custom agent names, moods, personalities; Retrieval-Augmented Generation (RAG); create AI personalities and moods |
Enhanced Knowledge Retrieval | RAG for documents/websites; use local data to enrich chat |
Image Generation & Manipulation | Text-to-Image (Stable Diffusion 1.5, SDXL, Turbo); drawing tools & ControlNet; LoRA & embeddings; inpainting, outpainting, filters |
Multi-lingual Capabilities | Partial multi-lingual TTS/STT/interface; English & Japanese GUI |
Privacy and Security | Runs locally, no external API (default); customizable LLM guardrails & image safety; disables HuggingFace telemetry; restricts network access |
Performance & Utility | Fast generation (~2s on RTX 2080s); Docker-based setup & GPU acceleration; theming (Light/Dark/System); NSFW toggles; extension API; Python library & API support |
Language | TTS | LLM | STT | GUI |
---|---|---|---|---|
English | ✅ | ✅ | ✅ | ✅ |
Japanese | ✅ | ✅ | ✅ | ✅ |
Spanish | ✅ | ✅ | ✅ | ❌ |
French | ✅ | ✅ | ✅ | ❌ |
Chinese | ✅ | ✅ | ✅ | ❌ |
Korean | ✅ | ✅ | ✅ | ❌ |
AI Runner is a powerful tool designed for local, private use. However, its capabilities mean that users must be aware of their responsibilities under emerging AI regulations. This section provides information regarding the Colorado AI Act.
As the developer of AI Runner, we have a duty of care to inform our users about how this law may apply to them.
- Your Role as a User: If you use AI Runner to make, or as a substantial factor in making, an important decision that has a legal or similarly significant effect on someone's life, you may be considered a "deployer" of a "high-risk AI system" under Colorado law.
- What is a "High-Risk" Use Case? Examples of high-risk decisions include using AI to screen job applicants or to evaluate eligibility for loans, housing, insurance, or other essential services.
- User Responsibility: Given AI Runner's customizable nature (e.g., using RAG with personal or business documents), it is possible to configure it for such high-risk purposes. If you do so, you are responsible for complying with the obligations of a "deployer," which include performing impact assessments and preventing algorithmic discrimination.
- Our Commitment: We are committed to developing AI Runner responsibly. The built-in privacy features, local-first design, and configurable guardrails are intended to provide you with the tools to use AI safely. We strongly encourage you to understand the capabilities and limitations of the AI models you choose to use and to consider the ethical implications of your specific application.
For more information, we recommend reviewing the text of the Colorado AI Act.
Specification | Minimum | Recommended |
---|---|---|
OS | Ubuntu 22.04, Windows 10 | Ubuntu 22.04 (Wayland) |
CPU | Ryzen 2700K or Intel Core i7-8700K | Ryzen 5800X or Intel Core i7-11700K |
Memory | 16 GB RAM | 32 GB RAM |
GPU | NVIDIA RTX 3060 or better | NVIDIA RTX 4090 or better |
Network | Broadband (used to download models) | Broadband (used to download models) |
Storage | 22 GB (with models), 6 GB (without models) | 100 GB or higher |
- Install system requirements

  ```bash
  sudo apt update && sudo apt upgrade -y
  sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git nvidia-cuda-toolkit \
    pipewire libportaudio2 libxcb-cursor0 gnupg gpg-agent pinentry-curses espeak \
    xclip cmake qt6-qpa-plugins qt6-wayland qt6-gtk-platformtheme mecab \
    libmecab-dev mecab-ipadic-utf8 libxslt-dev mkcert
  sudo apt install espeak
  sudo apt install espeak-ng-espeak
  ```
- Create the `airunner` directory

  ```bash
  sudo mkdir ~/.local/share/airunner
  sudo chown $USER:$USER ~/.local/share/airunner
  ```
- Install AI Runner (Python 3.13+ required; `pyenv` and `venv` are recommended, see the wiki for more info)

  ```bash
  pip install "typing-extensions==4.13.2"
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
  pip install airunner[all_dev]
  ```
- Run AI Runner

  ```bash
  airunner
  ```

  For more options, including Docker, see the Installation Wiki.
- Run AI Runner: `airunner`
- Run the downloader: `airunner-setup`
- Build templates: `airunner-build-ui`
These are the sizes of the optional models that power AI Runner.
AI Runner uses the following stack
By default, AI Runner installs essential TTS/STT and minimal LLM components, but AI art models must be supplied by the user. Organize them under your local AI Runner data directory.
- The chatbot's mood and conversation summary system is enabled by default. The bot's mood and emoji are shown with each bot message.
- When the LLM is updating the bot's mood or summarizing the conversation, a loading spinner and status message are shown in the chat prompt widget. The indicator disappears as soon as a new message arrives.
- This system is automatic and requires no user configuration.
- For more details, see the LLM Chat Prompt Widget README.
- The mood and summary engines are now fully integrated into the agent runtime. When the agent updates mood or summarizes the conversation, it emits a signal to the UI with a customizable loading message. The chat prompt widget displays this message as a loading indicator.
- See `src/airunner/handlers/llm/agent/agents/base.py` for integration details and `src/airunner/api/chatbot_services.py` for the API function.
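The signal flow described above can be sketched with a minimal observer pattern. All class and method names below are illustrative stand-ins, not AI Runner's actual API:

```python
# Minimal sketch of the mood/summary loading-indicator flow.
# Names are illustrative, not AI Runner's real classes or signals.

class ChatPromptWidget:
    """Stand-in for the chat prompt widget's loading indicator."""

    def __init__(self):
        self.loading_message = None
        self.last_message = None

    def on_loading(self, message):
        # Show the spinner with the customizable status message.
        self.loading_message = message

    def on_message(self, text):
        # The indicator disappears as soon as a new message arrives.
        self.loading_message = None
        self.last_message = text


class Agent:
    """Stand-in for the agent runtime that emits UI signals."""

    def __init__(self, widget):
        self.widget = widget

    def update_mood(self):
        self.widget.on_loading("Updating bot mood...")
        mood = "cheerful"  # the real agent would call the LLM here
        self.widget.on_message(f"(mood is now {mood})")
        return mood
```

Wiring a widget to an agent and calling `agent.update_mood()` shows the message appear and then clear, mirroring the behavior described above.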
AI Runner includes an Aggregated Search Tool for querying multiple online services from a unified interface. This tool is available as a NodeGraphQt node, an LLM agent tool, and as a Python API.
Supported Search Services:
- DuckDuckGo (no API key required)
- Wikipedia (no API key required)
- arXiv (no API key required)
- Google Custom Search (requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`)
- Bing Web Search (requires `BING_SUBSCRIPTION_KEY`)
- NewsAPI (requires `NEWSAPI_KEY`)
- StackExchange (optional `STACKEXCHANGE_KEY` for higher quota)
- GitHub Repositories (optional `GITHUB_TOKEN` for higher rate limits)
- OpenLibrary (no API key required)
API Key Setup:
- Set the required API keys as environment variables before running AI Runner. Only services with valid keys will be queried.
- Example:

  ```bash
  export GOOGLE_API_KEY=your_google_api_key
  export GOOGLE_CSE_ID=your_google_cse_id
  export BING_SUBSCRIPTION_KEY=your_bing_key
  export NEWSAPI_KEY=your_newsapi_key
  export STACKEXCHANGE_KEY=your_stackexchange_key
  export GITHUB_TOKEN=your_github_token
  ```
Usage:
- Use the Aggregated Search node in NodeGraphQt for visual workflows.
- Call the tool from LLM agents or Python code:

  ```python
  from airunner.components.tools import AggregatedSearchTool

  results = await AggregatedSearchTool.aggregated_search("python", category="web")
  ```
- See `src/airunner/tools/README.md` for more details.
Note:
- DuckDuckGo, Wikipedia, arXiv, and OpenLibrary do not require API keys and can be used out-of-the-box.
- For best results and full service coverage, configure all relevant API keys.
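Under the hood, an aggregated search fans a single query out to several backends concurrently and merges the results keyed by service. A self-contained sketch of that pattern with stub backends (this is the general shape, not the real `AggregatedSearchTool` implementation):

```python
import asyncio

# Stub backends standing in for real service clients.
async def search_duckduckgo(query):
    return [f"ddg:{query}"]

async def search_wikipedia(query):
    return [f"wiki:{query}"]

BACKENDS = {
    "duckduckgo": search_duckduckgo,
    "wikipedia": search_wikipedia,
}

async def aggregated_search(query):
    """Query all backends concurrently; return {service: results}."""
    names = list(BACKENDS)
    results = await asyncio.gather(*(BACKENDS[n](query) for n in names))
    return dict(zip(names, results))
```

From synchronous code this would be driven with `asyncio.run(aggregated_search("python"))`; `asyncio.gather` keeps the backends running in parallel rather than sequentially.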
AI Runner's local server enforces HTTPS-only operation for all local resources. HTTP is never used or allowed for local static assets or API endpoints. At startup, the server logs explicit details about HTTPS mode and the certificate/key in use. Security headers are set and only GET/HEAD methods are allowed for further hardening.
- Automatic Certificate Generation (Recommended):
  - By default, AI Runner auto-generates a self-signed certificate in `~/.local/share/airunner/certs/` if one does not exist. No manual steps are required for most users.
  - To provide your own certificate, place `cert.pem` and `key.pem` in the `certs` directory under your AI Runner base path.
- Manual Certificate Generation (Optional):
  - You can manually generate a self-signed certificate with:

    ```bash
    airunner-generate-cert
    ```

  - This creates `cert.pem` and `key.pem` in your current directory. Move them to your AI Runner certs directory if you want to use them.
- Configure AI Runner to Use SSL:
  - The app automatically uses the certificates in the certs directory. To override, set the environment variables:

    ```bash
    export AIRUNNER_SSL_CERT=~/path/to/cert.pem
    export AIRUNNER_SSL_KEY=~/path/to/key.pem
    airunner
    ```

  - The server will use HTTPS if both files are provided.
- Access the app via `https://localhost:<port>`
  - The default port is 5005 (configurable in `src/airunner/settings.py`).
  - Your browser may warn about the self-signed certificate; you can safely bypass this for local development.
- For production or remote access, use a certificate from a trusted CA.
- Never share your private key (`key.pem`).
- The server binds only to `127.0.0.1` by default for safety.
- For additional hardening, see the Security guide and the code comments in `local_http_server.py`.
You can generate a self-signed SSL certificate for local HTTPS with a single command:

```bash
airunner-generate-cert
```

This creates `cert.pem` and `key.pem` in your current directory. Use these files with the local HTTP server as described above. See the SSL/TLS section for full details.
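If you prefer not to use the bundled command, a roughly equivalent self-signed certificate for localhost can be produced with plain `openssl` (a sketch; `airunner-generate-cert` may use different key sizes or subject fields):

```shell
# Generate a self-signed cert/key pair for localhost (illustrative parameters)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem \
  -days 365 -subj "/CN=localhost"
```

The resulting `cert.pem` and `key.pem` can be dropped into the AI Runner certs directory like any other pair.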
- For a browser-trusted local HTTPS experience (no warnings), install mkcert:

  ```bash
  # On Ubuntu/Debian:
  sudo apt install libnss3-tools
  # On macOS (or use your package manager):
  brew install mkcert
  mkcert -install
  ```

- If `mkcert` is not installed, AI Runner will fall back to OpenSSL self-signed certificates, which will show browser warnings.
- See the SSL/TLS section for details.
AI Runner provides several CLI commands for development, testing, and maintenance. Below is a summary of all available commands:
Command | Description |
---|---|
`airunner` | Launch the AI Runner application GUI. |
`airunner-setup` | Download and set up required models and data. |
`airunner-build-ui` | Regenerate Python UI files from `.ui` templates. Run after editing any `.ui` file. |
`airunner-compile-translations` | Compile translation files for internationalization. |
`airunner-tests` | Run the full test suite using pytest. |
`airunner-test-coverage-report` | Generate a test coverage report. |
`airunner-docker` | Run Docker-related build and management commands for AI Runner. |
`airunner-generate-migration` | Generate a new Alembic database migration. |
`airunner-generate-cert` | Generate a self-signed SSL certificate for local HTTPS. |
`airunner-mypy <filename>` | Run mypy type checking on a file with project-recommended flags. |
Usage Examples:

```bash
# Launch the app
airunner

# Download models and set up data
airunner-setup

# Build UI Python files from .ui templates
airunner-build-ui

# Compile translation files
airunner-compile-translations

# Run all tests
airunner-tests

# Generate a test coverage report
airunner-test-coverage-report

# Run Docker build or management tasks
airunner-docker

# Generate a new Alembic migration
airunner-generate-migration

# Generate a self-signed SSL certificate
airunner-generate-cert

# Run mypy type checking on a file
airunner-mypy src/airunner/components/document_editor/gui/widgets/document_editor_widget.py
```
For more details on each command, see the Wiki or run the command with `--help` if supported.
AI Runner supports a set of powerful chat slash commands, known as Slash Tools, that let you quickly trigger special actions, tools, or workflows directly from the chat prompt. These commands start with a `/` and can be used in any chat conversation.
- Type `/` in the chat prompt to see available commands (autocomplete is supported in the UI).
- Each slash command maps to a specific tool, agent action, or workflow.
- The set of available commands is extensible and may include custom or extension-provided tools.
Slash | Command | Action Type | Description |
---|---|---|---|
`/a` | Image | GENERATE_IMAGE | Generate an image from a prompt |
`/c` | Code | CODE | Run or generate code (if supported) |
`/s` | Search | SEARCH | Search the web or knowledge base |
`/w` | Workflow | WORKFLOW | Run a custom workflow (if supported) |
Note:
- Some slash tools (like `/a` for image) return an immediate confirmation message (e.g., "Ok, I've navigated to ...", "Ok, generating your image...").
- Others (like `/s` for search or `/w` for workflow) do not return a direct message, but instead show a loading indicator until the result is ready.
- The set of available slash commands is defined in `SLASH_COMMANDS` in `src/airunner/settings.py` and may be extended in the future.
For a full list of supported slash commands, type `/help` in the chat prompt or see the copilot-instructions.md.
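Dispatching a slash command reduces to matching the leading token against the registered table. A minimal illustrative sketch (the map mirrors the table above; the real one lives in `SLASH_COMMANDS` in `src/airunner/settings.py`, and the function name here is hypothetical):

```python
# Illustrative dispatcher; not AI Runner's actual implementation.
SLASH_COMMANDS = {
    "/a": "GENERATE_IMAGE",
    "/c": "CODE",
    "/s": "SEARCH",
    "/w": "WORKFLOW",
}

def parse_slash_command(message):
    """Return (action, remainder) for a slash message, else (None, message)."""
    if not message.startswith("/"):
        return None, message
    command, _, rest = message.partition(" ")
    action = SLASH_COMMANDS.get(command)
    return action, rest.strip()
```

For example, `parse_slash_command("/a a cat in a hat")` maps to the GENERATE_IMAGE action with the prompt text as the remainder, while plain chat messages pass through unchanged.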
We welcome pull requests for new features, bug fixes, or documentation improvements. You can also build and share extensions to expand AI Runner's functionality. For details, see the Extensions Wiki.
Take a look at the Contributing document and the Development wiki page for detailed instructions.
AI Runner uses `pytest` for all automated testing. Test coverage is a priority, especially for utility modules.
- Headless-safe tests:
  - Located in `src/airunner/utils/tests/`
  - Can be run in any environment (including CI, headless servers, and developer machines)
  - Run with:

    ```bash
    pytest src/airunner/utils/tests/
    ```
- Display-required (Qt/Xvfb) tests:
  - Located in `src/airunner/utils/tests/xvfb_required/`
  - Require a real Qt display environment (cannot be run headlessly or with `pytest-qt`)
  - Typical for low-level Qt worker/signal/slot logic
  - Run with:

    ```bash
    xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/
    # Or for a single file:
    xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/test_background_worker.py
    ```

  - See the README in xvfb_required/ for details.
- By default, only headless-safe tests are run in CI.
- Display-required tests are intended for manual or special-case runs (e.g., when working on Qt threading or background worker code).
- (Optional) You may automate this split in CI by adding a separate job/step for xvfb tests.
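A hedged sketch of what that CI split could look like as GitHub Actions steps (illustrative only; paths match this repository's layout, but the runner image, ignore flags, and job structure are assumptions to adapt):

```yaml
# Illustrative CI steps splitting headless and display-required test runs
- name: Headless-safe tests
  run: pytest src/airunner/utils/tests/ --ignore=src/airunner/utils/tests/xvfb_required

- name: Display-required tests (optional, manual trigger)
  run: xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/
```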
- All new utility code must be accompanied by tests.
- Use `pytest`, `pytest-qt` (for GUI), and `unittest.mock` for mocking dependencies.
- For more details on writing and organizing tests, see the project coding guidelines and the `src/airunner/utils/tests/` folder.
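A minimal headless-safe test in that style, using `unittest.mock` to stand in for a dependency so no Qt display is needed (the worker function here is hypothetical, shown only to illustrate the convention):

```python
from unittest.mock import MagicMock

def run_worker(callback):
    """Hypothetical worker: does some work, then reports via callback."""
    callback("done")

def test_worker_calls_callback():
    # Headless-safe: the dependency is mocked, so no display is required.
    callback = MagicMock()
    run_worker(callback)
    callback.assert_called_once_with("done")
```

Tests written this way live under `src/airunner/utils/tests/` and run with plain `pytest` in any environment.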
- Follow the copilot-instructions.md for all development, testing, and contribution guidelines.
- Always use the `airunner` command in the terminal to run the application.
- Always run tests in the terminal (not in the workspace test runner).
- Use `pytest` and `pytest-cov` for running tests and checking coverage.
- UI changes must be made in `.ui` files and rebuilt with `airunner-build-ui`.
- See the Wiki for architecture, usage, and advanced topics.
- API Service Layer
- Main Window Model Load Balancer
- Facehugger Shield Suite
- NodeGraphQt Vendor Module
- Xvfb-Required Tests
- ORM Models
- Maps
For additional details, see the Wiki.
If you want to use OpenStreetMap completely offline, you can run your own local Nominatim instance.
```bash
NOMINATIM_PATH=/some/path
sudo mkdir -p $NOMINATIM_PATH/nominatim_data
sudo mkdir -p $NOMINATIM_PATH/nominatim_flatnode

docker run -it \
  -e PBF_URL=https://download.geofabrik.de/north-america/us-latest.osm.pbf \
  -e REPLICATION_URL=https://download.geofabrik.de/north-america/us-updates/ \
  -p 8080:8080 \
  -v nominatim-data:/var/lib/postgresql/data \
  --shm-size=2g \
  --name nominatim \
  mediagis/nominatim:5.1
```
- Start existing container: `docker start nominatim`
- With logs: `docker start nominatim && docker logs -f nominatim`
- Stop existing container: `docker stop nominatim`
The server will take hours to set up if you are using the full US map. You can use a smaller region to speed up the process.
After the server is running, you can access it at `http://localhost:8080/`. Be sure to set the `AIRUNNER_NOMINATIM_URL` environment variable to point to your local Nominatim instance:

```bash
export AIRUNNER_NOMINATIM_URL=http://localhost:8080/
```
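As a quick sanity check, the request URL a client would hit can be assembled from that variable. Nominatim's `/search` endpoint with `format=json` is its standard query interface; the helper below is illustrative and not part of AI Runner:

```python
import os
from urllib.parse import urlencode

def nominatim_search_url(query, base=None):
    """Build a /search URL against the configured Nominatim instance."""
    base = (base or os.environ.get("AIRUNNER_NOMINATIM_URL",
                                   "http://localhost:8080/")).rstrip("/")
    return f"{base}/search?{urlencode({'q': query, 'format': 'json'})}"
```

Opening the resulting URL in a browser (once the container has finished importing) should return JSON search results from your local instance.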
If you find this project useful, please consider sponsoring its development. Your support helps cover the costs of infrastructure, development, and maintenance.
You can sponsor the project on GitHub Sponsors.
Thank you for your support!
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for airunner
Similar Open Source Tools

airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.

Kohaku-NAI
Kohaku-NAI is a simple Novel-AI client with utilities like a generation server, saving images automatically, account pool, and an auth system. It also includes a standalone client, a DC bot based on the generation server, and a stable-diffusion-webui extension. Users can use it to generate images with NAI API within sd-webui, as a standalone client, gen server, or DC bot. The project aims to add features like QoS system, better client, random prompts, and fetch account info in the future.

dexto
Dexto is a lightweight runtime for creating and running AI agents that turn natural language into real-world actions. It serves as the missing intelligence layer for building AI applications, standalone chatbots, or as the reasoning engine inside larger products. Dexto features a powerful CLI and Web UI for running AI agents, supports multiple interfaces, allows hot-swapping of LLMs from various providers, connects to remote tool servers via the Model Context Protocol, is config-driven with version-controlled YAML, offers production-ready core features, extensibility for custom services, and enables multi-agent collaboration via MCP and A2A.

recognizer
Recognizer is a Python library for speech recognition. It provides a simple interface to transcribe speech from audio files or live audio input. The library supports multiple speech recognition engines, including Google Speech Recognition, Sphinx, and Wit.ai. Recognizer is easy to use and can be integrated into various applications to enable voice commands, transcription, and speech-to-text functionality.

nndeploy
nndeploy is a tool that allows you to quickly build your visual AI workflow without the need for frontend technology. It provides ready-to-use algorithm nodes for non-AI programmers, including large language models, Stable Diffusion, object detection, image segmentation, etc. The workflow can be exported as a JSON configuration file, supporting Python/C++ API for direct loading and running, deployment on cloud servers, desktops, mobile devices, edge devices, and more. The framework includes mainstream high-performance inference engines and deep optimization strategies to help you transform your workflow into enterprise-level production applications.

Rodel.Agent
Rodel Agent is a Windows desktop application that integrates chat, text-to-image, text-to-speech, and machine translation services, providing users with a comprehensive desktop AI experience. The application supports mainstream AI services and aims to enhance user interaction through various AI functionalities.

AlphaAvatar
AlphaAvatar is a powerful tool for creating customizable avatars with AI-generated faces. It provides a user-friendly interface to design unique characters for various purposes such as gaming, virtual reality, social media, and more. With advanced AI algorithms, users can easily generate realistic and diverse avatars to enhance their projects and engage with their audience.

jadx-mcp-server
JADX-MCP-SERVER is a standalone Python server that interacts with JADX-AI-MCP Plugin to analyze Android APKs using LLMs like Claude. It enables live communication with decompiled Android app context, uncovering vulnerabilities, parsing manifests, and facilitating reverse engineering effortlessly. The tool combines JADX-AI-MCP and JADX MCP SERVER to provide real-time reverse engineering support with LLMs, offering features like quick analysis, vulnerability detection, AI code modification, static analysis, and reverse engineering helpers. It supports various MCP tools for fetching class information, text, methods, fields, smali code, AndroidManifest.xml content, strings.xml file, resource files, and more. Tested on Claude Desktop, it aims to support other LLMs in the future, enhancing Android reverse engineering and APK modification tools connectivity for easier reverse engineering purely from vibes.

GraphLLM
GraphLLM is a graph-based framework designed to process data using LLMs. It offers a set of tools including a web scraper, PDF parser, YouTube subtitles downloader, Python sandbox, and TTS engine. The framework provides a GUI for building and debugging graphs with advanced features like loops, conditionals, parallel execution, streaming of results, hierarchical graphs, external tool integration, and dynamic scheduling. GraphLLM is a low-level framework that gives users full control over the raw prompt and output of models, with a steeper learning curve. It is tested with llama70b and qwen 32b, under heavy development with breaking changes expected.

baibot
Baibot is a versatile chatbot framework designed to simplify the process of creating and deploying chatbots. It provides a user-friendly interface for building custom chatbots with various functionalities such as natural language processing, conversation flow management, and integration with external APIs. Baibot is highly customizable and can be easily extended to suit different use cases and industries. With Baibot, developers can quickly create intelligent chatbots that can interact with users in a seamless and engaging manner, enhancing user experience and automating customer support processes.

qapyq
qapyq is an image viewer and AI-assisted editing tool designed to help curate datasets for generative AI models. It offers features such as image viewing, editing, captioning, batch processing, and AI assistance. Users can perform tasks like cropping, scaling, editing masks, tagging, and applying sorting and filtering rules. The tool supports state-of-the-art captioning and masking models, with options for model settings, GPU acceleration, and quantization. qapyq aims to streamline the process of preparing images for training AI models by providing a user-friendly interface and advanced functionalities.

verl-tool
The verl-tool is a versatile command-line utility designed to streamline various tasks related to version control and code management. It provides a simple yet powerful interface for managing branches, merging changes, resolving conflicts, and more. With verl-tool, users can easily track changes, collaborate with team members, and ensure code quality throughout the development process. Whether you are a beginner or an experienced developer, verl-tool offers a seamless experience for version control operations.

Elite-Dangerous-AI-Integration
Elite-Dangerous-AI-Integration aims to provide a seamless and efficient experience for commanders by integrating Elite:Dangerous with various services for Speech-to-Text, Text-to-Speech, and Large Language Models. The AI reacts to game events, given commands, and can perform actions like taking screenshots or fetching information from APIs. It is designed for all commanders, enhancing roleplaying, replacing third-party websites, and assisting with tutorials.

FastFlowLM
FastFlowLM is a Python library for efficient and scalable language model inference. It provides a high-performance implementation of language model scoring using n-gram language models. The library is designed to handle large-scale text data and can be easily integrated into natural language processing pipelines for tasks such as text generation, speech recognition, and machine translation. FastFlowLM is optimized for speed and memory efficiency, making it suitable for both research and production environments.

Multi-Agent-Custom-Automation-Engine-Solution-Accelerator
The Multi-Agent -Custom Automation Engine Solution Accelerator is an AI-driven orchestration system that manages a group of AI agents to accomplish tasks based on user input. It uses a FastAPI backend to handle HTTP requests, processes them through various specialized agents, and stores stateful information using Azure Cosmos DB. The system allows users to focus on what matters by coordinating activities across an organization, enabling GenAI to scale, and is applicable to most industries. It is intended for developing and deploying custom AI solutions for specific customers, providing a foundation to accelerate building out multi-agent systems.

CodeWebChat
Code Web Chat is a versatile, free, and open-source AI pair programming tool with a unique web-based workflow. Users can select files, type instructions, and initialize various chatbots like ChatGPT, Gemini, Claude, and more hands-free. The tool helps users save money with free tiers and subscription-based billing and save time with multi-file edits from a single prompt. It supports chatbot initialization through the Connector browser extension and offers API tools for code completions, editing context, intelligent updates, and commit messages. Users can handle AI responses, code completions, and version control through various commands. The tool is privacy-focused, operates locally, and supports any OpenAI-API compatible provider for its utilities.
For similar tasks

wunjo.wladradchenko.ru
Wunjo AI is a comprehensive tool that empowers users to explore the realm of speech synthesis, deepfake animations, video-to-video transformations, and more. Its user-friendly interface and privacy-first approach make it accessible to both beginners and professionals alike. With Wunjo AI, you can effortlessly convert text into human-like speech, clone voices from audio files, create multi-dialogues with distinct voice profiles, and perform real-time speech recognition. Additionally, you can animate faces using just one photo combined with audio, swap faces in videos, GIFs, and photos, and even remove unwanted objects or enhance the quality of your deepfakes using the AI Retouch Tool. Wunjo AI is an all-in-one solution for your voice and visual AI needs, offering endless possibilities for creativity and expression.

airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.

Wechat-AI-Assistant
Wechat AI Assistant is a project that enables multi-modal interaction with ChatGPT AI assistant within WeChat. It allows users to engage in conversations, role-playing, respond to voice messages, analyze images and videos, summarize articles and web links, and search the internet. The project utilizes the WeChatFerry library to control the Windows PC desktop WeChat client and leverages the OpenAI Assistant API for intelligent multi-modal message processing. Users can interact with ChatGPT AI in WeChat through text or voice, access various tools like bing_search, browse_link, image_to_text, text_to_image, text_to_speech, video_analysis, and more. The AI autonomously determines which code interpreter and external tools to use to complete tasks. Future developments include file uploads for AI to reference content, integration with other APIs, and login support for enterprise WeChat and WeChat official accounts.

Generative-AI-Pharmacist
Generative AI Pharmacist is a project showcasing the use of generative AI tools to create an animated avatar named Macy, who delivers medication counseling in a realistic and professional manner. The project utilizes tools like Midjourney for image generation, ChatGPT for text generation, ElevenLabs for text-to-speech conversion, and D-ID for creating a photorealistic talking avatar video. The demo video featuring Macy discussing commonly-prescribed medications demonstrates the potential of generative AI in healthcare communication.

AnyGPT
AnyGPT is a unified multimodal language model that utilizes discrete representations for processing various modalities like speech, text, images, and music. It aligns the modalities for intermodal conversions and text processing. AnyInstruct dataset is constructed for generative models. The model proposes a generative training scheme using Next Token Prediction task for training on a Large Language Model (LLM). It aims to compress vast multimodal data on the internet into a single model for emerging capabilities. The tool supports tasks like text-to-image, image captioning, ASR, TTS, text-to-music, and music captioning.

Pallaidium
Pallaidium is a generative AI movie studio integrated into the Blender video editor. It allows users to AI-generate video, image, and audio from text prompts or existing media files. The tool provides various features such as text to video, text to audio, text to speech, text to image, image to image, image to video, video to video, image to text, and more. It requires a Windows system with a CUDA-supported Nvidia card and at least 6 GB VRAM. Pallaidium offers batch processing capabilities, text to audio conversion using Bark, and various performance optimization tips. Users can install the tool by downloading the add-on and following the installation instructions provided. The tool comes with a set of restrictions on usage, prohibiting the generation of harmful, pornographic, violent, or false content.

ElevenLabs-DotNet
ElevenLabs-DotNet is a non-official Eleven Labs voice synthesis RESTful client that allows users to convert text to speech. The library targets .NET 8.0 and above, working across various platforms like console apps, winforms, wpf, and asp.net, and across Windows, Linux, and Mac. Users can authenticate using API keys directly, from a configuration file, or system environment variables. The tool provides functionalities for text to speech conversion, streaming text to speech, accessing voices, dubbing audio or video files, generating sound effects, managing history of synthesized audio clips, and accessing user information and subscription status.

omniai
OmniAI provides a unified Ruby API for integrating with multiple AI providers, streamlining AI development by offering a consistent interface for features such as chat, text-to-speech, speech-to-text, and embeddings. It ensures seamless interoperability across platforms and effortless switching between providers, making integrations more flexible and reliable.
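The "consistent interface across providers" design is a classic adapter pattern. A language-neutral sketch of the idea (OmniAI itself is a Ruby gem; the class and provider names below are hypothetical, for illustration only):

```python
# Language-neutral sketch of the unified-provider idea behind OmniAI
# (not its actual Ruby API; all names here are hypothetical).
from abc import ABC, abstractmethod

class Provider(ABC):
    @abstractmethod
    def chat(self, prompt: str) -> str: ...

class FakeOpenAI(Provider):
    def chat(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class FakeAnthropic(Provider):
    def chat(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"

def ask(provider: Provider, prompt: str) -> str:
    # Callers code against the common interface, so switching
    # providers is a one-line change at the call site.
    return provider.chat(prompt)
```

The same pattern extends to text-to-speech, speech-to-text, and embeddings: one abstract interface per feature, one adapter per provider.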

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
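The logging-and-tracing idea can be sketched with a stdlib-only decorator. This is a toy stand-in for the concept, not Weave's actual API:

```python
# Toy stand-in for the trace-logging idea (not Weave's actual API):
# wrap a model-calling function so every input/output pair is recorded
# for later debugging and evaluation.
import functools
import time

TRACE_LOG: list = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE_LOG.append({
            "op": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def fake_llm(prompt: str) -> str:
    return prompt.upper()  # stand-in for a real model call

fake_llm("hello")
```

A real toolkit additionally persists these records, links them into traces, and feeds them into evaluations; the decorator only shows the capture step.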

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API that provides access to over 100 different AI models, spanning image and audio generation.

kaito
Kaito is an operator that automates AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids manual tuning of deployment parameters for specific GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) when the license allows. Kaito greatly simplifies the workflow of onboarding large AI inference models in Kubernetes.
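The preset-driven workflow is expressed through a `Workspace` custom resource. A sketch in the shape of Kaito's published examples (the instance type and preset name are illustrative; check the field names against the Kaito version you deploy):

```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # GPU SKU; nodes are auto-provisioned
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"                 # preset selects the image and tuned parameters
```

Applying a manifest like this is the entire onboarding step: the operator resolves the preset, provisions matching GPU nodes, and deploys the inference workload.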

PyRIT
PyRIT is an open-access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI red-teaming tasks so operators can focus on more complex, time-consuming work, and it can identify security harms such as misuse (e.g., malware generation, jailbreaking) and privacy harms (e.g., identity theft). The goal is to give researchers a baseline for how well their model and entire inference pipeline perform against different harm categories, and to compare that baseline against future iterations of the model. This yields empirical data on how the model performs today and helps detect any degradation introduced by later changes.
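The baseline-comparison workflow described above can be sketched in a few lines. This is a toy illustration of the concept, not PyRIT's actual API:

```python
# Toy sketch of baseline comparison across harm categories (not PyRIT's
# actual API): flag categories where a new model iteration regresses.

def regressions(baseline: dict, current: dict, tolerance: float = 0.0) -> list:
    """Return harm categories where the current model scores worse
    (a higher harm rate) than the recorded baseline."""
    return [cat for cat, score in current.items()
            if score > baseline.get(cat, 0.0) + tolerance]

# Hypothetical per-category harm rates from two red-teaming runs:
baseline = {"malware": 0.02, "jailbreak": 0.05, "identity_theft": 0.01}
current  = {"malware": 0.01, "jailbreak": 0.09, "identity_theft": 0.01}
```

Running `regressions(baseline, current)` here flags only the category that got worse, which is exactly the degradation signal the framework is meant to surface.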

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE).
- Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
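Because SPEAR exposes its environments through the standard OpenAI Gym interface, agents interact with it via the usual `reset`/`step` loop. The sketch below shows that canonical loop with a trivial stand-in environment, since SPEAR itself requires the Unreal Engine assets to be installed:

```python
# Canonical Gym-style interaction loop, shown with a trivial stand-in
# environment (SPEAR's real environments need Unreal Engine assets).

class StubEnv:
    """Minimal Gym-style environment: counts steps, ends after 3."""
    def reset(self):
        self.t = 0
        return {"obs": self.t}                  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return {"obs": self.t}, 1.0, done, {}   # obs, reward, done, info

env = StubEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = 0                                  # a real agent chooses here
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

Swapping the stub for a SPEAR environment leaves the loop unchanged; only the observation and action spaces differ.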

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.