
airunner
Offline inference engine for art, real-time voice conversations, LLM-powered chatbots and automated workflows
Stars: 1241

AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
README:
🐛 Report Bug | ✨ Request Feature | 🛡️ Report Vulnerability | 🛡️ Wiki
Show your support for this project by choosing one of the following donation options.
- Crypto: 0x02030569e866e22C9991f55Db0445eeAd2d646c8
- Github Sponsors: https://github.com/sponsors/w4ffl35
- Patreon: https://www.patreon.com/c/w4ffl35
✉️ Get notified when the packaged version releases
✨ Key Features | Details |
---|---|
🗣️ Real-time conversations | Three speech engines (espeak, SpeechT5, OpenVoice); auto language detection (OpenVoice); real-time voice chat with LLMs |
🤖 Customizable AI Agents | Custom agent names, moods, and personalities; Retrieval-Augmented Generation (RAG) |
📚 Enhanced Knowledge Retrieval | RAG for documents/websites; use local data to enrich chat |
🖼️ Image Generation & Manipulation | Text-to-Image (Stable Diffusion 1.5, SDXL, Turbo); drawing tools & ControlNet; LoRA & embeddings; inpainting, outpainting, filters |
🌍 Multi-lingual Capabilities | Partial multi-lingual TTS/STT/interface; English & Japanese GUI |
🔒 Privacy and Security | Runs locally, no external API (by default); customizable LLM guardrails & image safety; disables HuggingFace telemetry; restricts network access |
⚡ Performance & Utility | Fast generation (~2 seconds on an RTX 2080); Docker-based setup & GPU acceleration; theming (Light/Dark/System); NSFW toggles; extension API; Python library & API support |
Language | TTS | LLM | STT | GUI |
---|---|---|---|---|
English | ✅ | ✅ | ✅ | ✅ |
Japanese | ✅ | ✅ | ✅ | ✅ |
Spanish | ✅ | ✅ | ✅ | ❌ |
French | ✅ | ✅ | ✅ | ❌ |
Chinese | ✅ | ✅ | ✅ | ❌ |
Korean | ✅ | ✅ | ✅ | ❌ |
AI Runner is a powerful tool designed for local, private use. However, its capabilities mean that users must be aware of their responsibilities under emerging AI regulations. This section provides information regarding the Colorado AI Act.
As the developer of AI Runner, we have a duty of care to inform our users about how this law may apply to them.
- Your Role as a User: If you use AI Runner to make, or as a substantial factor in making, an important decision that has a legal or similarly significant effect on someone's life, you may be considered a "deployer" of a "high-risk AI system" under Colorado law.
- What is a "High-Risk" Use Case? Examples of high-risk decisions include using AI to screen job applicants or to evaluate eligibility for loans, housing, insurance, or other essential services.
- User Responsibility: Given AI Runner's customizable nature (e.g., using RAG with personal or business documents), it is possible to configure it for such high-risk purposes. If you do so, you are responsible for complying with the obligations of a "deployer," which include performing impact assessments and preventing algorithmic discrimination.
- Our Commitment: We are committed to developing AI Runner responsibly. The built-in privacy features, local-first design, and configurable guardrails are intended to provide you with the tools to use AI safely. We strongly encourage you to understand the capabilities and limitations of the AI models you choose to use and to consider the ethical implications of your specific application.
For more information, we recommend reviewing the text of the Colorado AI Act.
Specification | Minimum | Recommended |
---|---|---|
OS | Ubuntu 22.04, Windows 10 | Ubuntu 22.04 (Wayland) |
CPU | Ryzen 2700K or Intel Core i7-8700K | Ryzen 5800X or Intel Core i7-11700K |
Memory | 16 GB RAM | 32 GB RAM |
GPU | NVIDIA RTX 3060 or better | NVIDIA RTX 4090 or better |
Network | Broadband (used to download models) | Broadband (used to download models) |
Storage | 22 GB (with models), 6 GB (without models) | 100 GB or higher |
- Install system requirements

  ```bash
  sudo apt update && sudo apt upgrade -y
  sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git nvidia-cuda-toolkit \
    pipewire libportaudio2 libxcb-cursor0 gnupg gpg-agent pinentry-curses espeak \
    xclip cmake qt6-qpa-plugins qt6-wayland qt6-gtk-platformtheme mecab \
    libmecab-dev mecab-ipadic-utf8 libxslt-dev mkcert
  sudo apt install -y espeak-ng-espeak
  ```
- Create the `airunner` directory

  ```bash
  sudo mkdir ~/.local/share/airunner
  sudo chown $USER:$USER ~/.local/share/airunner
  ```
- Install AI Runner (Python 3.13+ required; `pyenv` and `venv` are recommended, see the wiki for more info)

  ```bash
  pip install "typing-extensions==4.13.2"
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
  pip install "airunner[all_dev]"
  ```
- Run AI Runner

  ```bash
  airunner
  ```

For more options, including Docker, see the Installation Wiki.
- Run AI Runner: `airunner`
- Run the downloader: `airunner-setup`
- Build templates: `airunner-build-ui`
The optional models that power AI Runner vary in size; see the Installation Wiki for current download sizes and the full software stack AI Runner is built on.

By default, AI Runner installs essential TTS/STT and minimal LLM components, but AI art models must be supplied by the user. Organize them under your local AI Runner data directory (by default `~/.local/share/airunner`).
- The chatbot's mood and conversation summary system is enabled by default. The bot's mood and an emoji are shown with each bot message.
- While the LLM is updating the bot's mood or summarizing the conversation, a loading spinner and status message appear in the chat prompt widget. The indicator disappears as soon as a new message arrives.
- This system is automatic and requires no user configuration.
- For more details, see the LLM Chat Prompt Widget README.
- The mood and summary engines are fully integrated into the agent runtime. When the agent updates the mood or summarizes the conversation, it emits a signal to the UI with a customizable loading message, which the chat prompt widget displays as a loading indicator.
- See `src/airunner/handlers/llm/agent/agents/base.py` for integration details and `src/airunner/api/chatbot_services.py` for the API function.
AI Runner includes an Aggregated Search Tool for querying multiple online services from a unified interface. It is available as a NodeGraphQt node, as an LLM agent tool, and as a Python API.
Supported Search Services:
- DuckDuckGo (no API key required)
- Wikipedia (no API key required)
- arXiv (no API key required)
- Google Custom Search (requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`)
- Bing Web Search (requires `BING_SUBSCRIPTION_KEY`)
- NewsAPI (requires `NEWSAPI_KEY`)
- StackExchange (optional `STACKEXCHANGE_KEY` for a higher quota)
- GitHub Repositories (optional `GITHUB_TOKEN` for higher rate limits)
- OpenLibrary (no API key required)
API Key Setup:
- Set the required API keys as environment variables before running AI Runner. Only services with valid keys will be queried.
- Example:

  ```bash
  export GOOGLE_API_KEY=your_google_api_key
  export GOOGLE_CSE_ID=your_google_cse_id
  export BING_SUBSCRIPTION_KEY=your_bing_key
  export NEWSAPI_KEY=your_newsapi_key
  export STACKEXCHANGE_KEY=your_stackexchange_key
  export GITHUB_TOKEN=your_github_token
  ```
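To sanity-check which keyed services are configured in the current environment, here is a small Python snippet; the variable names come from the list above, while the service labels are illustrative only:

```python
import os

# Env var names as documented above; a service is queried only if its key is set.
KEYED_SERVICES = {
    "Google Custom Search": ("GOOGLE_API_KEY", "GOOGLE_CSE_ID"),
    "Bing Web Search": ("BING_SUBSCRIPTION_KEY",),
    "NewsAPI": ("NEWSAPI_KEY",),
    "StackExchange (optional)": ("STACKEXCHANGE_KEY",),
    "GitHub (optional)": ("GITHUB_TOKEN",),
}

for service, variables in KEYED_SERVICES.items():
    configured = all(os.environ.get(v) for v in variables)
    print(f"{service}: {'configured' if configured else 'not configured'}")
```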
Usage:
- Use the Aggregated Search node in NodeGraphQt for visual workflows.
- Call the tool from LLM agents or Python code (a standalone-script version follows this list):

  ```python
  from airunner.components.tools import AggregatedSearchTool

  results = await AggregatedSearchTool.aggregated_search("python", category="web")
  ```
- See `src/airunner/tools/README.md` for more details.
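Outside an async context (for example, in a standalone script), the coroutine needs an event loop. A minimal sketch, assuming `AggregatedSearchTool.aggregated_search` is awaitable as shown above:

```python
import asyncio

from airunner.components.tools import AggregatedSearchTool

async def main():
    # Queries every configured service in the "web" category (see the notes above).
    results = await AggregatedSearchTool.aggregated_search("python", category="web")
    print(results)

asyncio.run(main())
```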
Note:
- DuckDuckGo, Wikipedia, arXiv, and OpenLibrary do not require API keys and can be used out-of-the-box.
- For best results and full service coverage, configure all relevant API keys.
AI Runner's local server enforces HTTPS-only operation for all local resources. HTTP is never used or allowed for local static assets or API endpoints. At startup, the server logs explicit details about HTTPS mode and the certificate/key in use. Security headers are set and only GET/HEAD methods are allowed for further hardening.
- Automatic Certificate Generation (Recommended):
  - By default, AI Runner auto-generates a self-signed certificate in `~/.local/share/airunner/certs/` if one does not exist. No manual steps are required for most users.
  - To provide your own certificate, place `cert.pem` and `key.pem` in the `certs` directory under your AI Runner base path.
- Manual Certificate Generation (Optional):
  - You can manually generate a self-signed certificate with:

    ```bash
    airunner-generate-cert
    ```
  - This creates `cert.pem` and `key.pem` in your current directory. Move them to your AI Runner certs directory if you want to use them.
- Configure AI Runner to Use SSL:
  - The app automatically uses the certificates in the certs directory. To override, set the environment variables:

    ```bash
    export AIRUNNER_SSL_CERT=~/path/to/cert.pem
    export AIRUNNER_SSL_KEY=~/path/to/key.pem
    airunner
    ```
  - The server uses HTTPS if both files are provided.
- Access the app via `https://localhost:<port>`:
  - The default port is 5005 (configurable in `src/airunner/settings.py`).
  - Your browser may warn about the self-signed certificate; you can safely bypass this for local development.
- For production or remote access, use a certificate from a trusted CA.
- Never share your private key (`key.pem`).
- The server binds only to `127.0.0.1` by default for safety.
- For additional hardening, see the Security guide and the code comments in `local_http_server.py`.
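To verify that the local server is answering over HTTPS, here is a short check using only the Python standard library. It assumes the default port 5005 noted above; certificate verification is disabled only because the certificate is self-signed, so keep this for local use:

```python
import ssl
import urllib.request

# Self-signed local cert: skip verification for this local check only.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# The server only allows GET/HEAD; urlopen issues a GET by default.
with urllib.request.urlopen("https://localhost:5005", context=ctx) as response:
    print(response.status, response.headers.get("Content-Type"))
```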
You can generate a self-signed SSL certificate for local HTTPS with a single command:

```bash
airunner-generate-cert
```

This creates `cert.pem` and `key.pem` in your current directory. Use these files with the local HTTP server as described above.
See the SSL/TLS section for full details.
- For a browser-trusted local HTTPS experience (no warnings), install mkcert:

  ```bash
  # On Ubuntu/Debian:
  sudo apt install libnss3-tools
  # On macOS (or use your package manager):
  brew install mkcert
  mkcert -install
  ```
- If `mkcert` is not installed, AI Runner falls back to OpenSSL self-signed certificates, which will show browser warnings.
- See the SSL/TLS section for details.
AI Runner provides several CLI commands for development, testing, and maintenance. Below is a summary of all available commands:
Command | Description |
---|---|
`airunner` | Launch the AI Runner application GUI. |
`airunner-setup` | Download and set up required models and data. |
`airunner-build-ui` | Regenerate Python UI files from `.ui` templates; run after editing any `.ui` file. |
`airunner-compile-translations` | Compile translation files for internationalization. |
`airunner-tests` | Run the full test suite using pytest. |
`airunner-test-coverage-report` | Generate a test coverage report. |
`airunner-docker` | Run Docker-related build and management commands for AI Runner. |
`airunner-generate-migration` | Generate a new Alembic database migration. |
`airunner-generate-cert` | Generate a self-signed SSL certificate for local HTTPS. |
`airunner-mypy <filename>` | Run mypy type checking on a file with project-recommended flags. |
Usage Examples:

```bash
# Launch the app
airunner

# Download models and set up data
airunner-setup

# Build UI Python files from .ui templates
airunner-build-ui

# Compile translation files
airunner-compile-translations

# Run all tests
airunner-tests

# Generate a test coverage report
airunner-test-coverage-report

# Run Docker build or management tasks
airunner-docker

# Generate a new Alembic migration
airunner-generate-migration

# Generate a self-signed SSL certificate
airunner-generate-cert

# Run mypy type checking on a file
airunner-mypy src/airunner/components/document_editor/gui/widgets/document_editor_widget.py
```
For more details on each command, see the Wiki or run the command with `--help` if supported.
AI Runner supports a set of powerful chat slash commands, known as Slash Tools, that let you quickly trigger special actions, tools, or workflows directly from the chat prompt. These commands start with a `/` and can be used in any chat conversation.
- Type `/` in the chat prompt to see available commands (autocomplete is supported in the UI).
- Each slash command maps to a specific tool, agent action, or workflow.
- The set of available commands is extensible and may include custom or extension-provided tools.
Slash | Command | Action Type | Description |
---|---|---|---|
`/a` | Image | GENERATE_IMAGE | Generate an image from a prompt |
`/c` | Code | CODE | Run or generate code (if supported) |
`/s` | Search | SEARCH | Search the web or knowledge base |
`/w` | Workflow | WORKFLOW | Run a custom workflow (if supported) |
Note:
- Some slash tools (like `/a` for image) return an immediate confirmation message (e.g., "Ok, I've navigated to ...", "Ok, generating your image...").
- Others (like `/s` for search or `/w` for workflow) do not return a direct message, but instead show a loading indicator until the result is ready.
- The set of available slash commands is defined in `SLASH_COMMANDS` in `src/airunner/settings.py` and may be extended in the future.
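Because the command set lives in `SLASH_COMMANDS`, you can also inspect it programmatically. A minimal sketch, assuming `SLASH_COMMANDS` is an importable dict-like mapping (its exact structure may differ):

```python
# Assumption: SLASH_COMMANDS maps a slash string to its command definition.
from airunner.settings import SLASH_COMMANDS

for slash, definition in SLASH_COMMANDS.items():
    print(slash, definition)
```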
For a full list of supported slash commands, type `/help` in the chat prompt or see the copilot-instructions.md.
We welcome pull requests for new features, bug fixes, or documentation improvements. You can also build and share extensions to expand AI Runner's functionality. For details, see the Extensions Wiki.
Take a look at the Contributing document and the Development wiki page for detailed instructions.
AI Runner uses `pytest` for all automated testing. Test coverage is a priority, especially for utility modules.
- Headless-safe tests:
  - Located in `src/airunner/utils/tests/`
  - Can be run in any environment (including CI, headless servers, and developer machines)
  - Run with:

    ```bash
    pytest src/airunner/utils/tests/
    ```
- Display-required (Qt/Xvfb) tests:
  - Located in `src/airunner/utils/tests/xvfb_required/`
  - Require a real Qt display environment (cannot be run headlessly or with `pytest-qt`)
  - Typical for low-level Qt worker/signal/slot logic
  - Run with:

    ```bash
    xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/
    # Or for a single file:
    xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/test_background_worker.py
    ```
  - See the README in `xvfb_required/` for details.
- By default, only headless-safe tests are run in CI.
- Display-required tests are intended for manual or special-case runs (e.g., when working on Qt threading or background worker code).
- (Optional) You may automate this split in CI by adding a separate job/step for xvfb tests.
- All new utility code must be accompanied by tests.
- Use `pytest`, `pytest-qt` (for GUI), and `unittest.mock` for mocking dependencies.
- For more details on writing and organizing tests, see the project coding guidelines and the `src/airunner/utils/tests/` folder.
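As a concrete illustration of the mocking guidance above, here is a minimal headless-safe test in the recommended `pytest` + `unittest.mock` style. The patched target (`json.dumps`) is a stand-in so the sketch runs anywhere; in a real AI Runner test you would patch a dependency of the utility module under test:

```python
import json

from unittest.mock import patch

def test_logic_with_mocked_dependency():
    # Patch the dependency so the test never exercises the real implementation.
    with patch("json.dumps", return_value="{}") as mock_dumps:
        assert json.dumps({"a": 1}) == "{}"
        mock_dumps.assert_called_once()
```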
- Follow the copilot-instructions.md for all development, testing, and contribution guidelines.
- Always use the `airunner` command in the terminal to run the application.
- Always run tests in the terminal (not in the workspace test runner).
- Use `pytest` and `pytest-cov` for running tests and checking coverage.
- UI changes must be made in `.ui` files and rebuilt with `airunner-build-ui`.
- See the Wiki for architecture, usage, and advanced topics.
- API Service Layer
- Main Window Model Load Balancer
- Facehugger Shield Suite
- NodeGraphQt Vendor Module
- Xvfb-Required Tests
- ORM Models
For additional details, see the Wiki.
If you find this project useful, please consider sponsoring its development. Your support helps cover the costs of infrastructure, development, and maintenance.
You can sponsor the project on GitHub Sponsors.
Thank you for your support!