
airunner
Offline inference engine for art, real-time voice conversations, LLM-powered chatbots and automated workflows
Stars: 1241

AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
README:
🐛 Report Bug | ✨ Request Feature | 🛡️ Report Vulnerability | 🛡️ Wiki
Show your support for this project by choosing one of the following donation options.
- Crypto: 0x02030569e866e22C9991f55Db0445eeAd2d646c8
- Github Sponsors: https://github.com/sponsors/w4ffl35
- Patreon: https://www.patreon.com/c/w4ffl35
✉️ Get notified when the packaged version releases
✨ Key Features | Details |
---|---|
🗣️ Real-time conversations | Three speech engines (espeak, SpeechT5, OpenVoice); auto language detection (OpenVoice); real-time voice chat with LLMs |
🤖 Customizable AI Agents | Custom agent names, moods, and personalities; Retrieval-Augmented Generation (RAG) |
📚 Enhanced Knowledge Retrieval | RAG for documents/websites; use local data to enrich chat |
🖼️ Image Generation & Manipulation | Text-to-Image (Stable Diffusion 1.5, SDXL, Turbo); drawing tools & ControlNet; LoRA & embeddings; inpainting, outpainting, filters |
🌍 Multi-lingual Capabilities | Partial multi-lingual TTS/STT/interface; English & Japanese GUI |
🔒 Privacy and Security | Runs locally, no external API (by default); customizable LLM guardrails & image safety; disables HuggingFace telemetry; restricts network access |
⚡ Performance & Utility | Fast generation (~2 seconds on an RTX 2080); Docker-based setup & GPU acceleration; theming (Light/Dark/System); NSFW toggles; extension API; Python library & API support |
Language | TTS | LLM | STT | GUI |
---|---|---|---|---|
English | ✅ | ✅ | ✅ | ✅ |
Japanese | ✅ | ✅ | ✅ | ✅ |
Spanish | ✅ | ✅ | ✅ | ❌ |
French | ✅ | ✅ | ✅ | ❌ |
Chinese | ✅ | ✅ | ✅ | ❌ |
Korean | ✅ | ✅ | ✅ | ❌ |
AI Runner is a powerful tool designed for local, private use. However, its capabilities mean that users must be aware of their responsibilities under emerging AI regulations. This section provides information regarding the Colorado AI Act.
As the developer of AI Runner, we have a duty of care to inform our users about how this law may apply to them.
- Your Role as a User: If you use AI Runner to make, or as a substantial factor in making, an important decision that has a legal or similarly significant effect on someone's life, you may be considered a "deployer" of a "high-risk AI system" under Colorado law.
- What is a "High-Risk" Use Case? Examples of high-risk decisions include using AI to screen job applicants or to evaluate eligibility for loans, housing, insurance, or other essential services.
- User Responsibility: Given AI Runner's customizable nature (e.g., using RAG with personal or business documents), it is possible to configure it for such high-risk purposes. If you do so, you are responsible for complying with the obligations of a "deployer," which include performing impact assessments and preventing algorithmic discrimination.
- Our Commitment: We are committed to developing AI Runner responsibly. The built-in privacy features, local-first design, and configurable guardrails are intended to provide you with the tools to use AI safely. We strongly encourage you to understand the capabilities and limitations of the AI models you choose to use and to consider the ethical implications of your specific application.
For more information, we recommend reviewing the text of the Colorado AI Act.
Specification | Minimum | Recommended |
---|---|---|
OS | Ubuntu 22.04, Windows 10 | Ubuntu 22.04 (Wayland) |
CPU | Ryzen 2700K or Intel Core i7-8700K | Ryzen 5800X or Intel Core i7-11700K |
Memory | 16 GB RAM | 32 GB RAM |
GPU | NVIDIA RTX 3060 or better | NVIDIA RTX 4090 or better |
Network | Broadband (used to download models) | Broadband (used to download models) |
Storage | 22 GB (with models), 6 GB (without models) | 100 GB or higher |
- Install system requirements

  ```bash
  sudo apt update && sudo apt upgrade -y
  sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev python3-openssl git nvidia-cuda-toolkit \
    pipewire libportaudio2 libxcb-cursor0 gnupg gpg-agent pinentry-curses espeak \
    xclip cmake qt6-qpa-plugins qt6-wayland qt6-gtk-platformtheme mecab \
    libmecab-dev mecab-ipadic-utf8 libxslt-dev mkcert
  sudo apt install -y espeak-ng-espeak
  ```
- Create the `airunner` directory

  ```bash
  sudo mkdir ~/.local/share/airunner
  sudo chown $USER:$USER ~/.local/share/airunner
  ```
- Install AI Runner (Python 3.13+ required; `pyenv` and `venv` are recommended, see the wiki for more info)

  ```bash
  pip install "typing-extensions==4.13.2"
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
  pip install "airunner[all_dev]"
  ```
- Run AI Runner

  ```bash
  airunner
  ```

For more options, including Docker, see the Installation Wiki.
- Run AI Runner: `airunner`
- Run the downloader: `airunner-setup`
- Build templates: `airunner-build-ui`
The optional models that power AI Runner vary in size; see the Installation Wiki for current download sizes and the full software stack AI Runner is built on.

By default, AI Runner installs essential TTS/STT and minimal LLM components, but AI art models must be supplied by the user. Organize them under your local AI Runner data directory (by default `~/.local/share/airunner`).
- The chatbot's mood and conversation summary system is enabled by default. The bot's mood and an emoji are shown with each bot message.
- While the LLM is updating the bot's mood or summarizing the conversation, a loading spinner and status message appear in the chat prompt widget. The indicator disappears as soon as a new message arrives.
- This system is automatic and requires no user configuration.
- For more details, see the LLM Chat Prompt Widget README.
- The mood and summary engines are fully integrated into the agent runtime. When the agent updates the mood or summarizes the conversation, it emits a signal to the UI with a customizable loading message, which the chat prompt widget displays as a loading indicator.
- See `src/airunner/handlers/llm/agent/agents/base.py` for integration details and `src/airunner/api/chatbot_services.py` for the API function.
AI Runner includes an Aggregated Search Tool for querying multiple online services from a unified interface. It is available as a NodeGraphQt node, as an LLM agent tool, and as a Python API.
Supported Search Services:
- DuckDuckGo (no API key required)
- Wikipedia (no API key required)
- arXiv (no API key required)
- Google Custom Search (requires `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`)
- Bing Web Search (requires `BING_SUBSCRIPTION_KEY`)
- NewsAPI (requires `NEWSAPI_KEY`)
- StackExchange (optional `STACKEXCHANGE_KEY` for a higher quota)
- GitHub Repositories (optional `GITHUB_TOKEN` for higher rate limits)
- OpenLibrary (no API key required)
API Key Setup:
- Set the required API keys as environment variables before running AI Runner. Only services with valid keys will be queried.
- Example:

  ```bash
  export GOOGLE_API_KEY=your_google_api_key
  export GOOGLE_CSE_ID=your_google_cse_id
  export BING_SUBSCRIPTION_KEY=your_bing_key
  export NEWSAPI_KEY=your_newsapi_key
  export STACKEXCHANGE_KEY=your_stackexchange_key
  export GITHUB_TOKEN=your_github_token
  ```
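To sanity-check which keyed services are configured in the current environment, here is a small Python snippet; the variable names come from the list above, while the service labels are illustrative only:

```python
import os

# Env var names as documented above; a service is queried only if its key is set.
KEYED_SERVICES = {
    "Google Custom Search": ("GOOGLE_API_KEY", "GOOGLE_CSE_ID"),
    "Bing Web Search": ("BING_SUBSCRIPTION_KEY",),
    "NewsAPI": ("NEWSAPI_KEY",),
    "StackExchange (optional)": ("STACKEXCHANGE_KEY",),
    "GitHub (optional)": ("GITHUB_TOKEN",),
}

for service, variables in KEYED_SERVICES.items():
    configured = all(os.environ.get(v) for v in variables)
    print(f"{service}: {'configured' if configured else 'not configured'}")
```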
Usage:
- Use the Aggregated Search node in NodeGraphQt for visual workflows.
- Call the tool from LLM agents or Python code (a standalone-script version follows this list):

  ```python
  from airunner.components.tools import AggregatedSearchTool

  results = await AggregatedSearchTool.aggregated_search("python", category="web")
  ```
- See `src/airunner/tools/README.md` for more details.
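Outside an async context (for example, in a standalone script), the coroutine needs an event loop. A minimal sketch, assuming `AggregatedSearchTool.aggregated_search` is awaitable as shown above:

```python
import asyncio

from airunner.components.tools import AggregatedSearchTool

async def main():
    # Queries every configured service in the "web" category (see the notes above).
    results = await AggregatedSearchTool.aggregated_search("python", category="web")
    print(results)

asyncio.run(main())
```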
Note:
- DuckDuckGo, Wikipedia, arXiv, and OpenLibrary do not require API keys and can be used out-of-the-box.
- For best results and full service coverage, configure all relevant API keys.
AI Runner's local server enforces HTTPS-only operation for all local resources. HTTP is never used or allowed for local static assets or API endpoints. At startup, the server logs explicit details about HTTPS mode and the certificate/key in use. Security headers are set and only GET/HEAD methods are allowed for further hardening.
- Automatic Certificate Generation (Recommended):
  - By default, AI Runner auto-generates a self-signed certificate in `~/.local/share/airunner/certs/` if one does not exist. No manual steps are required for most users.
  - To provide your own certificate, place `cert.pem` and `key.pem` in the `certs` directory under your AI Runner base path.
- Manual Certificate Generation (Optional):
  - You can manually generate a self-signed certificate with:

    ```bash
    airunner-generate-cert
    ```
  - This creates `cert.pem` and `key.pem` in your current directory. Move them to your AI Runner certs directory if you want to use them.
- Configure AI Runner to Use SSL:
  - The app automatically uses the certificates in the certs directory. To override, set the environment variables:

    ```bash
    export AIRUNNER_SSL_CERT=~/path/to/cert.pem
    export AIRUNNER_SSL_KEY=~/path/to/key.pem
    airunner
    ```
  - The server uses HTTPS if both files are provided.
- Access the app via `https://localhost:<port>`:
  - The default port is 5005 (configurable in `src/airunner/settings.py`).
  - Your browser may warn about the self-signed certificate; you can safely bypass this for local development.
- For production or remote access, use a certificate from a trusted CA.
- Never share your private key (`key.pem`).
- The server binds only to `127.0.0.1` by default for safety.
- For additional hardening, see the Security guide and the code comments in `local_http_server.py`.
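To verify that the local server is answering over HTTPS, here is a short check using only the Python standard library. It assumes the default port 5005 noted above; certificate verification is disabled only because the certificate is self-signed, so keep this for local use:

```python
import ssl
import urllib.request

# Self-signed local cert: skip verification for this local check only.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# The server only allows GET/HEAD; urlopen issues a GET by default.
with urllib.request.urlopen("https://localhost:5005", context=ctx) as response:
    print(response.status, response.headers.get("Content-Type"))
```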
You can generate a self-signed SSL certificate for local HTTPS with a single command:

```bash
airunner-generate-cert
```

This creates `cert.pem` and `key.pem` in your current directory. Use these files with the local HTTP server as described above.
See the SSL/TLS section for full details.
- For a browser-trusted local HTTPS experience (no warnings), install mkcert:

  ```bash
  # On Ubuntu/Debian:
  sudo apt install libnss3-tools
  # On macOS (or use your package manager):
  brew install mkcert
  mkcert -install
  ```
- If `mkcert` is not installed, AI Runner falls back to OpenSSL self-signed certificates, which will show browser warnings.
- See the SSL/TLS section for details.
AI Runner provides several CLI commands for development, testing, and maintenance. Below is a summary of all available commands:
Command | Description |
---|---|
`airunner` | Launch the AI Runner application GUI. |
`airunner-setup` | Download and set up required models and data. |
`airunner-build-ui` | Regenerate Python UI files from `.ui` templates; run after editing any `.ui` file. |
`airunner-compile-translations` | Compile translation files for internationalization. |
`airunner-tests` | Run the full test suite using pytest. |
`airunner-test-coverage-report` | Generate a test coverage report. |
`airunner-docker` | Run Docker-related build and management commands for AI Runner. |
`airunner-generate-migration` | Generate a new Alembic database migration. |
`airunner-generate-cert` | Generate a self-signed SSL certificate for local HTTPS. |
`airunner-mypy <filename>` | Run mypy type checking on a file with project-recommended flags. |
Usage Examples:

```bash
# Launch the app
airunner

# Download models and set up data
airunner-setup

# Build UI Python files from .ui templates
airunner-build-ui

# Compile translation files
airunner-compile-translations

# Run all tests
airunner-tests

# Generate a test coverage report
airunner-test-coverage-report

# Run Docker build or management tasks
airunner-docker

# Generate a new Alembic migration
airunner-generate-migration

# Generate a self-signed SSL certificate
airunner-generate-cert

# Run mypy type checking on a file
airunner-mypy src/airunner/components/document_editor/gui/widgets/document_editor_widget.py
```
For more details on each command, see the Wiki or run the command with `--help` if supported.
AI Runner supports a set of powerful chat slash commands, known as Slash Tools, that let you quickly trigger special actions, tools, or workflows directly from the chat prompt. These commands start with a `/` and can be used in any chat conversation.
- Type `/` in the chat prompt to see available commands (autocomplete is supported in the UI).
- Each slash command maps to a specific tool, agent action, or workflow.
- The set of available commands is extensible and may include custom or extension-provided tools.
Slash | Command | Action Type | Description |
---|---|---|---|
`/a` | Image | GENERATE_IMAGE | Generate an image from a prompt |
`/c` | Code | CODE | Run or generate code (if supported) |
`/s` | Search | SEARCH | Search the web or knowledge base |
`/w` | Workflow | WORKFLOW | Run a custom workflow (if supported) |
Note:
- Some slash tools (like `/a` for image) return an immediate confirmation message (e.g., "Ok, I've navigated to ...", "Ok, generating your image...").
- Others (like `/s` for search or `/w` for workflow) do not return a direct message, but instead show a loading indicator until the result is ready.
- The set of available slash commands is defined in `SLASH_COMMANDS` in `src/airunner/settings.py` and may be extended in the future.
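Because the command set lives in `SLASH_COMMANDS`, you can also inspect it programmatically. A minimal sketch, assuming `SLASH_COMMANDS` is an importable dict-like mapping (its exact structure may differ):

```python
# Assumption: SLASH_COMMANDS maps a slash string to its command definition.
from airunner.settings import SLASH_COMMANDS

for slash, definition in SLASH_COMMANDS.items():
    print(slash, definition)
```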
For a full list of supported slash commands, type `/help` in the chat prompt or see the copilot-instructions.md.
We welcome pull requests for new features, bug fixes, or documentation improvements. You can also build and share extensions to expand AI Runner's functionality. For details, see the Extensions Wiki.
Take a look at the Contributing document and the Development wiki page for detailed instructions.
AI Runner uses `pytest` for all automated testing. Test coverage is a priority, especially for utility modules.
- Headless-safe tests:
  - Located in `src/airunner/utils/tests/`
  - Can be run in any environment (including CI, headless servers, and developer machines)
  - Run with:

    ```bash
    pytest src/airunner/utils/tests/
    ```
- Display-required (Qt/Xvfb) tests:
  - Located in `src/airunner/utils/tests/xvfb_required/`
  - Require a real Qt display environment (cannot be run headlessly or with `pytest-qt`)
  - Typical for low-level Qt worker/signal/slot logic
  - Run with:

    ```bash
    xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/
    # Or for a single file:
    xvfb-run -a pytest src/airunner/utils/tests/xvfb_required/test_background_worker.py
    ```
  - See the README in `xvfb_required/` for details.
- By default, only headless-safe tests are run in CI.
- Display-required tests are intended for manual or special-case runs (e.g., when working on Qt threading or background worker code).
- (Optional) You may automate this split in CI by adding a separate job/step for xvfb tests.
- All new utility code must be accompanied by tests.
- Use `pytest`, `pytest-qt` (for GUI), and `unittest.mock` for mocking dependencies.
- For more details on writing and organizing tests, see the project coding guidelines and the `src/airunner/utils/tests/` folder.
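As a concrete illustration of the mocking guidance above, here is a minimal headless-safe test in the recommended `pytest` + `unittest.mock` style. The patched target (`json.dumps`) is a stand-in so the sketch runs anywhere; in a real AI Runner test you would patch a dependency of the utility module under test:

```python
import json

from unittest.mock import patch

def test_logic_with_mocked_dependency():
    # Patch the dependency so the test never exercises the real implementation.
    with patch("json.dumps", return_value="{}") as mock_dumps:
        assert json.dumps({"a": 1}) == "{}"
        mock_dumps.assert_called_once()
```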
- Follow the copilot-instructions.md for all development, testing, and contribution guidelines.
- Always use the `airunner` command in the terminal to run the application.
- Always run tests in the terminal (not in the workspace test runner).
- Use `pytest` and `pytest-cov` for running tests and checking coverage.
- UI changes must be made in `.ui` files and rebuilt with `airunner-build-ui`.
- See the Wiki for architecture, usage, and advanced topics.
- API Service Layer
- Main Window Model Load Balancer
- Facehugger Shield Suite
- NodeGraphQt Vendor Module
- Xvfb-Required Tests
- ORM Models
For additional details, see the Wiki.
If you find this project useful, please consider sponsoring its development. Your support helps cover the costs of infrastructure, development, and maintenance.
You can sponsor the project on GitHub Sponsors.
Thank you for your support!