voice-chat-ai
Speak with AI - Run locally using Ollama, OpenAI or xAI - Speech uses XTTS, OpenAI or ElevenLabs
Stars: 143
Voice Chat AI is a project that allows users to interact with different AI characters using speech. Users can choose from various characters with unique personalities and voices, and have conversations or role play with them. The project supports OpenAI, xAI, or Ollama language models for chat, and provides text-to-speech synthesis using XTTS, OpenAI TTS, or ElevenLabs. Users can seamlessly integrate visual context into conversations by having the AI analyze their screen. The project offers easy configuration through environment variables and can be run via WebUI or Terminal. It also includes a huge selection of built-in characters for engaging conversations.
README:
Voice Chat AI is a project that allows you to interact with different AI characters using speech. You can choose between various characters, each with unique personalities and voices. Have a serious conversation with Albert Einstein or role play with the OS from the movie HER.
You can run everything locally, use OpenAI for chat and voice, or mix the two. You can also use ElevenLabs voices with Ollama models, all controlled from a Web UI. Ask the AI to look at your screen and it will explain in detail what it's looking at.
- Supports OpenAI, xAI or Ollama language models: Choose the model that best fits your needs.
- Provides text-to-speech synthesis using XTTS or OpenAI TTS or ElevenLabs: Enjoy natural and expressive voices.
- No typing needed, just speak: Hands-free interaction makes conversations smooth and effortless.
- Analyzes user mood and adjusts AI responses accordingly: Get personalized responses based on your mood.
- You can, just by speaking, have the AI analyze your screen and chat about it: Seamlessly integrate visual context into your conversations.
- Easy configuration through environment variables: Customize the application to suit your preferences with minimal effort.
- WebUI or Terminal usage: Run with your preferred method, but the UI is recommended because you can change characters, model providers, speech providers, voices, etc.
- HUGE selection of built in Characters: Talk with the funniest and most insane AI characters!
https://github.com/user-attachments/assets/5581bd53-422b-4a92-9b97-7ee4ea37e09b
- Python 3.10
- CUDA-enabled GPU
- ffmpeg
- Ollama models, the OpenAI API, or xAI for chat
- Local XTTS, the OpenAI API, or the ElevenLabs API for speech
- Microsoft C++ Build Tools on Windows
- Microphone
- A sense of humor
- Clone the repository:
git clone https://github.com/bigsk1/voice-chat-ai.git
cd voice-chat-ai
For the CPU-only version, clone the cpu-only branch: https://github.com/bigsk1/voice-chat-ai/tree/cpu-only
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\Activate`
or use conda (just make sure it is Python 3.10):
conda create --name voice-chat-ai python=3.10
conda activate voice-chat-ai
- Install dependencies:
Windows only: Microsoft C++ Build Tools 14.0 or greater is required for TTS.
For the GPU (CUDA) version (recommended), install CUDA-enabled PyTorch and the other dependencies:
pip install torch==2.3.1+cu121 torchaudio==2.3.1+cu121 torchvision==0.18.1+cu121 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
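After the install, a quick check along these lines can confirm whether PyTorch sees the GPU. This is a minimal sketch and not part of the project; `describe_torch_device` is a hypothetical helper name, and it relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def describe_torch_device() -> str:
    """Report which device PyTorch would use, without failing if torch is missing."""
    try:
        import torch  # imported lazily so the check itself never crashes
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


if __name__ == "__main__":
    print("Using device:", describe_torch_device())
```

If this reports `cpu` on a machine with an Nvidia GPU, the CUDA build of PyTorch is probably not the one installed in the active environment.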
For CPU-only version (No UI) : clone the cpu-only branch https://github.com/bigsk1/voice-chat-ai/tree/cpu-only
Make sure ffmpeg is installed. In Windows terminal, run winget install ffmpeg, or see https://ffmpeg.org/download.html. Then restart your shell or VS Code and run ffmpeg -version to check that it installed correctly.
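The same check can be scripted, which is handy before launching the app. The sketch below uses only the Python standard library; `find_tool` and `ffmpeg_version_line` are hypothetical helper names, not functions from this project.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


def ffmpeg_version_line() -> Optional[str]:
    """Return the first line printed by `ffmpeg -version`, or None if ffmpeg is missing."""
    path = find_tool("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "ffmpeg not found on PATH; install it and restart the shell")
```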
For local XTTS on an Nvidia GPU you might also need cuDNN (https://developer.nvidia.com/cudnn). Make sure C:\Program Files\NVIDIA\CUDNN\v9.5\bin\12.6, or the equivalent path for the version you downloaded, is in the system PATH.
You need to download the checkpoints for the models used in this project (unless you are only using Docker). Download them from the GitHub releases page, extract the zip, and put the contents into the project folder.
After downloading, place the folders as follows:
voice-chat-ai/
├── checkpoints/
│   ├── base_speakers/
│   │   ├── EN/
│   │   │   └── checkpoint.pth
│   │   └── ZH/
│   │       └── checkpoint.pth
│   └── converter/
│       └── checkpoint.pth
└── XTTS-v2/
    ├── config.json
    └── other_xtts_files...
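To confirm the folders landed in the right place, a small script can compare the project folder against the layout above. This is a sketch under the assumption that only the four named files matter; `missing_checkpoint_files` is a hypothetical helper, and the other XTTS files are not checked because the layout does not name them.

```python
from pathlib import Path
from typing import List

# Paths taken from the layout above, relative to the project root.
EXPECTED_PATHS = [
    "checkpoints/base_speakers/EN/checkpoint.pth",
    "checkpoints/base_speakers/ZH/checkpoint.pth",
    "checkpoints/converter/checkpoint.pth",
    "XTTS-v2/config.json",
]


def missing_checkpoint_files(project_root: str) -> List[str]:
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint files are present.")
```

Run it from the project root; an empty result means the named files are where the app is expected to look for them.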
Run the application:
Web UI
uvicorn app.main:app --host 0.0.0.0 --port 8000
Find on http://localhost:8000/
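If the page does not load, it can help to check from a script whether the server is answering at all. The following is a minimal sketch using only the standard library; `web_ui_is_up` is a hypothetical helper, and it assumes the root URL returns HTTP 200 once the app has started (as the sample server log in this README suggests).

```python
import urllib.error
import urllib.request


def web_ui_is_up(url: str = "http://localhost:8000/", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the Web UI answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `web_ui_is_up()` right after starting uvicorn may return False for a while, since the models take time to load.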
CLI Only
python cli.py
This setup is for running with an Nvidia GPU and assumes the Nvidia toolkit and cuDNN are installed.
The built image is huge because it contains all the checkpoints, the CUDA base image, build tools, and audio tools, so there is no need to download the checkpoints and XTTS separately. Everything is set up to use XTTS. If you are not using XTTS for speech it should still work, but it is a large Docker image and will take a while. If you don't want to deal with that, run the app natively instead of using Docker.
This guide will help you quickly set up and run the Voice Chat AI Docker container. Ensure you have Docker installed and that your .env file is in the directory where you run the commands. If you get CUDA errors, make sure the Nvidia toolkit for Docker is installed and that cuDNN is in your PATH.
- Docker installed on your system.
- A .env file in the same folder as the docker run command. This file should contain all necessary environment variables for the application.
On windows using docker desktop - run in Windows terminal: make sure .env is in same folder you are running this from
docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 bigsk1/voice-chat-ai:latest
Use docker logs -f voice-chat-ai to see the logs.
For a native WSL environment (like Ubuntu on WSL), use this command:
make sure .env is in same folder you are running this from
docker run -d --gpus all \
-e "PULSE_SERVER=/mnt/wslg/PulseServer" \
-v /mnt/wslg/:/mnt/wslg/ \
--env-file .env \
--name voice-chat-ai \
-p 8000:8000 \
bigsk1/voice-chat-ai:latest
On a native Linux host with PulseAudio (for example Ubuntu or Debian), use this command:
docker run -d --gpus all \
-e PULSE_SERVER=unix:/tmp/pulse/native \
-v ~/.config/pulse/cookie:/root/.config/pulse/cookie:ro \
-v /run/user/$(id -u)/pulse:/tmp/pulse:ro \
--env-file .env \
--name voice-chat-ai \
-p 8000:8000 \
bigsk1/voice-chat-ai:latest
Access the application at http://localhost:8000
To remove use:
docker stop voice-chat-ai
docker rm voice-chat-ai
To build the image yourself instead of pulling it:
docker build -t voice-chat-ai .
On Windows with Docker Desktop using WSL - run in Windows
wsl docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v /mnt/wslg/:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest
Running from WSL
docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v /mnt/wslg/:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest
- Rename .env.sample to .env in the root directory of the project and configure it with the necessary environment variables.
- The app is controlled by the variables you set.
# Conditional API Usage:
# Depending on the value of MODEL_PROVIDER, the corresponding service will be used when run.
# You can mix and match, use Ollama with OpenAI speech or use OpenAI chat model with local XTTS or xAI chat etc..
# Model Provider: openai or ollama or xai
MODEL_PROVIDER=ollama
# Character to use - Options: alien_scientist, anarchist, bigfoot, chatgpt, clumsyhero, conandoyle, conspiracy, cyberpunk,
# detective, dog, dream_weaver, einstein, elon_musk, fight_club, fress_trainer, ghost, granny, haunted_teddybear, insult, joker, morpheus,
# mouse, mumbler, nebula_barista, nerd, newscaster_1920s, paradox, pirate, revenge_deer, samantha, shakespeare, split, telemarketer,
# terminator, valleygirl, vampire, vegetarian_vampire, wizard, zombie_therapist, grok_xai
CHARACTER_NAME=pirate
# Text-to-Speech (TTS) Configuration:
# TTS Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice) or elevenlabs
TTS_PROVIDER=elevenlabs
# OpenAI TTS Voice - Used when TTS_PROVIDER is set to openai above
# Voice options: alloy, echo, fable, onyx, nova, shimmer
OPENAI_TTS_VOICE=onyx
# ElevenLabs Configuration:
ELEVENLABS_API_KEY=your_api_key_here
# Default voice ID
ELEVENLABS_TTS_VOICE=pgCnBQgKPGkIP8fJuita
# XTTS Configuration:
# The voice speed for XTTS only (1.0 - 1.5, default is 1.1)
XTTS_SPEED=1.2
# OpenAI Configuration:
# OpenAI API Key for models and speech (replace with your actual API key)
OPENAI_API_KEY=your_api_key_here
# Models to use - OPTIONAL: For screen analysis, if MODEL_PROVIDER is ollama, llava will be used by default.
# Ensure you have llava downloaded with Ollama. If OpenAI is used, gpt-4o-mini works well. xAI is not supported for screen analysis yet; if xai is selected and you ask for screen analysis, it falls back to OpenAI.
OPENAI_MODEL=gpt-4o-mini
# Endpoints:
# Set these below and no need to change often
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434
# Models Configuration:
# Models to use - llama3.2 works well for local usage.
OLLAMA_MODEL=llama3.2
# xAI Configuration
XAI_MODEL=grok-beta
XAI_API_KEY=your_api_key_here
XAI_BASE_URL=https://api.x.ai/v1
# NOTES:
# List of trigger phrases to have the model view your desktop (desktop, browser, images, etc.).
# It will describe what it sees, and you can ask questions about it:
# "what's on my screen", "take a screenshot", "show me my screen", "analyze my screen",
# "what do you see on my screen", "screen capture", "screenshot"
# To stop the conversation, say "Quit", "Exit", or "Leave". (Ctrl+C also always works.)
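Because the app is driven entirely by these variables, validating them up front can save a confusing startup. The sketch below is an illustration only: `load_settings` is a hypothetical helper, the fallback defaults are assumptions, and which API keys count as required is inferred from the comments in the sample file rather than taken from the project code.

```python
from typing import Dict, Mapping

# Allowed values as listed in the sample .env above.
MODEL_PROVIDERS = {"openai", "ollama", "xai"}
TTS_PROVIDERS = {"xtts", "openai", "elevenlabs"}


def load_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Pick the provider settings out of an environment mapping and validate them."""
    settings = {
        # The fallback values here are assumptions, not the app's real defaults.
        "model_provider": env.get("MODEL_PROVIDER", "ollama").strip().lower(),
        "tts_provider": env.get("TTS_PROVIDER", "xtts").strip().lower(),
        "character": env.get("CHARACTER_NAME", "pirate").strip(),
    }
    if settings["model_provider"] not in MODEL_PROVIDERS:
        raise ValueError(f"MODEL_PROVIDER must be one of {sorted(MODEL_PROVIDERS)}")
    if settings["tts_provider"] not in TTS_PROVIDERS:
        raise ValueError(f"TTS_PROVIDER must be one of {sorted(TTS_PROVIDERS)}")
    # Only the key for a selected hosted service is treated as required.
    needed = set()
    if "openai" in (settings["model_provider"], settings["tts_provider"]):
        needed.add("OPENAI_API_KEY")
    if settings["model_provider"] == "xai":
        needed.add("XAI_API_KEY")
    if settings["tts_provider"] == "elevenlabs":
        needed.add("ELEVENLABS_API_KEY")
    missing = sorted(key for key in needed if not env.get(key))
    if missing:
        raise ValueError("Missing required keys: " + ", ".join(missing))
    return settings
```

In practice this would be called as `load_settings(os.environ)` after the .env file has been loaded into the process environment.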
- You have 3 seconds to talk; if there is silence, it is then the AI's turn to talk.
- To have the AI look at your screen and explain in detail what it sees, say any of the following: "what's on my screen", "take a screenshot", "show me my screen", "analyze my screen", "what do you see on my screen", "screen capture", "screenshot".
- To stop the conversation, say "Quit", "Exit", or "Leave". (Ctrl+C also always works in the terminal.)
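The spoken commands above amount to a small classification step on the transcribed text. Here is one way that step could look; `classify_utterance` is a hypothetical function, and the matching rules (substring match for screen phrases, whole-utterance match for stop words) are assumptions rather than the project's actual logic.

```python
import re
from typing import Optional

# Phrases copied from the list above.
SCREEN_PHRASES = (
    "what's on my screen",
    "take a screenshot",
    "show me my screen",
    "analyze my screen",
    "what do you see on my screen",
    "screen capture",
    "screenshot",
)
STOP_WORDS = ("quit", "exit", "leave")


def classify_utterance(text: str) -> Optional[str]:
    """Return "screen", "stop", or None for ordinary chat."""
    # Lowercase and normalize curly apostrophes that speech-to-text may emit.
    normalized = text.lower().replace("\u2019", "'").strip()
    if any(phrase in normalized for phrase in SCREEN_PHRASES):
        return "screen"
    # Treat a stop word as a command only when it is the whole utterance.
    if re.sub(r"[^a-z]", "", normalized) in STOP_WORDS:
        return "stop"
    return None
```

Matching stop words only as a whole utterance keeps a sentence like "I don't want to quit my job" from ending the session.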
Add voice names and IDs to elevenlabs_voices.json; in the Web UI you can then select them from the dropdown menu.
{
  "voices": [
    {
      "id": "2bk7ULW9HfwvcIbMWod0",
      "name": "Female - Bianca - City girl"
    },
    {
      "id": "JqseNhWbQb1GDNNS1Ga1",
      "name": "Female - Joanne - Pensive, introspective"
    },
    {
      "id": "b0uJ9TWzQss61d8f2OWX",
      "name": "Female - Lucy - Sweet and sensual"
    },
    {
      "id": "2pF3fJJNnWg1nDwUW5CW",
      "name": "Male - Eustis - Fast speaking"
    },
    {
      "id": "pgCnBQgKPGkIP8fJuita",
      "name": "Male - Jarvis - Tony Stark AI"
    },
    {
      "id": "kz8mB8WAwV9lZ0fuDqel",
      "name": "Male - Nigel - Mysterious intriguing"
    },
    {
      "id": "MMHtVLagjZxJ53v4Wj8o",
      "name": "Male - Paddington - British narrator"
    },
    {
      "id": "22FgtP4D63L7UXvnTmGf",
      "name": "Male - Wildebeest - Deep male voice"
    }
  ]
}
For the CLI, the voice ID set in .env is used.
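When editing the voices file by hand, it is easy to leave out a field. A short script can load the file and build a name-to-ID lookup, failing early on incomplete entries. This is a sketch that assumes the structure shown above; `voices_by_name` is a hypothetical helper, and `SAMPLE` stands in for the file contents.

```python
import json
from typing import Dict

# Two entries taken from the example file above.
SAMPLE = """
{
  "voices": [
    {"id": "pgCnBQgKPGkIP8fJuita", "name": "Male - Jarvis - Tony Stark AI"},
    {"id": "2bk7ULW9HfwvcIbMWod0", "name": "Female - Bianca - City girl"}
  ]
}
"""


def voices_by_name(raw_json: str) -> Dict[str, str]:
    """Map each voice name to its ID, rejecting entries with missing fields."""
    data = json.loads(raw_json)
    result = {}
    for entry in data.get("voices", []):
        if not entry.get("id") or not entry.get("name"):
            raise ValueError(f"voice entry needs both 'id' and 'name': {entry!r}")
        result[entry["name"]] = entry["id"]
    return result


if __name__ == "__main__":
    for name, voice_id in voices_by_name(SAMPLE).items():
        print(f"{name}: {voice_id}")
```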
Press Start to start talking. To take a break, hit Stop; when you are ready, hit Start again. Press Stop before changing characters and voices in the dropdown. You can also select the Model Provider and TTS Provider in the dropdown menus, and the app will use the selected provider from then on. Saying "Exit", "Leave", or "Quit" is like pressing Stop.
Click on the thumbnail to open the video
- Create a new folder for the character in the project's characters directory (e.g., characters/wizard).
- Add a text file with the character's prompt (e.g., characters/wizard/wizard.txt).
- Add a JSON file with mood prompts (e.g., characters/wizard/prompts.json).
wizard.txt
This is the prompt that tells the AI who it is:
You are a wise and ancient wizard who speaks with a mystical and enchanting tone. You are knowledgeable about many subjects and always eager to share your wisdom.
prompts.json
This file is used for sentiment analysis, which lets you guide how the AI responds based on what you say. When you speak, the TextBlob analyzer scores your text, and that score is tied to one of the moods shown below. The mood is passed to the AI with the follow-up response, guiding it to reply in a certain style.
{
  "joyful": "RESPOND WITH ENTHUSIASM AND WISDOM, LIKE A WISE OLD SAGE WHO IS HAPPY TO SHARE HIS KNOWLEDGE.",
  "sad": "RESPOND WITH EMPATHY AND COMFORT, LIKE A WISE OLD SAGE WHO UNDERSTANDS THE PAIN OF OTHERS.",
  "flirty": "RESPOND WITH A TOUCH OF MYSTERY AND CHARM, LIKE A WISE OLD SAGE WHO IS ALSO A BIT OF A ROGUE.",
  "angry": "RESPOND CALMLY AND WISELY, LIKE A WISE OLD SAGE WHO KNOWS THAT ANGER IS A PART OF LIFE.",
  "neutral": "KEEP RESPONSES SHORT AND NATURAL, LIKE A WISE OLD SAGE WHO IS ALWAYS READY TO HELP.",
  "fearful": "RESPOND WITH REASSURANCE, LIKE A WISE OLD SAGE WHO KNOWS THAT FEAR IS ONLY TEMPORARY.",
  "surprised": "RESPOND WITH AMAZEMENT AND CURIOSITY, LIKE A WISE OLD SAGE WHO IS ALWAYS EAGER TO LEARN.",
  "disgusted": "RESPOND WITH UNDERSTANDING AND COMFORT, LIKE A WISE OLD SAGE WHO KNOWS THAT DISGUST IS A PART OF LIFE."
}
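The score-to-mood step described above can be pictured roughly as follows. This is a sketch only: the cut-off values are hypothetical, the function names are invented for illustration, and a single polarity number can only distinguish a few of the moods, so the project's real mapping is likely different.

```python
from typing import Dict


def mood_from_polarity(polarity: float) -> str:
    """Map a sentiment polarity in [-1.0, 1.0] to a mood key (hypothetical cut-offs)."""
    if polarity >= 0.3:
        return "joyful"
    if polarity <= -0.6:
        return "angry"
    if polarity <= -0.3:
        return "sad"
    return "neutral"


def mood_instruction(polarity: float, prompts: Dict[str, str]) -> str:
    """Look up the style instruction for the detected mood, falling back to neutral."""
    mood = mood_from_polarity(polarity)
    return prompts.get(mood, prompts.get("neutral", ""))


# With TextBlob installed, the score would come from:
#   from textblob import TextBlob
#   polarity = TextBlob("I love this!").sentiment.polarity
```

The returned instruction would then be added to the next request to the model, which is how the mood prompts in prompts.json end up shaping the reply.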
For XTTS, find a .wav voice sample, add it to the wizard folder, and name it wizard.wav. The sample only needs to be 6 seconds long. When the app runs, it automatically finds and uses the .wav that has the character's name. If you only use OpenAI speech or ElevenLabs, a .wav isn't needed.
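The character files can also be generated from a script, which is convenient when adding several characters at once. The sketch below assumes the folder layout described in the steps above; `create_character` and `find_voice_sample` are hypothetical helpers, and the directory passed in is whatever your checkout uses for characters.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def create_character(characters_dir: str, name: str, prompt: str, moods: Dict[str, str]) -> Path:
    """Create <characters_dir>/<name>/ with <name>.txt and prompts.json inside it."""
    folder = Path(characters_dir) / name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{name}.txt").write_text(prompt, encoding="utf-8")
    (folder / "prompts.json").write_text(json.dumps(moods, indent=2), encoding="utf-8")
    return folder


def find_voice_sample(folder: Path) -> Optional[Path]:
    """Return <folder>/<folder name>.wav if it exists (only XTTS needs it), else None."""
    sample = folder / f"{folder.name}.wav"
    return sample if sample.is_file() else None
```

For example, `create_character("characters", "wizard", "You are a wise and ancient wizard...", {"neutral": "KEEP RESPONSES SHORT AND NATURAL."})` would write the two text files; the .wav sample still has to be copied in by hand.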
Could not locate cudnn_ops64_9.dll. Please make sure it is in your library path!
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor
To resolve this:
Install cuDNN: Download cuDNN from the NVIDIA cuDNN page https://developer.nvidia.com/cudnn
Here's how to add it to the PATH:
Open System Environment Variables:
Press Win + R, type sysdm.cpl, and hit Enter. Go to the Advanced tab, and click on Environment Variables.
Edit the System PATH Variable:
In the System variables section, find the Path variable, select it, and click Edit. Click New and add the path to the bin directory where cudnn_ops64_9.dll is located. Based on your setup, you would add:
C:\Program Files\NVIDIA\CUDNN\v9.5\bin\12.6
Apply and Restart:
Click OK to close all dialog boxes, then restart your terminal (or any running applications) to apply the changes.
Verify the Change:
Open a new terminal and run
where cudnn_ops64_9.dll
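The same lookup can be done from inside the Python environment the app runs in, which confirms that the interpreter sees the updated PATH. This is a minimal standard-library sketch; `find_on_path` is a hypothetical helper that mimics what `where` does for a single file name.

```python
import os
from typing import List


def find_on_path(filename: str, path_value: str) -> List[str]:
    """Return every directory in a PATH-style string that contains filename."""
    hits = []
    for directory in path_value.split(os.pathsep):
        directory = directory.strip().strip('"')  # tolerate quoted PATH entries
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_on_path("cudnn_ops64_9.dll", os.environ.get("PATH", ""))
    print(found if found else "cudnn_ops64_9.dll is not in any PATH directory")
```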
File "C:\Users\someguy\miniconda3\envs\voice-chat-ai\lib\site-packages\pyaudio\__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9999] Unanticipated host error
Make sure ffmpeg is installed and added to PATH (in Windows terminal: winget install ffmpeg). Also make sure your Windows microphone privacy settings allow access and that the microphone is set as the default device. I had this issue when using Bluetooth Apple AirPods, and this solved it.
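When tracking down this error, it can help to see which input devices PyAudio itself can find. The sketch below is not part of the project; `list_input_devices` is a hypothetical helper built on PyAudio's `get_device_count()` and `get_device_info_by_index()`, and it returns an empty list if PyAudio is not installed.

```python
from typing import List


def list_input_devices() -> List[str]:
    """Return the names of audio devices that expose at least one input channel."""
    try:
        import pyaudio  # optional here so the sketch degrades gracefully
    except ImportError:
        return []
    audio = pyaudio.PyAudio()
    try:
        names = []
        for index in range(audio.get_device_count()):
            info = audio.get_device_info_by_index(index)
            if info.get("maxInputChannels", 0) > 0:
                names.append(str(info.get("name")))
        return names
    finally:
        audio.terminate()  # always release PortAudio


if __name__ == "__main__":
    devices = list_input_devices()
    print("\n".join(devices) if devices else "No input devices found (or PyAudio is not installed)")
```

If your microphone is missing from the list, the problem is likely at the operating system level (privacy settings or default device) rather than in the app.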
Click on the thumbnail to open the video
CLI
GPU - 100% local - ollama llama3, xtts-v2
Click on the thumbnail to open the video
CPU Only mode CLI
Alien conversation using openai gpt4o and openai speech for tts.
Click on the thumbnail to open the video
Detailed output in terminal while running the app.
When using ElevenLabs, the first start of the server prints details about your usage limits so you know how much you have used.
(voice-chat-ai) X:\voice-chat-ai>uvicorn app.main:app --host 0.0.0.0 --port 8000
Switched to ElevenLabs TTS voice: VgPqCpkdPQacBNNIsAqI
ElevenLabs Character Usage: 33796 / 100027
Using device: cuda
Model provider: openai
Model: gpt-4o
Character: Nerd
Text-to-Speech provider: elevenlabs
To stop chatting say Quit, Leave or Exit. Say, what's on my screen, to have AI view screen. One moment please loading...
INFO: Started server process [12752]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:62671 - "GET / HTTP/1.1" 200 OK
INFO: 127.0.0.1:62671 - "GET /app/static/css/styles.css HTTP/1.1" 200 OK
INFO: 127.0.0.1:62672 - "GET /app/static/js/scripts.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:62672 - "GET /characters HTTP/1.1" 200 OK
INFO: 127.0.0.1:62671 - "GET /app/static/favicon.ico HTTP/1.1" 200 OK
INFO: 127.0.0.1:62673 - "GET /elevenlabs_voices HTTP/1.1" 200 OK
INFO: ('127.0.0.1', 62674) - "WebSocket /ws" [accepted]
INFO: connection open
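The usage line in the output above is easy to parse if you want to track how much quota is left. This is a sketch that assumes the line keeps the exact `used / limit` format shown in the log; `parse_usage` and `remaining_characters` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

USAGE_PATTERN = re.compile(r"ElevenLabs Character Usage:\s*(\d+)\s*/\s*(\d+)")


def parse_usage(line: str) -> Optional[Tuple[int, int]]:
    """Return (used, limit) from a usage log line, or None if the line does not match."""
    match = USAGE_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def remaining_characters(line: str) -> Optional[int]:
    """Return how many characters are left in the quota, or None for other lines."""
    usage = parse_usage(line)
    if usage is None:
        return None
    used, limit = usage
    return limit - used
```

For the line in the sample output, `remaining_characters("ElevenLabs Character Usage: 33796 / 100027")` gives 66231.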
Features:
- If you ask for code examples in the Web UI, the code is displayed in a code block in a different color and formatted correctly.
- More display features are in progress: a copy button for code blocks, images, links, etc.
This project is licensed under the MIT License.
Alternative AI tools for voice-chat-ai
Similar Open Source Tools
data:image/s3,"s3://crabby-images/ea75f/ea75f56473f4676047491cad43949b48dce4179c" alt="voice-chat-ai Screenshot"
voice-chat-ai
Voice Chat AI is a project that allows users to interact with different AI characters using speech. Users can choose from various characters with unique personalities and voices, and have conversations or role play with them. The project supports OpenAI, xAI, or Ollama language models for chat, and provides text-to-speech synthesis using XTTS, OpenAI TTS, or ElevenLabs. Users can seamlessly integrate visual context into conversations by having the AI analyze their screen. The project offers easy configuration through environment variables and can be run via WebUI or Terminal. It also includes a huge selection of built-in characters for engaging conversations.
data:image/s3,"s3://crabby-images/99dbf/99dbf3b9639de8a91940634d6156aa9c105f7f51" alt="aiconfig Screenshot"
aiconfig
AIConfig is a framework that makes it easy to build generative AI applications for production. It manages generative AI prompts, models and model parameters as JSON-serializable configs that can be version controlled, evaluated, monitored and opened in a local editor for rapid prototyping. It allows you to store and iterate on generative AI behavior separately from your application code, offering a streamlined AI development workflow.
data:image/s3,"s3://crabby-images/7af93/7af93c01e960196fe2cef2bf393518a55cc66ea0" alt="exo Screenshot"
exo
Run your own AI cluster at home with everyday devices. Exo is experimental software that unifies existing devices into a powerful GPU, supporting wide model compatibility, dynamic model partitioning, automatic device discovery, ChatGPT-compatible API, and device equality. It does not use a master-worker architecture, allowing devices to connect peer-to-peer. Exo supports different partitioning strategies like ring memory weighted partitioning. Installation is recommended from source. Documentation includes example usage on multiple MacOS devices and information on inference engines and networking modules. Known issues include the iOS implementation lagging behind Python.
data:image/s3,"s3://crabby-images/239d7/239d74ae266f8790778eaf68f767bc993d5b982f" alt="sd-webui-agent-scheduler Screenshot"
sd-webui-agent-scheduler
AgentScheduler is an Automatic/Vladmandic Stable Diffusion Web UI extension designed to enhance image generation workflows. It allows users to enqueue prompts, settings, and controlnets, manage queued tasks, prioritize, pause, resume, and delete tasks, view generation results, and more. The extension offers hidden features like queuing checkpoints, editing queued tasks, and custom checkpoint selection. Users can access the functionality through HTTP APIs and API callbacks. Troubleshooting steps are provided for common errors. The extension is compatible with latest versions of A1111 and Vladmandic. It is licensed under Apache License 2.0.
data:image/s3,"s3://crabby-images/fa358/fa35859d37ce49ae83dcd373c6c45bf1d5901734" alt="quivr Screenshot"
quivr
Quivr is a personal assistant powered by Generative AI, designed to be a second brain for users. It offers fast and efficient access to data, ensuring security and compatibility with various file formats. Quivr is open source and free to use, allowing users to share their brains publicly or keep them private. The marketplace feature enables users to share and utilize brains created by others, boosting productivity. Quivr's offline mode provides anytime, anywhere access to data. Key features include speed, security, OS compatibility, file compatibility, open source nature, public/private sharing options, a marketplace, and offline mode.
data:image/s3,"s3://crabby-images/48ced/48ced1b1f2b4e8aa9d43546c4b479c89428a18d5" alt="llm-document-ocr Screenshot"
llm-document-ocr
LLM Document OCR is a Node.js tool that utilizes GPT4 and Claude3 for OCR and data extraction. It converts PDFs into PNGs, crops white-space, cleans up JSON strings, and supports various image formats. Users can customize prompts for data extraction. The tool is sponsored by Mercoa, offering API for BillPay and Invoicing.
data:image/s3,"s3://crabby-images/60898/60898721f5c12546aab306f51e74a602804e4faf" alt="archgw Screenshot"
archgw
Arch is an intelligent Layer 7 gateway designed to protect, observe, and personalize AI agents with APIs. It handles tasks related to prompts, including detecting jailbreak attempts, calling backend APIs, routing between LLMs, and managing observability. Built on Envoy Proxy, it offers features like function calling, prompt guardrails, traffic management, and observability. Users can build fast, observable, and personalized AI agents using Arch to improve speed, security, and personalization of GenAI apps.
data:image/s3,"s3://crabby-images/572b3/572b3dba32792669caef54033c1a35076307ccd5" alt="hydraai Screenshot"
hydraai
Generate React components on-the-fly at runtime using AI. Register your components, and let Hydra choose when to show them in your App. Hydra development is still early, and patterns for different types of components and apps are still being developed. Join the discord to chat with the developers. Expects to be used in a NextJS project. Components that have function props do not work.
data:image/s3,"s3://crabby-images/2acd3/2acd3a5fc65df6c4acdaba40f7acc4cab82def7c" alt="slack-machine Screenshot"
slack-machine
Slack Machine is a simple, yet powerful and extendable Slack bot framework. More than just a bot, Slack Machine is a framework that helps you develop your Slack workspace into a ChatOps powerhouse. Slack Machine is built with an intuitive plugin system that lets you build bots quickly, but also allows for easy code organization.
data:image/s3,"s3://crabby-images/dc935/dc935d8e3633f8d81cedb928aaf7cdfb6270de46" alt="refact-lsp Screenshot"
refact-lsp
Refact Agent is a small executable written in Rust as part of the Refact Agent project. It lives inside your IDE to keep AST and VecDB indexes up to date, supporting connection graphs between definitions and usages in popular programming languages. It functions as an LSP server, offering code completion, chat functionality, and integration with various tools like browsers, databases, and debuggers. Users can interact with it through a Text UI in the command line.
data:image/s3,"s3://crabby-images/120d0/120d008c6c6b74fbae7941ec3603b9af067e932f" alt="bedrock-claude-chat Screenshot"
bedrock-claude-chat
This repository is a sample chatbot using the Anthropic company's LLM Claude, one of the foundational models provided by Amazon Bedrock for generative AI. It allows users to have basic conversations with the chatbot, personalize it with their own instructions and external knowledge, and analyze usage for each user/bot on the administrator dashboard. The chatbot supports various languages, including English, Japanese, Korean, Chinese, French, German, and Spanish. Deployment is straightforward and can be done via the command line or by using AWS CDK. The architecture is built on AWS managed services, eliminating the need for infrastructure management and ensuring scalability, reliability, and security.
data:image/s3,"s3://crabby-images/3da92/3da92d37e3572e0d278d23ba4bfb41c05361858a" alt="llama-cpp-agent Screenshot"
llama-cpp-agent
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM models, execute structured function calls and get structured output (objects). It provides a simple yet robust interface and supports llama-cpp-python and OpenAI endpoints with GBNF grammar support (like the llama-cpp-python server) and the llama.cpp backend server. It works by generating a formal GGML-BNF grammar of the user defined structures and functions, which is then used by llama.cpp to generate text valid to that grammar. In contrast to most GBNF grammar generators it also supports nested objects, dictionaries, enums and lists of them.
data:image/s3,"s3://crabby-images/5758e/5758efa085c85c13f3b0c63bab4be52199e39069" alt="OSWorld Screenshot"
OSWorld
OSWorld is a benchmarking tool designed to evaluate multimodal agents for open-ended tasks in real computer environments. It provides a platform for running experiments, setting up virtual machines, and interacting with the environment using Python scripts. Users can install the tool on their desktop or server, manage dependencies with Conda, and run benchmark tasks. The tool supports actions like executing commands, checking for specific results, and evaluating agent performance. OSWorld aims to facilitate research in AI by providing a standardized environment for testing and comparing different agent baselines.
data:image/s3,"s3://crabby-images/27680/27680698d38822d71116e9187ee2d18f8492523f" alt="pandas-ai Screenshot"
pandas-ai
PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.
data:image/s3,"s3://crabby-images/2c1fa/2c1fa871907b0d66843350134131d37f3d1683b5" alt="LLM-Finetuning-Toolkit Screenshot"
LLM-Finetuning-Toolkit
LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM fine-tuning experiments on your data and gathering their results. It allows users to control all elements of a typical experimentation pipeline - prompts, open-source LLMs, optimization strategy, and LLM testing - through a single YAML configuration file. The toolkit supports basic, intermediate, and advanced usage scenarios, enabling users to run custom experiments, conduct ablation studies, and automate fine-tuning workflows. It provides features for data ingestion, model definition, training, inference, quality assurance, and artifact outputs, making it a comprehensive tool for fine-tuning large language models.
data:image/s3,"s3://crabby-images/b7e6f/b7e6f2513cfd853eb721b820f984c7f7c72d6118" alt="iceburgcrm Screenshot"
iceburgcrm
Iceburg CRM is a metadata driven CRM with AI abilities that allows users to quickly prototype any CRM. It offers features like metadata creations, import/export in multiple formats, field validation, themes, role permissions, calendar, audit logs, API, workflow, field level relationships, module level relationships, and more. Created with Vue 3 for the frontend, Laravel 10 for the backend, Tailwinds with DaisyUI plugin, and Inertia for routing. Users can install default, admin panel, core, custom, or AI versions. The tool supports AI Assist for module data suggestions and provides API endpoints for CRM modules, search, specific module data, record updates, and deletions. Iceburg CRM also includes themes, custom field types, calendar, datalets, workflow, roles and permissions, import/export functionality, and custom seeding options.
For similar tasks
data:image/s3,"s3://crabby-images/ea75f/ea75f56473f4676047491cad43949b48dce4179c" alt="voice-chat-ai Screenshot"
voice-chat-ai
Voice Chat AI is a project that allows users to interact with different AI characters using speech. Users can choose from various characters with unique personalities and voices, and have conversations or role play with them. The project supports OpenAI, xAI, or Ollama language models for chat, and provides text-to-speech synthesis using XTTS, OpenAI TTS, or ElevenLabs. Users can seamlessly integrate visual context into conversations by having the AI analyze their screen. The project offers easy configuration through environment variables and can be run via WebUI or Terminal. It also includes a huge selection of built-in characters for engaging conversations.
data:image/s3,"s3://crabby-images/26877/26877fbaecab9f58d3863c0f4f59d56a0b3c6987" alt="aimeos-symfony Screenshot"
aimeos-symfony
Aimeos Symfony bundle is a professional, full-featured, and ultra-fast e-commerce package for Symfony. It can be easily installed and customized within an existing Symfony application. The bundle provides comprehensive features for setting up an e-commerce platform, including authentication, routing configuration, database setup, and administration interface setup. It offers flexibility for adapting, extending, overwriting, and customizing various aspects to meet specific business needs. The bundle is designed to streamline the development process and provide a robust foundation for building e-commerce applications with Symfony.
For similar jobs
data:image/s3,"s3://crabby-images/43708/437080ec744fd1aaa91d5cbae9630bcd2fe48ef0" alt="promptflow Screenshot"
promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
data:image/s3,"s3://crabby-images/ab8b8/ab8b8cebd0341c74187b3d61aeb87e0f2fb2cdb3" alt="deepeval Screenshot"
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
data:image/s3,"s3://crabby-images/e1c9c/e1c9cb6476b28bd2e7747bd8bb648f589e7a8a58" alt="MegaDetector Screenshot"
MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".
data:image/s3,"s3://crabby-images/293f8/293f804c9c75f7eea066dbb9641a9e2a720352a9" alt="leapfrogai Screenshot"
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.
carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, covering principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm Python package to assess the trustworthiness of your LLM more quickly. For more details about TrustLLM, please refer to the project website.
AI-YinMei
AI-YinMei is a development tool for AI virtual anchors (VTubers), in its NVIDIA GPU ("N card") version. It supports knowledge-base chat through fastgpt, with a complete LLM stack of [fastgpt] + [one-api] + [Xinference]. For live streaming, it can reply to bilibili live-stream bullet comments (danmaku) and greet viewers as they enter the stream. Speech synthesis is available through Microsoft edge-tts, Bert-VITS2, and GPT-SoVITS. It also supports expression control via Vtuber Studio, image generation with stable-diffusion-webui output to an OBS live room, NSFW screening of generated images (public-NSFW-y-distinguish), web and image search via duckduckgo (requires a proxy or VPN), and image search via Baidu (no proxy needed). Further features include an AI reply chat box and a playlist (both HTML plug-ins), AI singing with Auto-Convert-Music, dancing, expression video playback, head-patting and gift-throwing actions, automatic dancing when singing starts, automatic looping sway motions during chat and singing, multi-scene switching, background music switching, automatic day and night scene switching, and open-ended singing and painting requests where the AI judges the content automatically.