aiavatarkit
๐ฅฐ Building AI-based conversational avatars lightning fast โก๏ธ๐ฌ
Stars: 154
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
README:
๐ฅฐ Building AI-based conversational avatars lightning fast โก๏ธ๐ฌ
- Live anywhere: VRChat, cluster and any other metaverse platforms, and even devices in real world.
- Extensible: Unlimited capabilities that depends on you.
- Easy to start: Ready to start conversation right out of the box.
- VOICEVOX API in your computer or network reachable machine (Text-to-Speech)
- API key for Speech Services of Google or Azure (Speech-to-Text)
- API key for OpenAI API (ChatGPT)
- Python 3.10 (Runtime)
Install AIAvatarKit.
$ pip install aiavatar
Make the script as run.py
.
from aiavatar import AIAvatar
app = AIAvatar(
openai_api_key="YOUR_OPENAI_API_KEY",
google_api_key="YOUR_GOOGLE_API_KEY"
)
app.start_listening_wakeword()
# # Tips: To terminate with Ctrl+C on Windows, use `while` to wait instead of `app.start_listening_wakeword()`
# app.start_listening_wakeword(False)
# while True:
# time.sleep(1)
Start AIAvatar. Also, don't forget to launch VOICEVOX beforehand.
$ python run.py
Conversation will start when you say the wake word "ใใใซใกใฏ" (or "Hello" when language is not ja-JP
).
Feel free to enjoy the conversation afterwards!
Here are the configuration for each component.
You can set model and system message content when instantiate AIAvatar
.
app = AIAvatar(
openai_api_key="YOUR_OPENAI_API_KEY",
google_api_key="YOUR_GOOGLE_API_KEY",
model="gpt-4-turbo",
system_message_content="You are my cat."
)
If you want to configure in detail, create instance of ChatGPTProcessor
with custom parameters and set it to AIAvatar
.
from aiavatar.processors.chatgpt import ChatGPTProcessor
chat_processor = ChatGPTProcessor(
api_key=OPENAI_API_KEY,
model="gpt-4-turbo",
temperature=0.0,
max_tokens=200,
system_message_content="You are my cat.",
history_count=20, # Count of messages included in request to ChatGPT as context
history_timeout=120.0 # Duration in seconds to expire histories
)
app.chat_processor = chat_processor
Create instance of ClaudeProcessor
with custom parameters and set it to AIAvatar
. The default model is claude-3-sonnet-20240229
.
from aiavatar.processors.claude import ClaudeProcessor
claude_processor = ClaudeProcessor(
api_key="ANTHROPIC_API_KEY"
)
app = AIAvatar(
google_api_key=GOOGLE_API_KEY,
chat_processor=claude_processor
)
NOTE: We support Claude 3 on Anthropic API, not Amazon Bedrock for now.
Create instance of GeminiProcessor
with custom parameters and set it to AIAvatar
. The default model is gemini-pro
.
from aiavatar.processors.gemini import GeminiProcessor
gemini_processor = GeminiProcessor(
api_key="YOUR_GOOGLE_API_KEY"
)
app = AIAvatar(
google_api_key=GOOGLE_API_KEY,
chat_processor=gemini_processor
)
NOTE: We support Gemini on Google AI Studio, not Vertex AI for now.
You can use the Dify API instead of a specific LLM's API. This eliminates the need to manage code for tools or RAG locally.
from aiavatar import AIAvatar
from aiavatar.processors.dify import DifyProcessor
chat_processor_dify = DifyProcessor(
api_key=DIFY_API_KEY,
user=DIFY_USER
)
app = AIAvatar(
google_api_key=GOOGLE_API_KEY,
chat_processor=chat_processor_dify
)
app.start_listening_wakeword()
You can make your custom processor that uses other generative AIs such as Llama3 by implementing ChatProcessor
interface. We provide the example later.๐
You can set speaker id and the base url for VOICEVOX server when instantiate AIAvatar
.
app = AIAvatar(
openai_api_key="YOUR_OPENAI_API_KEY",
google_api_key="YOUR_GOOGLE_API_KEY",
# 46 is Sayo. See http://127.0.0.1:50021/speakers to get all ids for characters
voicevox_speaker_id=46
)
If you want to configure in detail, create instance of VoicevoxSpeechController
with custom parameters and set it to AIAvatar
.
from aiavatar.speech.voicevox import VoicevoxSpeechController
speech_controller = VoicevoxSpeechController(
base_url="https",
speaker_id=46,
device_index=app.audio_devices.output_device
)
app.avatar_controller.speech_controller = speech_controller
Speech is handled in a separate subprocess to improve audio quality and reduce noises such as popping, caused by thread blocking during parallel processing of AI responses and speech output. For systems with limited resources, setting use_subprocess=False
allows speech processing within the main process, potentially reintroducing some noise.
app.avatar_controller.speech_controller = VoicevoxSpeechController(
base_url="http://127.0.0.1:50021",
speaker_id=46,
device_index=app.audio_devices.output_device,
use_subprocess=False # Set to False to handle speech in the main process
)
You can also set speech controller that uses alternative Text-to-Speech services. We provide AzureSpeechController
for now.
from aiavatar.speech.azurespeech import AzureSpeechController
AzureSpeechController(
AZURE_SUBSCRIPTION_KEY, AZURE_REGION,
device_index=app.audio_devices.output_device,
# # Set params if you want to customize
# speaker_name="en-US-AvaNeural",
# speaker_gender="Female",
# lang="en-US"
)
The default speaker is en-US-JennyMultilingualNeural
that support multi languages.
https://learn.microsoft.com/ja-jp/azure/ai-services/speech-service/language-support?tabs=tts
You can make custom speech controller by impelemting SpeechController
interface or extending SpeechControllerBase
.
Set wakewords when instantiate AIAvatar
. Conversation will start when AIAvatar recognizes the one of the words in this list.
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
google_api_key=GOOGLE_API_KEY,
wakewords=["Hello", "ใใใซใกใฏ"],
)
If you want to configure in detail, create instance of WakewordListener
with custom parameters and set it to AIAvatar
.
from aiavatar.listeners.wakeword import WakewordListener
wakeword_listener = WakewordListener(
api_key=GOOGLE_API_KEY,
wakewords=["Hello", "ใใใซใกใฏ"],
device_index=app.audio_devices.input_device,
timeout=0.2, # Duration in seconds to wait for silence before ending speech recognition
max_duration=1.5 # Maximum duration in seconds to recognize speech before stopping
)
app.wakeword_listener = wakeword_listener
If you want to configure in detail, create instance of VoiceRequestListener
with custom parameters and set it to AIAvatar
.
from aiavatar.listeners.voicerequest import VoiceRequestListener
request_listener = VoiceRequestListener(
api_key=GOOGLE_API_KEY,
device_index=app.audio_devices.input_device,,
detection_timeout=15.0, # Timeout in seconds to end the process if speech does not start within this duration
timeout=0.5, # Duration in seconds to wait for silence before ending speech recognition
max_duration=20.0, # Maximum duration in seconds to recognize speech before stopping
min_duration=0.2, # Minimum duration in seconds for speech to be recognized; shorter sounds are ignored
)
app.request_listener = request_listener
We strongly recommend using AzureWakewordListener and AzureRequestListner that are more stable than the default listners. Check examples/run_azure.py that works out-of-the-box.
Install Azure SpeechSDK.
$ pip install azure-cognitiveservices-speech
Change script to use AzureRequestListener and AzureWakewordListener.
from aiavatar.listeners.azurevoicerequest import AzureVoiceRequestListener
from aiavatar.listeners.azurewakeword import AzureWakewordListener
YOUR_SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"
YOUR_REGION_NAME = "YOUR_REGION_NAME"
# Create AzureRequestListener
azure_request_listener = AzureVoiceRequestListener(
YOUR_SUBSCRIPTION_KEY,
YOUR_REGION_NAME
)
# Create AzureWakewordListner
async def on_wakeword(text):
logger.info(f"Wakeword: {text}")
await app.start_chat()
azrue_wakeword_listener = AzureWakewordListener(
YOUR_SUBSCRIPTION_KEY,
YOUR_REGION_NAME,
on_wakeword=on_wakeword,
wakewords=["ใใใซใกใฏ"]
)
# Create AIAVater with AzureRequestListener and Azure WakewordListener
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
request_listener=azure_request_listener,
wakeword_listener=azrue_wakeword_listener
)
To specify the microphone device by setting device_name
argument.
See Microsoft Learn to know how to check the device UID on each platform.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-select-audio-input-devices
We provide a script for MacOS. Just run it on Xcode.
Device UID: BuiltInMicrophoneDevice, Name: MacBook Proใฎใใคใฏ
Device UID: com.vbaudio.vbcableA:XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, Name: VB-Cable A
Device UID: com.vbaudio.vbcableB:XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, Name: VB-Cable B
For example, the UID for the built-in microphone on MacOS is BuiltInMicrophoneDevice
.
Then, set it as the value of device_name
.
azure_request_listener = AzureVoiceRequestListener(
YOUR_SUBSCRIPTION_KEY,
YOUR_REGION_NAME,
device_name="BuiltInMicrophoneDevice"
)
azure_wakeword_listener = AzureWakewordListener(
YOUR_SUBSCRIPTION_KEY,
YOUR_REGION_NAME,
on_wakeword=on_wakeword,
wakewords=["Hello", "ใใใซใกใฏ"],
device_name="BuiltInMicrophoneDevice"
)
OpenAI's Speech-to-Text and Text-to-Speech capabilities provide dynamic speech recognition and voice output across multiple languages, without the need for fixed language settings.
from aiavatar import AIAvatar
from aiavatar.device import AudioDevice
from aiavatar.listeners.openailisteners import (
OpenAIWakewordListener,
OpenAIVoiceRequestListener
)
from aiavatar.speech.openaispeech import OpenAISpeechController
# Get default audio devices
devices = AudioDevice()
# Speech
speech_controller = OpenAISpeechController(
api_key=OPENAI_API_KEY,
device_index=devices.output_device
)
# Wakeword
async def on_wakeword(text):
await app.start_chat(request_on_start=text, skip_start_voice=True)
wakeword_listener = OpenAIWakewordListener(
api_key=OPENAI_API_KEY,
device_index=devices.input_device,
wakewords=["ใใใซใกใฏ"],
on_wakeword=on_wakeword
)
# Request
request_listener = OpenAIVoiceRequestListener(
api_key=OPENAI_API_KEY,
device_index=devices.input_device
)
# Create AIAvatar with OpenAI Components
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
wakeword_listener=wakeword_listener,
request_listener=request_listener,
speech_controller=speech_controller,
noise_margin=10.0,
verbose=True
)
app.start_listening_wakeword()
You can specify the audio devices to be used in components by name or index.
from aiavatar.device import AudioDevice
# Get devices by name or index
audio_device = AudioDevice(
input_device="ใใคใฏ",
output_device="ในใใผใซใผ"
)
Set device to components.
# Set output device to SpeechController
speech_controller = VoicevoxSpeechControllerSubProcess(
device_index=audio_device.output_device,
base_url="http://127.0.0.1:50021",
speaker_id=46,
)
# Set input device to Listeners
request_listener = VoiceRequestListener(
device_index=audio_device.input_device
)
wakeword_listener = WakewordListener(
device_index=audio_device.input_device,
wakewords=["Hello", "ใใใซใกใฏ"]
)
# Set components to AIAvatar
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
speech_controller=speech_controller,
request_listener=request_listener,
wakeword_listener=wakeword_listener
)
To control facial expressions within conversations, set the facial expression names and values in FaceController.faces
as shown below, and then include these expression keys in the response message by adding instructions to the prompt.
app.avatar_controller.face_controller.faces = {
"neutral": "๐",
"joy": "๐",
"angry": "๐ ",
"sorrow": "๐",
"fun": "๐ฅณ"
}
app.chat_processor.system_message_content = """# Face Expression
* You have the following expressions:
- joy
- angry
- sorrow
- fun
* If you want to express a particular emotion, please insert it at the beginning of the sentence like [face:joy].
Example
[face:joy]Hey, you can see the ocean! [face:fun]Let's go swimming.
"""
This allows emojis like ๐ฅณ to be autonomously displayed in the terminal during conversations. To actually control the avatar's facial expressions in a metaverse platform, instead of displaying emojis like ๐ฅณ, you will need to use custom implementations tailored to the integration mechanisms of each platform. Please refer to our VRChatFaceController
as an example.
Now writing... โ๏ธ
AIAvatarKit captures and sends image to AI dynamically when the AI determine that vision is required to process the request from the user. This gives "eyes" to your AIAvatar in metaverse platforms like VRChat.
To use vision, instruct vision tag in the system message and ChatGPTProcessor.get_image
.
import io
import pyautogui # pip install pyautogui
from aiavatar.processors.chatgpt import ChatGPTProcessor
from aiavatar.device.video import VideoDevice # pip install opencv-python
# Instruct vision tag in the system message
system_message_content = """
### Using Vision
If you need an image to process a user's request, you can obtain it using the following methods:
- screenshot
- camera
If an image is needed to process the request, add an instruction like [vision:screenshot] to your response to request an image from the user.
By adding this instruction, the user will provide an image in their next utterance. No comments about the image itself are necessary.
Example:
user: Look! This is the sushi I had today.
assistant: [vision:screenshot] Let me take a look.
"""
# Implement get_image
default_camera = VideoDevice(device_index=0, width=960, height=540)
async def get_image(source: str=None) -> bytes:
if source == "camera":
return await default_camera.capture_image("camera.jpg") # Save current image for debug
else:
buffered = io.BytesIO()
image = pyautogui.screenshot(region=(0, 0, 1280, 720))
image.save(buffered, format="PNG")
image.save("screenshot.png") # Save current image for debug
return buffered.getvalue()
# Configure ChatGPTProcessor
chat_processor = ChatGPTProcessor(
api_key=OPENAI_API_KEY,
model="gpt-4o",
system_message_content=system_message_content,
use_vision = True
)
chat_processor.get_image = get_image
NOTE
- Only the latest image will be sent to ChatGPT to avoid performance issues.
- Gemini and Claude can also use vision in the same way. Simply replace
ChatGPTProcessor
withClaudeProcessor
orGeminiProcessor
.
##ใ ๐ญ Custom Behavior
You can invoke custom implementations when listening to requests from user, processing those requests, or when recognized a wake word to start conversation.
In the following example, changing face expressions at each timing aims to enhance the interaction experience with the AI avatar.
# Set face when the character is listening the users voice
async def set_listening_face():
await app.avatar_controller.face_controller.set_face("listening", 3.0)
app.request_listener.on_start_listening = set_listening_face
# Set face when the character is processing the request
async def set_thinking_face():
await app.avatar_controller.face_controller.set_face("thinking", 3.0)
app.chat_processor.on_start_processing = set_thinking_face
async def on_wakeword(text):
logger.info(f"Wakeword: {text}")
# Set face when wakeword detected
await app.avatar_controller.face_controller.set_face("smile", 2.0)
await app.start_chat(request_on_start=text, skip_start_voice=True)
AIAvatarKit is capable of operating on any platform that allows applications to hook into audio input and output. The platforms that have been tested include:
- VRChat
- cluster
- Vket Cloud
In addition to running on PCs to operate AI avatars on these platforms, you can also create a communication robot by connecting speakers, a microphone, and, if possible, a display to a Raspberry Pi.
- 2 Virtual audio devices (e.g. VB-CABLE) are required.
- Multiple VRChat accounts are required to chat with your AIAvatar.
First, run the commands below in python interpreter to check the audio devices.
$ % python
>>> from aiavatar import AudioDevice
>>> AudioDevice.list_audio_devices()
Available audio devices:
0: Headset Microphone (Oculus Virt
:
6: CABLE-B Output (VB-Audio Cable
7: Microsoft ใตใฆใณใ ใใใใผ - Output
8: SONY TV (NVIDIA High Definition
:
13: CABLE-A Input (VB-Audio Cable A
:
In this example,
- To use
VB-Cable-A
for microphone for VRChat, index foroutput_device
is13
(CABLE-A Input). - To use
VB-Cable-B
for speaker for VRChat, index forinput_device
is6
(CABLE-B Output). Don't forget to setVB-Cable-B Input
as the default output device of Windows OS.
Then edit run.py
like below.
# Create AIAvatar
app = AIAvatar(
GOOGLE_API_KEY,
OPENAI_API_KEY,
model="gpt-3.5-turbo",
system_message_content=system_message_content,
input_device=6 # Listen sound from VRChat
output_device=13, # Speak to VRChat microphone
)
You can also set the name of audio devices instead of index (partial match, ignore case).
input_device="CABLE-B Out" # Listen sound from VRChat
output_device="cable-a input", # Speak to VRChat microphone
Run it.
$ run.py
Launch VRChat as desktop mode on the machine that runs run.py
and log in with the account for AIAvatar. Then set VB-Cable-A
to microphone in VRChat setting window.
That's all! Let's chat with the AIAvatar. Log in to VRChat on another machine (or Quest) and go to the world the AIAvatar is in.
AIAvatarKit controls the face expression by Avatar OSC.
LLM(ChatGPT/Claude/Gemini)
โ response with face tag [face:joy]Hello!
AIAvatarKit(VRCFaceExpressionController)
โ osc FaceOSC=1
VRChat(FX AnimatorController)
โ
๐
So at first, setup your avatar the following steps:
- Add avatar parameter
FaceOSC
(type: int, default value: 0, saved: false, synced: true). - Add
FaceOSC
parameter to the FX animator controller. - Add layer and put states and transitions for face expression to the FX animator controller.
- (option) If you use the avatar that is already used in VRChat, add input parameter configuration to avatar json.
Next, use VRChatFaceController
.
from aiavatar.face.vrchat import VRChatFaceController
# Setup VRChatFaceContorller
vrc_face_controller = VRChatFaceController(
faces={
"neutral": 0, # always set `neutral: 0`
# key = the name that LLM can understand the expression
# value = FaceOSC value that is set to the transition on the FX animator controller
"joy": 1,
"angry": 2,
"sorrow": 3,
"fun": 4
}
)
Lastly, add face expression section to the system prompt.
# Make system prompt
system_message_content = """
# Face Expression
* You have following expressions:
- joy
- angry
- sorrow
- fun
* If you want to express a particular emotion, please insert it at the beginning of the sentence like [face:joy].
Example
[face:joy]Hey, you can see the ocean! [face:fun]Let's go swimming.
"""
# Set them to AIAvatar
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
google_api_key=GOOGLE_API_KEY,
face_controller=vrc_face_controller,
system_message_content=system_message_content
)
You can test it not only through the voice conversation but also via the REST API.
Now writing... โ๏ธ
You can control AIAvatar via RESTful APIs. The provided functions are:
-
WakewordLister
- start: Start WakewordListener
- stop: Stop WakewordListener
- status: Show status of WakewordListener
-
Avatar
- speech: Speak text with face expression and animation
- face: Set face expression
- animation: Set animation
-
System
- log: Show recent logs
To use REST APIs, create API app and set router instead of calling app.start_listening_wakeword()
.
from fastapi import FastAPI
from aiavatar import AIAvatar
from aiavatar.api.router import get_router
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
google_api_key=GOOGLE_API_KEY
)
# app.start_listening_wakeword()
# Create API app and set router
api = FastAPI()
api_router = get_router(app, "aiavatar.log")
api.include_router(api_router)
Start API with uvicorn.
$ uvicorn run:api
Call /wakeword/start
to start wakeword listener.
$ curl -X 'POST' \
'http://127.0.0.1:8000/wakeword/start' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"wakewords": []
}'
See API spec and try it on http://127.0.0.1:8000/docs .
NOTE: AzureWakewordListeners stops immediately but the default WakewordListener stops after it recognizes wakeword.
Advanced usases.
Use chat_processor.add_function
to use ChatGPT function calling. In this example, get_weather
will be called autonomously.
# Add function
async def get_weather(location: str):
await asyncio.sleep(1.0)
return {"weather": "sunny partly cloudy", "temperature": 23.4}
app.chat_processor.add_function(
name="get_weather",
description="Get the current weather in a given location",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string"
}
}
},
func=get_weather
)
And, after get_weather
called, message to get voice response will be sent to ChatGPT internally.
{
"role": "function",
"content": "{\"weather\": \"sunny partly cloudy\", \"temperature\": 23.4}",
"name": "get_weather"
}
Useful information for developping and debugging.
Using the script below to test the audio I/O before configuring AIAvatar.
- Step-by-Step audio device configuration.
- Speak immediately after start if the output device is correctly configured.
- All recognized text will be shown in console if the input device is correctly configured.
- Just echo on wakeword recognized.
import asyncio
import logging
from aiavatar import (
AudioDevice,
VoicevoxSpeechController,
WakewordListener
)
GOOGLE_API_KEY = "YOUR_API_KEY"
VV_URL = "http://127.0.0.1:50021"
VV_SPEAKER = 46
INPUT_DEVICE = -1
OUTPUT_DEVICE = -1
# Configure root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)
log_format = logging.Formatter("[%(levelname)s] %(asctime)s : %(message)s")
streamHandler = logging.StreamHandler()
streamHandler.setFormatter(log_format)
logger.addHandler(streamHandler)
# Select input device
if INPUT_DEVICE < 0:
input_device_info = AudioDevice.get_input_device_with_prompt()
else:
input_device_info = AudioDevice.get_device_info(INPUT_DEVICE)
input_device = input_device_info["index"]
# Select output device
if OUTPUT_DEVICE < 0:
output_device_info = AudioDevice.get_output_device_with_prompt()
else:
output_device_info = AudioDevice.get_device_info(OUTPUT_DEVICE)
output_device = output_device_info["index"]
logger.info(f"Input device: [{input_device}] {input_device_info['name']}")
logger.info(f"Output device: [{output_device}] {output_device_info['name']}")
# Create speaker
speaker = VoicevoxSpeechController(
VV_URL,
VV_SPEAKER,
device_index=output_device
)
asyncio.run(speaker.speak("ใชใผใใฃใชใใใคในใฎใในใฟใผใ่ตทๅใใพใใใ็งใฎๅฃฐใ่ใใใฆใใพใใ๏ผ"))
# Create WakewordListener
wakewords = ["ใใใซใกใฏ"]
async def on_wakeword(text):
logger.info(f"Wakeword: {text}")
await speaker.speak(f"{text}")
wakeword_listener = WakewordListener(
api_key=GOOGLE_API_KEY,
wakewords=["ใใใซใกใฏ"],
on_wakeword=on_wakeword,
verbose=True,
device_index=input_device
)
# Start listening
ww_thread = wakeword_listener.start()
ww_thread.join()
AIAvatarKit automatically adjusts the noise filter for listeners when you instantiate an AIAvatar object. To manually set the noise filter level for voice detection, set auto_noise_filter_threshold
to False
and specify the volume_threshold_db
in decibels (dB).
app = AIAvatar(
openai_api_key=OPENAI_API_KEY,
google_api_key=GOOGLE_API_KEY,
auto_noise_filter_threshold=False,
volume_threshold_db=-40 # Set the voice detection threshold to -40 dB
)
Use ChatGPTProcessor with some arguments.
- base_url: URL for LM Studio local server
- model: Name of model
- parse_function_call_in_response: Always set
False
from aiavatar import AIAvatar
from aiavatar.processors.chatgpt import ChatGPTProcessor
chat_processor = ChatGPTProcessor(
api_key=OPENAI_API_KEY,
base_url="http://127.0.0.1:1234/v1",
model="mmnga/DataPilot-ArrowPro-7B-KUJIRA-gguf",
parse_function_call_in_response=False
)
app = AIAvatar(
google_api_key=GOOGLE_API_KEY,
chat_processor=chat_processor
)
app.start_listening_wakeword()
It's very easy to add your original listeners. Just make it run on other thread and invoke app.start_chat()
when the listener handles the event.
Here the example of FileSystemListener
that invokes chat when test.txt
is found on the file system.
import asyncio
import os
from threading import Thread
from time import sleep
class FileSystemListener:
def __init__(self, on_file_found):
self.on_file_found = on_file_found
def start_listening(self):
while True:
# Check file every 3 seconds
if os.path.isfile("test.txt"):
asyncio.run(self.on_file_found())
sleep(3)
def start(self):
th = Thread(target=self.start_listening, daemon=True)
th.start()
return th
Use this listener in run.py
like below.
# Event handler
def on_file_found():
asyncio.run(app.chat())
# Instantiate
fs_listener = FileSystemListener(on_file_found)
fs_thread = fs_listener.start()
:
# Wait for finish
fs_thread.join()
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for aiavatarkit
Similar Open Source Tools
aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
instructor
Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses. Get ready to supercharge your LLM workflows!
ai21-python
The AI21 Labs Python SDK is a comprehensive tool for interacting with the AI21 API. It provides functionalities for chat completions, conversational RAG, token counting, error handling, and support for various cloud providers like AWS, Azure, and Vertex. The SDK offers both synchronous and asynchronous usage, along with detailed examples and documentation. Users can quickly get started with the SDK to leverage AI21's powerful models for various natural language processing tasks.
client-python
The Mistral Python Client is a tool inspired by cohere-python that allows users to interact with the Mistral AI API. It provides functionalities to access and utilize the AI capabilities offered by Mistral. Users can easily install the client using pip and manage dependencies using poetry. The client includes examples demonstrating how to use the API for various tasks, such as chat interactions. To get started, users need to obtain a Mistral API Key and set it as an environment variable. Overall, the Mistral Python Client simplifies the integration of Mistral AI services into Python applications.
python-tgpt
Python-tgpt is a Python package that enables seamless interaction with over 45 free LLM providers without requiring an API key. It also provides image generation capabilities. The name _python-tgpt_ draws inspiration from its parent project tgpt, which operates on Golang. Through this Python adaptation, users can effortlessly engage with a number of free LLMs available, fostering a smoother AI interaction experience.
syncode
SynCode is a novel framework for the grammar-guided generation of Large Language Models (LLMs) that ensures syntactically valid output with respect to defined Context-Free Grammar (CFG) rules. It supports general-purpose programming languages like Python, Go, SQL, JSON, and more, allowing users to define custom grammars using EBNF syntax. The tool compares favorably to other constrained decoders and offers features like fast grammar-guided generation, compatibility with HuggingFace Language Models, and the ability to work with various decoding strategies.
phidata
Phidata is a framework for building AI Assistants with memory, knowledge, and tools. It enables LLMs to have long-term conversations by storing chat history in a database, provides them with business context by storing information in a vector database, and enables them to take actions like pulling data from an API, sending emails, or querying a database. Memory and knowledge make LLMs smarter, while tools make them autonomous.
Webscout
WebScout is a versatile tool that allows users to search for anything using Google, DuckDuckGo, and phind.com. It contains AI models, can transcribe YouTube videos, generate temporary email and phone numbers, has TTS support, webai (terminal GPT and open interpreter), and offline LLMs. It also supports features like weather forecasting, YT video downloading, temp mail and number generation, text-to-speech, advanced web searches, and more.
lmstudio.js
lmstudio.js is a pre-release alpha client SDK for LM Studio, allowing users to use local LLMs in JS/TS/Node. It is currently undergoing rapid development with breaking changes expected. Users can follow LM Studio's announcements on Twitter and Discord. The SDK provides API usage for loading models, predicting text, setting up the local LLM server, and more. It supports features like custom loading progress tracking, model unloading, structured output prediction, and cancellation of predictions. Users can interact with LM Studio through the CLI tool 'lms' and perform tasks like text completion, conversation, and getting prediction statistics.
swarmzero
SwarmZero SDK is a library that simplifies the creation and execution of AI Agents and Swarms of Agents. It supports various LLM Providers such as OpenAI, Azure OpenAI, Anthropic, MistralAI, Gemini, Nebius, and Ollama. Users can easily install the library using pip or poetry, set up the environment and configuration, create and run Agents, collaborate with Swarms, add tools for complex tasks, and utilize retriever tools for semantic information retrieval. Sample prompts are provided to help users explore the capabilities of the agents and swarms. The SDK also includes detailed examples and documentation for reference.
aicsimageio
AICSImageIO is a Python tool for Image Reading, Metadata Conversion, and Image Writing for Microscopy Images. It supports various file formats like OME-TIFF, TIFF, ND2, DV, CZI, LIF, PNG, GIF, and Bio-Formats. Users can read and write metadata and imaging data, work with different file systems like local paths, HTTP URLs, s3fs, and gcsfs. The tool provides functionalities for full image reading, delayed image reading, mosaic image reading, metadata reading, xarray coordinate plane attachment, cloud IO support, and saving to OME-TIFF. It also offers benchmarking and developer resources.
langchainrb
Langchain.rb is a Ruby library that makes it easy to build LLM-powered applications. It provides a unified interface to a variety of LLMs, vector search databases, and other tools, making it easy to build and deploy RAG (Retrieval Augmented Generation) systems and assistants. Langchain.rb is open source and available under the MIT License.
instructor
Instructor is a popular Python library for managing structured outputs from large language models (LLMs). It offers a user-friendly API for validation, retries, and streaming responses. With support for various LLM providers and multiple languages, Instructor simplifies working with LLM outputs. The library includes features like response models, retry management, validation, streaming support, and flexible backends. It also provides hooks for logging and monitoring LLM interactions, and supports integration with Anthropic, Cohere, Gemini, Litellm, and Google AI models. Instructor facilitates tasks such as extracting user data from natural language, creating fine-tuned models, managing uploaded files, and monitoring usage of OpenAI models.
vision-parse
Vision Parse is a tool that leverages Vision Language Models to parse PDF documents into beautifully formatted markdown content. It offers smart content extraction, content formatting, multi-LLM support, PDF document support, and local model hosting using Ollama. Users can easily convert PDFs to markdown with high precision and preserve document hierarchy and styling. The tool supports multiple Vision LLM providers like OpenAI, LLama, and Gemini for accuracy and speed, making document processing efficient and effortless.
cursive-py
Cursive is a universal and intuitive framework for interacting with LLMs. It is extensible, allowing users to hook into any part of a completion life cycle. Users can easily describe functions that LLMs can use with any supported model. Cursive aims to bridge capabilities between different models, providing a single interface for users to choose any model. It comes with built-in token usage and costs calculations, automatic retry, and model expanding features. Users can define and describe functions, generate Pydantic BaseModels, hook into completion life cycle, create embeddings, and configure retry and model expanding behavior. Cursive supports various models from OpenAI, Anthropic, OpenRouter, Cohere, and Replicate, with options to pass API keys for authentication.
shortest
Shortest is an AI-powered natural language end-to-end testing framework built on Playwright. It provides a seamless testing experience by allowing users to write tests in natural language and execute them using Anthropic Claude API. The framework also offers GitHub integration with 2FA support, making it suitable for testing web applications with complex authentication flows. Shortest simplifies the testing process by enabling users to run tests locally or in CI/CD pipelines, ensuring the reliability and efficiency of web applications.
For similar tasks
aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
discollama
Discollama is a Discord bot powered by a local large language model backed by Ollama. It allows users to interact with the bot in Discord by mentioning it in a message to start a new conversation or in a reply to a previous response to continue an ongoing conversation. The bot requires Docker and Docker Compose to run, and users need to set up a Discord Bot and environment variable DISCORD_TOKEN before using discollama.py. Additionally, an Ollama server is needed, and users can customize the bot's personality by creating a custom model using Modelfile and running 'ollama create'.
Muice-Chatbot
Muice-Chatbot is an AI chatbot designed to proactively engage in conversations with users. It is based on the ChatGLM2-6B and Qwen-7B models, with a training dataset of 1.8K+ dialogues. The chatbot has a speaking style similar to a 2D girl, being somewhat tsundere but willing to share daily life details and greet users differently every day. It provides various functionalities, including initiating chats and offering 5 available commands. The project supports model loading through different methods and provides onebot service support for QQ users. Users can interact with the chatbot by running the main.py file in the project directory.
TerminalGPT
TerminalGPT is a terminal-based ChatGPT personal assistant app that allows users to interact with OpenAI GPT-3.5 and GPT-4 language models. It offers advantages over browser-based apps, such as continuous availability, faster replies, and tailored answers. Users can use TerminalGPT in their IDE terminal, ensuring seamless integration with their workflow. The tool prioritizes user privacy by not using conversation data for model training and storing conversations locally on the user's machine.
ESP32_AI_LLM
ESP32_AI_LLM is a project that uses ESP32 to connect to Xunfei Xinghuo, Dou Bao, and Tongyi Qianwen large models to achieve voice chat functions, supporting online voice wake-up, continuous conversation, music playback, and real-time display of conversation content on an external screen. The project requires specific hardware components and provides functionalities such as voice wake-up, voice conversation, convenient network configuration, music playback, volume adjustment, LED control, model switching, and screen display. Users can deploy the project by setting up Xunfei services, cloning the repository, configuring necessary parameters, installing drivers, compiling, and burning the code.
gemini-multimodal-playground
Gemini Multimodal Playground is a basic Python app for voice conversations with Google's Gemini 2.0 AI model. It features real-time voice input and text-to-speech responses. Users can configure settings through the GUI and interact with Gemini by speaking into the microphone. The application provides options for voice selection, system prompt customization, and enabling Google search. Troubleshooting tips are available for handling audio feedback loop issues that may occur during interactions.
alexa-skill-llm-intent
An Alexa Skill template that provides a ready-to-use skill for starting a conversation with an AI. Users can ask questions and receive answers in Alexa's voice, powered by ChatGPT or other llm. The template includes setup instructions for configuring the AI provider API and model, as well as usage commands for interacting with the skill. It serves as a starting point for creating custom Alexa Skills and should be used at the user's own risk.
For similar jobs
promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.
carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.
AI-YinMei
AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.