AlwaysReddy
AlwaysReddy is an LLM voice assistant that is always just a hotkey away.
Stars: 627
AlwaysReddy is a simple LLM assistant with no UI that you interact with entirely using hotkeys. It can easily read from or write to your clipboard, and voice chat with you via TTS and STT. Here are some of the things you can use AlwaysReddy for:
- Explain a new concept to AlwaysReddy and have it save the concept (in roughly your words) into a note.
- Ask AlwaysReddy "What is X called?" when you know how to roughly describe something but can't remember what it is called.
- Have AlwaysReddy proofread the text in your clipboard before you send it.
- Ask AlwaysReddy "From the comments in my clipboard, what do the r/LocalLLaMA users think of X?"
- Quickly list what you have done today and get AlwaysReddy to write a journal entry to your clipboard before you shut down the computer for the day.
README:
Hey, I'm Josh, the creator of AlwaysReddy. I am still a little bit of a noob when it comes to programming and I'm really trying to develop my skills over the next year. I'm treating this project as a way to do that, so I would really appreciate it if you could point out issues and bad practices in my code (of which I'm sure there will be plenty). I would also appreciate it if you would make your own improvements to the project so I can learn from your changes.
Twitter: https://twitter.com/MindofMachine1
Contact me: [email protected]
I'm looking for work. If you know of anyone needing a skillset like mine, please let me know! :)
The code base is a mess right now. I am in the middle of transforming AlwaysReddy from just being a voice chat bot into something that will allow users to create their own chatbots and extensions. This transition will be a little messy until I find solutions that I like; then I will start cleaning things up.
- Meet AlwaysReddy
- Philosophy of the project
- Setup
- Known Issues
- How to add custom actions
- Troubleshooting
- How to
- Supported LLM servers
- Supported TTS systems
AlwaysReddy is a simple LLM assistant with the perfect amount of UI... None! You interact with it entirely using hotkeys, and it can easily read from or write to your clipboard. It's like having voice ChatGPT running on your computer at all times: you just press a hotkey and it will listen to any questions you have, with no need to swap windows or tabs. If you want to give it the context of some extra text, just copy the text and double tap the hotkey!
Future of AlwaysReddy
I would like to make AlwaysReddy an extensible interface where you can easily voice chat with a range of AIs. These AIs could be given the ability to access custom tools or applications so that they can do tasks for you on the fly, all with as little friction as possible.
Pull Requests Welcome!
Join the discord: https://discord.gg/su44drSBzb
Here is a demo video of me using it with Llama3: https://www.reddit.com/r/LocalLLaMA/comments/1ca510h/voice_chatting_with_llama_3_8b/
- Friction is the enemy
- I am building this for myself first but sharing it in case other people get value from it too
- Practicality first, I want this system to help me be as effective as possible
- I will change directions freely, when I think of a more useful direction for the code base I will start working in that direction, even if that makes things messy in the short term
- Help is always welcome, if you have an idea of how you could improve AlwaysReddy, jump in and get your hands dirty!
You interact with AlwaysReddy entirely with hotkeys, it has the ability to:
- Voice chat with you via TTS and STT
- Read from your clipboard (with Ctrl + Alt + R + R, rapidly double tapping R).
- Write text to your clipboard on request.
- Can be run 100% locally!!!
- Supports Windows, Mac (experimental), and Linux (super duper experimental, see Known Issues)
- You can create your own hotkeys to fire custom code using AlwaysReddy's inbuilt systems like TTS and STT
If you are a Linux user and you're willing to help, please consider looking at the Known Issues. I'm pretty stuck here!
I often use AlwaysReddy for the following things:
- When I have just learned a new concept I will often explain the concept aloud to AlwaysReddy and have it save the concept (in roughly my words) into a note.
- "What is X called?" Often I know how to roughly describe something but can't remember what it is called. AlwaysReddy is handy for quickly giving me the answer without me having to open the browser.
- "Can you proofread the text in my clipboard before I send it?"
- "What do the r/LocalLLaMA users think of X, based on the comments in my clipboard?"
- Quick journal entries: I speedily list what I have done today and get it to write a journal entry to my clipboard before I shut down the computer for the day.
Supported LLM servers
- OpenAI
- Anthropic
- TogetherAI
- LM Studio (local) - Setup Guide
- Ollama (local) - Setup Guide
- Perplexity
- TabbyAPI (local)
Supported TTS systems
- Piper TTS (local and fast) See how to change voice model
- OpenAI TTS API
- Default mac TTS
GPU Setup Instructions
To use GPU acceleration with the faster-whisper API, follow these steps:
- Check if CUDA is already installed:
  - Open a terminal or command prompt.
  - Run the following command:
    nvcc --version
  - If CUDA is installed, you should see output similar to:
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Feb_14_21:12:58_PST_2021
    Cuda compilation tools, release 11.2, V11.2.152
    Build cuda_11.2.r11.2/compiler.29618528_0
  - Note down the CUDA version (e.g., 11.2 in the example above).
- If CUDA is not installed or you want to install a different version:
  - Visit the official NVIDIA CUDA Toolkit website: CUDA Toolkit
  - Download and install the appropriate CUDA Toolkit version for your system.
- Install PyTorch with CUDA support based on your system and CUDA version. Follow the instructions on the official PyTorch website: PyTorch Installation
  Example command for CUDA 11.6:
  pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
- In the config.py file, set USE_GPU = True to enable GPU acceleration.
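If you would rather script the CUDA version check than read the nvcc output by eye, something like the sketch below could work. It is not part of AlwaysReddy; the function names parse_cuda_release and installed_cuda_release are made up for this example, and it assumes the "release X.Y" wording shown in the sample output above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. "11.2") found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run `nvcc --version` if nvcc is on PATH and return the parsed release."""
    if shutil.which("nvcc") is None:
        # nvcc is not on PATH, so the CUDA Toolkit is probably not installed.
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release() or "CUDA Toolkit not found on PATH")
```

The parsed release can then be matched against the PyTorch install selector when picking the pip command.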
Note for macOS: it is expected that you have Homebrew (brew) installed on your system.
- Clone this repo with git clone https://github.com/ILikeAI/AlwaysReddy
- Navigate into the directory: cd AlwaysReddy
- Run the setup script with python setup.py on Windows or python3 setup.py on Mac and Linux.
- Open the config.py and .env files and update them with your settings and API keys.
If you get module 'requests' not found, run pip install requests or pip3 install requests.
If you encounter any issues during the setup process, please refer to the Troubleshooting section below.
- Double-click on the run_AlwaysReddy.bat file created during the setup process.
OR run python main.py from the command prompt or terminal:
- Activate the venv with venv\Scripts\activate, then run the main script directly with python main.py.
- Open a terminal, navigate to the AlwaysReddy directory, and run ./run_AlwaysReddy.sh.
OR run python3 main.py from the command prompt or terminal:
- Activate the venv with source venv/bin/activate, then run the main script directly with python3 main.py.
Known Issues
- On Linux it only detects hotkey presses when the application is in focus. This is a major issue, as the whole point of the project is to have it run in the background. If you want to help out, this would be a great place to start poking around! This may only be an issue on systems using Wayland.
- Using AlwaysReddy in the terminal on Ubuntu does not work for me: when I press the hotkey it just prints the key in the terminal. Running it in my IDE works.
Troubleshooting
If you have issues, try deleting the venv folder and starting again. Set VERBOSE = True in the config to get more detailed logs and error traces.
There are currently only 2 main actions:
Voice chat:
- Press Ctrl + Alt + R to start dictating. You can talk for as long as you want, then press Ctrl + Alt + R again to stop recording. A few seconds later you will get a voice response from the AI.
- You can also hold Ctrl + Alt + R to record and release it when you're done to get the transcription.
Voice chat with context of your clipboard:
- Double tap Ctrl + Alt + R (or just hold Ctrl + Alt and quickly press R twice). This will give the AI the content of your clipboard so you can ask it to reference it, rewrite it, answer questions from its contents... whatever you like!
- Clear the assistant's memory with Ctrl + Alt + W.
- Cancel recording or TTS with Ctrl + Alt + E.
Get AlwaysReddy to output to your clipboard:
- Just ask it to! It is prompted to know how to save text to the clipboard instead of speaking it aloud.
Please let me know if you think of better hotkey defaults!
All hotkeys can be edited in config.py
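To make the difference between a press, a hold and a double tap concrete, here is a rough sketch of how the three could be told apart from key timestamps. This is not AlwaysReddy's actual hotkey code; the function name classify_hotkey_event and both thresholds are assumptions picked for the example.

```python
from typing import Optional

# Hypothetical thresholds in seconds; AlwaysReddy's real values may differ.
HOLD_THRESHOLD = 0.5
DOUBLE_TAP_WINDOW = 0.3


def classify_hotkey_event(
    press_time: float,
    release_time: float,
    previous_press_time: Optional[float] = None,
) -> str:
    """Classify one press/release pair as "double_tap", "held_release" or "pressed"."""
    if (
        previous_press_time is not None
        and press_time - previous_press_time <= DOUBLE_TAP_WINDOW
    ):
        # A second press shortly after the previous one counts as a double tap.
        return "double_tap"
    if release_time - press_time >= HOLD_THRESHOLD:
        # The key stayed down long enough to count as a hold.
        return "held_release"
    return "pressed"


if __name__ == "__main__":
    print(classify_hotkey_event(0.0, 0.1))
    print(classify_hotkey_event(0.0, 1.0))
    print(classify_hotkey_event(1.0, 1.1, previous_press_time=0.8))
```

The returned labels reuse the event names that add_action_hotkey accepts, which are described further down.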
- Go to https://huggingface.co/rhasspy/piper-voices/tree/main and navigate to your desired language.
- Click on the name of the voice you want to try. There are different sized models available; I suggest using the medium size as it's pretty fast but still sounds great (for a locally run model).
- Listen to the sample in the "sample" folder to ensure you like the voice.
- Download the .onnx and .json files for the chosen voice.
- Create a new folder in the piper_tts\voices directory and give it a descriptive name. You will need to enter the name of this folder into the config.py file. For example: PIPER_VOICE = "default_female_voice".
- Move the two downloaded files (.onnx and .json) into your newly created folder within the piper_tts\voices directory.
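If a new voice does not load, a quick sanity check is to confirm that both files really ended up in the voice folder. The helper below is a small sketch for that and is not part of AlwaysReddy; the name missing_voice_files and the placeholder file names in the demo are made up for this example.

```python
import tempfile
from pathlib import Path
from typing import List

# A Piper voice folder needs one .onnx model file and one .json config file.
REQUIRED_SUFFIXES = (".onnx", ".json")


def missing_voice_files(voice_dir: Path) -> List[str]:
    """Return the required file suffixes that are absent from voice_dir."""
    if not voice_dir.is_dir():
        return list(REQUIRED_SUFFIXES)
    present = {path.suffix for path in voice_dir.iterdir() if path.is_file()}
    return [suffix for suffix in REQUIRED_SUFFIXES if suffix not in present]


if __name__ == "__main__":
    # Demo with a throwaway folder and a placeholder file name.
    with tempfile.TemporaryDirectory() as tmp:
        voice_dir = Path(tmp) / "default_female_voice"
        voice_dir.mkdir()
        (voice_dir / "voice.onnx").touch()
        print(missing_voice_files(voice_dir))  # the .json file is still missing
```

In practice you would point it at your own folder, for example Path("piper_tts") / "voices" / "default_female_voice".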
- Open the config.py file.
- Locate the "Transcription API Settings" section.
- Comment out the line TRANSCRIPTION_API = "openai" by adding a # at the beginning of the line.
- Uncomment the line TRANSCRIPTION_API = "faster-whisper" by removing the # at the beginning of the line.
- Adjust the WHISPER_MODEL and TRANSCRIPTION_LANGUAGE settings according to your preferences.
- Save the config.py file.
Available models with faster-whisper: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3
Here's an example of what your config.py file should look like for local whisper transcription:
### Transcription API Settings ###
## OPENAI API TRANSCRIPTION EXAMPLE ##
# TRANSCRIPTION_API = "openai" # this will use the hosted openai api
## Faster Whisper local transcription ###
TRANSCRIPTION_API = "FasterWhisper" # this will use the local whisper model
# Supported models:
WHISPER_MODEL = "tiny.en" # If you prefer not to use english set it to "tiny", if the transcription quality is too low then set it to "base" but this will be a little slower
Note: The default whisper model is English only. Try setting WHISPER_MODEL to 'tiny' or 'base' for other languages.
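The ".en" rule can be written down as a small helper. This is only a sketch and not code from AlwaysReddy: the name pick_whisper_model is made up, the model names are copied from the list above, and the ".en" suffix is the only signal it uses (it does not check whether the other models suit a given language).

```python
# Model names copied from the "Available models with faster-whisper" list above.
FASTER_WHISPER_MODELS = {
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large",
    "distil-large-v2", "distil-medium.en", "distil-small.en", "distil-large-v3",
}


def pick_whisper_model(size: str, language: str) -> str:
    """Prefer the English-only ".en" variant for English, else the plain model name."""
    if size not in FASTER_WHISPER_MODELS:
        raise ValueError(f"Unknown faster-whisper model: {size!r}")
    base = size[:-3] if size.endswith(".en") else size
    english_variant = base + ".en"
    if language == "en" and english_variant in FASTER_WHISPER_MODELS:
        return english_variant
    if base in FASTER_WHISPER_MODELS:
        return base
    raise ValueError(f"{size!r} has no non-English variant for language {language!r}")


if __name__ == "__main__":
    print(pick_whisper_model("tiny", "en"))
    print(pick_whisper_model("tiny.en", "de"))
```

The result is what you would put in WHISPER_MODEL in config.py.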
To swap models, open the config.py file and uncomment the section for the API you want to use. For example, this is how you would use Claude 3 Sonnet. If you wanted to use LM Studio instead, you would comment out the Anthropic section and uncomment the LM Studio section.
### COMPLETIONS API SETTINGS ###
## LM Studio COMPLETIONS API EXAMPLE ##
# COMPLETIONS_API = "lm_studio"
# COMPLETION_MODEL = "local-model" #This stays as local-model no matter what model you are using
## ANTHROPIC COMPLETIONS API EXAMPLE ##
COMPLETIONS_API = "anthropic"
COMPLETION_MODEL = "claude-3-sonnet-20240229"
## TOGETHER COMPLETIONS API EXAMPLE ##
# COMPLETIONS_API = "together"
# COMPLETION_MODEL = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT"
## OPENAI COMPLETIONS API EXAMPLE ##
# COMPLETIONS_API = "openai"
# COMPLETION_MODEL = "gpt-4-0125-preview"
To use local TTS just open the config file and set TTS_ENGINE="piper"
- Navigate to the system_prompts directory.
- Make a copy of an existing prompt file.
- Open the copy in a text or code editor and edit the prompt inside the two ''' as you like.
- Edit your config.py file by setting the ACTIVE_PROMPT option to the name of your new prompt file (without the .py extension) as a string.
  - For example, if your new prompt file is custom_prompt.py, then set in config.py: ACTIVE_PROMPT = "custom_prompt"
To add AlwaysReddy to your startup list so it starts automatically on your computer startup, follow these steps:
- Run venv\Scripts\activate
- Run python setup.py and follow the prompts. It will ask you if you want to add AlwaysReddy to the startup list; press Y to confirm.
If you want to remove AlwaysReddy from the startup list, follow the same steps again but say no when asked if you want to add AlwaysReddy to the startup list. It will then ask if you would like to remove it; press Y.
PLEASE NOTE: Custom actions are a very experimental feature that I am likely to change a lot. Any actions you make will in all likelihood need to be updated in some way as I update and change the actions system.
The action system allows you to easily define new functionality and bind it to a hotkey event. It lets you use the following functionality from the AlwaysReddy code base:
- Record audio
- Transcribe audio
- Run and play TTS
- Generate responses from any of the supported LLM servers
- Read and save to the clipboard
This video shows the process of making an action from scratch: https://youtu.be/X0Bd20EDxfQ
Example action: https://github.com/ILikeAI/alwaysreddy_add_to_md_note
The toggle_recording method starts or stops audio recording. When called the first time, it starts recording. The next call stops recording and returns the audio file path.
By default, if the recording times out, it's stopped and deleted. However, you can provide a callback function that will be executed on timeout instead. In the code example, transcription_action is passed as the callback. When the recording times out, transcription_action is called, which calls toggle_recording again, thereby stopping the recording and returning the audio file for transcription.
def transcription_action(self):
    """Handle the transcription process."""
    recording_filename = self.AR.toggle_recording(self.transcription_action)
    if recording_filename:
        transcript = self.AR.transcription_manager.transcribe_audio(recording_filename)
        to_clipboard(transcript)
        print("Transcription copied to clipboard.")
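If the toggle-plus-timeout behaviour is hard to picture, the stand-in below mimics it in isolation. It is a hypothetical sketch written for this explanation and not AlwaysReddy's implementation: the class ToggleRecorder, the timeout_seconds parameter and the "recording.wav" placeholder path are all assumptions, and no audio is recorded.

```python
import threading
from typing import Callable, Optional


class ToggleRecorder:
    """Hypothetical stand-in that mimics the toggle semantics described above."""

    def __init__(self, timeout_seconds: float = 60.0) -> None:
        self.timeout_seconds = timeout_seconds
        self.recording = False
        self._timer: Optional[threading.Timer] = None

    def toggle_recording(
        self, on_timeout: Optional[Callable[[], None]] = None
    ) -> Optional[str]:
        if not self.recording:
            # First call: start "recording" and arm the timeout.
            self.recording = True
            self._timer = threading.Timer(
                self.timeout_seconds, on_timeout or self._discard
            )
            self._timer.daemon = True
            self._timer.start()
            return None
        # Second call: stop and hand back the (placeholder) audio file path.
        self.recording = False
        if self._timer is not None:
            self._timer.cancel()
        return "recording.wav"

    def _discard(self) -> None:
        # Default timeout behaviour: stop and drop the recording.
        self.recording = False


if __name__ == "__main__":
    recorder = ToggleRecorder()
    print(recorder.toggle_recording())  # first call starts recording
    print(recorder.toggle_recording())  # second call returns the file path
```

Passing the action itself as on_timeout means a timeout simply triggers the "second call", which is why transcription_action still receives a file to transcribe.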
The setup method of your action will run when AlwaysReddy starts. This is where you use the add_action_hotkey method to bind your code to a hotkey press. Below is an example of binding hotkeys to the transcription_action method.
self.AR.add_action_hotkey("ctrl+alt+t",
                          pressed=self.transcription_action,
                          held_release=self.transcription_action)
Here we are binding the pressed and held_release hotkey events to our function.
Below are the arguments for add_action_hotkey:
hotkey (str): The hotkey combination.
pressed (callable, optional): Callback for when the hotkey is pressed.
released (callable, optional): Callback for when the hotkey is released.
held (callable, optional): Callback for when the hotkey is held.
held_release (callable, optional): Callback for when the hotkey is released after being held.
double_tap (callable, optional): Callback for when the hotkey is double-tapped.
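To see how the pieces fit together without running AlwaysReddy, here is a self-contained sketch that uses a stub in place of the real AR object. FakeAR and TranscriptionAction are hypothetical names for this example, and calling setup from __init__ is an assumption; only the add_action_hotkey arguments are taken from the list above.

```python
from typing import Callable, Dict, Optional

Callback = Optional[Callable[[], None]]


class FakeAR:
    """Hypothetical stub that only records which events each hotkey is bound to."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Dict[str, Callable[[], None]]] = {}

    def add_action_hotkey(
        self,
        hotkey: str,
        pressed: Callback = None,
        released: Callback = None,
        held: Callback = None,
        held_release: Callback = None,
        double_tap: Callback = None,
    ) -> None:
        events = {
            "pressed": pressed,
            "released": released,
            "held": held,
            "held_release": held_release,
            "double_tap": double_tap,
        }
        # Keep only the events that were given a callback.
        self.bindings[hotkey] = {
            name: callback for name, callback in events.items() if callback is not None
        }


class TranscriptionAction:
    """Sketch of an action; the real base class is not shown in this README."""

    def __init__(self, AR: FakeAR) -> None:
        self.AR = AR
        self.setup()  # assumption: AlwaysReddy calls setup for you at startup

    def setup(self) -> None:
        self.AR.add_action_hotkey(
            "ctrl+alt+t",
            pressed=self.transcription_action,
            held_release=self.transcription_action,
        )

    def transcription_action(self) -> None:
        print("transcription_action triggered")


if __name__ == "__main__":
    ar = FakeAR()
    TranscriptionAction(ar)
    print(sorted(ar.bindings["ctrl+alt+t"]))
```

Events you leave out (released, held, double_tap here) are simply not bound in this sketch.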
Alternative AI tools for AlwaysReddy
Similar Open Source Tools
ai-toolkit
The AI Toolkit by Ostris is a collection of tools for machine learning, specifically designed for image generation, LoRA (Low-Rank Adaptation) extraction and manipulation, and model training. It provides a user-friendly interface and extensive documentation to make it accessible to both developers and non-developers. The toolkit is actively under development, with new features and improvements being added regularly. Some of the key features of the AI Toolkit include:
- Batch Image Generation: Allows users to generate a batch of images based on prompts or text files, using a configuration file to specify the desired settings.
- LoRA (lierla), LoCON (LyCORIS) Extractor: Facilitates the extraction of LoRA and LoCON representations from pre-trained models, enabling users to modify and manipulate these representations for various purposes.
- LoRA Rescale: Provides a tool to rescale LoRA weights, allowing users to adjust the influence of specific attributes in the generated images.
- LoRA Slider Trainer: Enables the training of LoRA sliders, which can be used to control and adjust specific attributes in the generated images, offering a powerful tool for fine-tuning and customization.
- Extensions: Supports the creation and sharing of custom extensions, allowing users to extend the functionality of the toolkit with their own tools and scripts.
- VAE (Variational Auto Encoder) Trainer: Facilitates the training of VAEs for image generation, providing users with a tool to explore and improve the quality of generated images.
The AI Toolkit is a valuable resource for anyone interested in exploring and utilizing machine learning for image generation and manipulation. Its user-friendly interface, extensive documentation, and active development make it an accessible and powerful tool for both beginners and experienced users.
call-gpt
Call GPT is a voice application that utilizes Deepgram for Speech to Text, elevenlabs for Text to Speech, and OpenAI for GPT prompt completion. It allows users to chat with ChatGPT on the phone, providing better transcription, understanding, and speaking capabilities than traditional IVR systems. The app returns responses with low latency, allows user interruptions, maintains chat history, and enables GPT to call external tools. It coordinates data flow between Deepgram, OpenAI, ElevenLabs, and Twilio Media Streams, enhancing voice interactions.
ai-voice-cloning
This repository provides a tool for AI voice cloning, allowing users to generate synthetic speech that closely resembles a target speaker's voice. The tool is designed to be user-friendly and accessible, with a graphical user interface that guides users through the process of training a voice model and generating synthetic speech. The tool also includes a variety of features that allow users to customize the generated speech, such as the pitch, volume, and speaking rate. Overall, this tool is a valuable resource for anyone interested in creating realistic and engaging synthetic speech.
concierge
Concierge is a versatile automation tool designed to streamline repetitive tasks and workflows. It provides a user-friendly interface for creating custom automation scripts without the need for extensive coding knowledge. With Concierge, users can automate various tasks across different platforms and applications, increasing efficiency and productivity. The tool offers a wide range of pre-built automation templates and allows users to customize and schedule their automation processes. Concierge is suitable for individuals and businesses looking to automate routine tasks and improve overall workflow efficiency.
whisper_dictation
Whisper Dictation is a fast, offline, privacy-focused tool for voice typing, AI voice chat, voice control, and translation. It allows hands-free operation, launching and controlling apps, and communicating with OpenAI ChatGPT or a local chat server. The tool also offers the option to speak answers out loud and draw pictures. It includes client and server versions, inspired by the Star Trek series, and is designed to keep data off the internet and confidential. The project is optimized for dictation and translation tasks, with voice control capabilities and AI image generation using stable-diffusion API.
smartcat
Smartcat is a CLI interface that brings language models into the Unix ecosystem, allowing power users to leverage the capabilities of LLMs in their daily workflows. It features a minimalist design, seamless integration with terminal and editor workflows, and customizable prompts for specific tasks. Smartcat currently supports OpenAI, Mistral AI, and Anthropic APIs, providing access to a range of language models. With its ability to manipulate file and text streams, integrate with editors, and offer configurable settings, Smartcat empowers users to automate tasks, enhance code quality, and explore creative possibilities.
openui
OpenUI is a tool designed to simplify the process of building UI components by allowing users to describe UI using their imagination and see it rendered live. It supports converting HTML to React, Svelte, Web Components, etc. The tool is open source and aims to make UI development fun, fast, and flexible. It integrates with various AI services like OpenAI, Groq, Gemini, Anthropic, Cohere, and Mistral, providing users with the flexibility to use different models. OpenUI also supports LiteLLM for connecting to various LLM services and allows users to create custom proxy configs. The tool can be run locally using Docker or Python, and it offers a development environment for quick setup and testing.
gpt-subtrans
GPT-Subtrans is an open-source subtitle translator that utilizes large language models (LLMs) as translation services. It supports translation between any language pairs that the language model supports. Note that GPT-Subtrans requires an active internet connection, as subtitles are sent to the provider's servers for translation, and their privacy policy applies.
feeds.fun
Feeds Fun is a self-hosted news reader tool that automatically assigns tags to news entries. Users can create rules to score news based on tags, filter and sort news as needed, and track read news. The tool offers multi/single-user support, feeds management, and various features for personalized news consumption. Users can access the tool's backend as the ffun package on PyPI and the frontend as the feeds-fun package on NPM. Feeds Fun requires setting up OpenAI or Gemini API keys for full tag generation capabilities. The tool uses tag processors to detect tags for news entries, with options for simple and complex processors. Feeds Fun primarily relies on LLM tag processors from OpenAI and Google for tag generation.
kobold_assistant
Kobold-Assistant is a fully offline voice assistant interface to KoboldAI's large language model API. It can work online with the KoboldAI horde and online speech-to-text and text-to-speech models. The assistant, called Jenny by default, uses the latest coqui 'jenny' text to speech model and openAI's whisper speech recognition. Users can customize the assistant name, speech-to-text model, text-to-speech model, and prompts through configuration. The tool requires system packages like GCC, portaudio development libraries, and ffmpeg, along with Python >=3.7, <3.11, and runs on Ubuntu/Debian systems. Users can interact with the assistant through commands like 'serve' and 'list-mics'.
WebCraftifyAI
WebCraftifyAI is a software aid that makes it easy to create and build web pages and content. It is designed to be user-friendly and accessible to people of all skill levels. With WebCraftifyAI, you can quickly and easily create professional-looking websites without having to learn complex coding or design skills.
Discord-AI-Selfbot
Discord-AI-Selfbot is a Python-based Discord selfbot that uses the `discord.py-self` library to automatically respond to messages mentioning its trigger word using Groq API's Llama-3 model. It functions as a normal Discord bot on a real Discord account, enabling interactions in DMs, servers, and group chats without needing to invite a bot. The selfbot comes with features like custom AI instructions, free LLM model usage, mention and reply recognition, message handling, channel-specific responses, and a psychoanalysis command to analyze user messages for insights on personality.
nobodywho
NobodyWho is a plugin for the Godot game engine that enables interaction with local LLMs for interactive storytelling. Users can install it from Godot editor or GitHub releases page, providing their own LLM in GGUF format. The plugin consists of `NobodyWhoModel` node for model file, `NobodyWhoChat` node for chat interaction, and `NobodyWhoEmbedding` node for generating embeddings. It offers a programming interface for sending text to LLM, receiving responses, and starting the LLM worker.
ai-rag-chat-evaluator
This repository contains scripts and tools for evaluating a chat app that uses the RAG architecture. It provides parameters to assess the quality and style of answers generated by the chat app, including system prompt, search parameters, and GPT model parameters. The tools facilitate running evaluations, with examples of evaluations on a sample chat app. The repo also offers guidance on cost estimation, setting up the project, deploying a GPT-4 model, generating ground truth data, running evaluations, and measuring the app's ability to say 'I don't know'. Users can customize evaluations, view results, and compare runs using provided tools.
cog-comfyui
Cog-comfyui allows users to run ComfyUI workflows on Replicate. ComfyUI is a visual programming tool for creating and sharing generative art workflows. With cog-comfyui, users can access a variety of pre-trained models and custom nodes to create their own unique artworks. The tool is easy to use and does not require any coding experience. Users simply need to upload their API JSON file and any necessary input files, and then click the "Run" button. Cog-comfyui will then generate the output image or video file.
For similar tasks
generative-ai-use-cases-jp
Generative AI (生成 AI) brings revolutionary potential to transform businesses. This repository demonstrates business use cases leveraging Generative AI.
WritingTools
Writing Tools is an Apple Intelligence-inspired application for Windows, Linux, and macOS that supercharges your writing with an AI LLM. It allows users to instantly proofread, optimize text, and summarize content from webpages, YouTube videos, documents, etc. The tool is privacy-focused, open-source, and supports multiple languages. It offers powerful features like grammar correction, content summarization, and LLM chat mode, making it a versatile writing assistant for various tasks.
For similar jobs
ChatFAQ
ChatFAQ is an open-source comprehensive platform for creating a wide variety of chatbots: generic ones, business-trained, or even capable of redirecting requests to human operators. It includes a specialized NLP/NLG engine based on a RAG architecture and customized chat widgets, ensuring a tailored experience for users and avoiding vendor lock-in.
anything-llm
AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
mikupad
mikupad is a lightweight and efficient language model front-end powered by ReactJS, all packed into a single HTML file. Inspired by the likes of NovelAI, it provides a simple yet powerful interface for generating text with the help of various backends.
glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.
onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.
firecrawl
Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown. It crawls all accessible subpages and provides clean markdown for each, without requiring a sitemap. The API is easy to use and can be self-hosted. It also integrates with Langchain and Llama Index. The Python SDK makes it easy to crawl and scrape websites in Python code.