
TinyLLM
Set up and run a local LLM and Chatbot using consumer-grade hardware.
Stars: 224

TinyLLM is a project that helps build a small locally hosted language model with a web interface using consumer-grade hardware. It supports multiple language models, builds a local OpenAI API web service, and serves a Chatbot web interface with customizable prompts. The project requires specific hardware and software configurations for optimal performance. Users can run a local language model using inference servers like vLLM, llama-cpp-python, and Ollama. The Chatbot feature allows users to interact with the language model through a web-based interface, supporting features like summarizing websites, displaying news headlines, stock prices, weather conditions, and using vector databases for queries.
README:
TinyLLM? Yes, the name is a bit of a contradiction, but it means well. It's all about putting a large language model (LLM) on a tiny system that still delivers acceptable performance.
This project helps you build a small locally hosted LLM with a ChatGPT-like web interface using consumer-grade hardware. To read more about my research with llama.cpp and LLMs, see research.md.
- Supports multiple LLMs (see list below)
- Builds a local OpenAI API web service via Ollama, llama.cpp or vLLM.
- Serves up a Chatbot web interface with customizable prompts, accessing external websites (URLs), vector databases and other sources (e.g. news, stocks, weather).
- CPU: Intel, AMD or Apple Silicon
- Memory: 8GB+ DDR4
- Disk: 128GB+ SSD
- GPU: NVIDIA (e.g. GTX 1060 6GB, RTX 3090 24GB) or Apple M1/M2
- OS: Ubuntu Linux, MacOS
- Software: Python 3, CUDA Version: 12.2
TODO - Quick start setup script.
# Clone the project
git clone https://github.com/jasonacox/TinyLLM.git
cd TinyLLM
To run a local LLM, you will need an inference server for the model. This project recommends these options: vLLM, llama-cpp-python, and Ollama. All of these provide a built-in OpenAI API compatible web server that will make it easier for you to integrate with other tools.
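Whichever server you choose, the OpenAI-compatible endpoint means you can use standard client libraries. Below is a minimal sketch using the openai Python package (v1.x); it assumes a local server listening on http://localhost:8000/v1 as in the examples that follow, and the model name ("tinyllm" here) is a placeholder that depends on which server and model you run. Local servers typically accept any non-empty API key.
# Minimal OpenAI-compatible client test (assumptions: openai>=1.0 installed,
# local server on port 8000, model name matches what your server loads)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="tinyllm",  # placeholder - use the model your server serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)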
The Ollama project has made it super easy to install and run LLMs on a variety of systems (MacOS, Linux, Windows) with limited hardware. It serves up an OpenAI compatible API as well. The underlying LLM engine is llama.cpp. As with llama.cpp, the downside of this server is that it can only handle one session/prompt at a time. To run the Ollama server container:
# Install and run Ollama server
docker run -d --gpus=all \
-v $PWD/ollama:/root/.ollama \
-p 11434:11434 \
-p 8000:11434 \
--restart unless-stopped \
--name ollama \
ollama/ollama
# Download and test run the llama3 model
docker exec -it ollama ollama run llama3
# Tell server to keep model loaded in GPU
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'
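To confirm the container is up and see which models it has pulled, you can query Ollama's native API. A minimal sketch, assuming the requests package is installed and port 11434 is published as in the docker command above:
# List locally available Ollama models (assumptions: requests installed,
# Ollama container running with port 11434 published as shown above)
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
for model in tags.get("models", []):
    print(model["name"])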
Ollama supports several models (LLMs): https://ollama.com/library. If you set up the docker container mentioned above, you can download and run them using:
# Download and run Phi-3 Mini, open model by Microsoft.
docker exec -it ollama ollama run phi3
# Download and run mistral 7B model, by Mistral AI
docker exec -it ollama ollama run mistral
If you use the TinyLLM Chatbot (see below) with Ollama, make sure you specify the model via: LLM_MODEL="llama3"
This will cause Ollama to download and run this model. It may take a while to start on first run unless you run one of the ollama run or curl commands above.
vLLM offers a robust OpenAI API compatible web server that supports multiple simultaneous inference threads (sessions). It automatically downloads the models you specify from HuggingFace and runs extremely well in containers. vLLM requires GPUs with more VRAM since it uses non-quantized models. AWQ models are also available, and more optimizations are underway in the project to reduce the memory footprint. Note: for GPUs with a compute capability of 6 or less (Pascal architecture; see the GPU table), follow the details here instead.
# Build Container
cd vllm
./build.sh
# Make a Directory to store Models
mkdir models
# Edit run.sh or run-awq.sh to pull the model you want to use. Mistral is set by default.
# Run the Container - This will download the model on the first run
./run.sh
# The trailing logs will be displayed so you can see the progress. Use ^C to exit without
# stopping the container.
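Once the container reports it is serving, you can check which model vLLM loaded through its OpenAI-compatible endpoint. A minimal sketch, assuming the openai Python package and that the container publishes the API on port 8000 (the port the Chatbot examples below point at):
# Ask the vLLM server which model it is serving (assumptions: openai>=1.0
# installed, vLLM container publishing its OpenAI-compatible API on port 8000)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
for model in client.models.list():
    print(model.id)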
llama-cpp-python's OpenAI API compatible web server is easy to set up and use. It runs optimized GGUF models that work well on many consumer-grade GPUs with small amounts of VRAM. As with Ollama, a downside of this server is that it can only handle one session/prompt at a time. The steps below outline how to set up and run the server via the command line. Read the details in llmserver to see how to set it up as a persistent service or docker container on your Linux host.
# Uninstall any old version of llama-cpp-python
pip3 uninstall llama-cpp-python -y
# Linux Target with Nvidia CUDA support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python==0.2.27 --no-cache-dir
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python[server]==0.2.27 --no-cache-dir
# MacOS Target with Apple Silicon M1/M2
CMAKE_ARGS="-DLLAMA_METAL=on" pip3 install -U llama-cpp-python --no-cache-dir
pip3 install 'llama-cpp-python[server]'
# Download Models from HuggingFace
cd llmserver/models
# Get the Mistral 7B GGUF Q-5bit model Q5_K_M and Meta LLaMA-2 7B GGUF Q-5bit model Q5_K_M
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf
wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
# Run Test - API Server
python3 -m llama_cpp.server \
--model ./models/mistral-7b-instruct-v0.1.Q5_K_M.gguf \
--host localhost \
--n_gpu_layers 99 \
--n_ctx 2048 \
--chat_format llama-2
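Besides the API server, the same llama-cpp-python install can load a GGUF model directly in Python, which is handy for a quick offline check. A minimal sketch, assuming the Mistral GGUF downloaded above exists at this relative path and your GPU has room for the requested layers:
# Load a GGUF model directly with llama-cpp-python (assumption: the model
# file downloaded above is present at this relative path)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
    n_gpu_layers=99,  # offload all layers to the GPU if available
    n_ctx=2048,
)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a tiny LLM?"}],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])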
The TinyLLM Chatbot is a simple web-based Python FastAPI app that allows you to chat with an LLM using the OpenAI API. It supports multiple sessions and remembers your conversation history. It includes several RAG (Retrieval Augmented Generation) features:
- Summarize external websites and PDFs (paste a URL in the chat window) - the general pattern is sketched below
- List the top 10 headlines from current news (use /news)
- Display a company's stock symbol and current stock price (use /stock <company>)
- Provide current weather conditions (use /weather <location>)
- Use a vector database for RAG queries - see the RAG page for details
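The chatbot's own implementation lives in the chatbot folder; the hypothetical sketch below only illustrates the general fetch-and-summarize pattern behind the URL feature, assuming requests, bs4 and openai>=1.0 are installed and the LLM is served on port 8000 under the placeholder name "tinyllm".
# Hypothetical sketch of the fetch-and-summarize pattern (not the chatbot's
# actual code): pull a page, strip it to text, ask the local LLM to summarize.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

url = "https://example.com"  # placeholder URL
html = requests.get(url, timeout=15).text
text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)[:8000]

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
summary = client.chat.completions.create(
    model="tinyllm",
    messages=[{"role": "user", "content": f"Summarize this page:\n\n{text}"}],
)
print(summary.choices[0].message.content)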
# Move to chatbot folder
cd ../chatbot
touch prompts.json
# Pull and run latest container - see run.sh
docker run \
-d \
-p 5000:5000 \
-e PORT=5000 \
-e OPENAI_API_BASE="http://localhost:8000/v1" \
-e LLM_MODEL="tinyllm" \
-e USE_SYSTEM="false" \
-e SENTENCE_TRANSFORMERS_HOME=/app/.tinyllm \
-v $PWD/.tinyllm:/app/.tinyllm \
--name chatbot \
--restart unless-stopped \
jasonacox/chatbot
Open http://localhost:5000 - Example session:
If a URL is pasted in the text box, the chatbot will read and summarize it.
The /news command will fetch the latest news and have the LLM summarize the top ten headlines. It will store the raw feed in the context prompt to allow follow-up questions.
You can also test the chatbot server without docker using the following.
# Install required packages
pip3 install fastapi uvicorn python-socketio jinja2 openai bs4 pypdf requests lxml aiohttp
# Run the chatbot web server
python3 server.py
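When running outside Docker, the server is presumably configured through the same environment variables shown in the docker example above. A hedged sketch of launching it that way, assuming server.py reads OPENAI_API_BASE, LLM_MODEL and PORT from the environment:
# Hypothetical launcher: set the same variables the Docker example uses,
# then start the chatbot (assumption: server.py reads these from the environment)
import os
import subprocess

os.environ.setdefault("PORT", "5000")
os.environ.setdefault("OPENAI_API_BASE", "http://localhost:8000/v1")
os.environ.setdefault("LLM_MODEL", "tinyllm")
subprocess.run(["python3", "server.py"], check=True)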
Here are some suggested models that work well with llmserver (llama-cpp-python). You can test other models and different quantization, but in my experiments, the Q5_K_M models performed the best. Below are the download links from HuggingFace as well as the model card's suggested context length size and chat prompt mode.
LLM | Quantized | Link to Download | Context Length | Chat Prompt Mode |
---|---|---|---|---|
7B Models | ||||
Mistral v0.1 7B | 5-bit | mistral-7b-instruct-v0.1.Q5_K_M.gguf | 4096 | llama-2 |
Llama-2 7B | 5-bit | llama-2-7b-chat.Q5_K_M.gguf | 2048 | llama-2 |
Mistrallite 32K 7B | 5-bit | mistrallite.Q5_K_M.gguf | 16384 | mistrallite (can be glitchy) |
10B Models | ||||
Nous-Hermes-2-SOLAR 10.7B | 5-bit | nous-hermes-2-solar-10.7b.Q5_K_M.gguf | 4096 | chatml |
13B Models | ||||
Claude2 trained Alpaca 13B | 5-bit | claude2-alpaca-13b.Q5_K_M.gguf | 2048 | chatml |
Llama-2 13B | 5-bit | llama-2-13b-chat.Q5_K_M.gguf | 2048 | llama-2 |
Vicuna 13B v1.5 | 5-bit | vicuna-13b-v1.5.Q5_K_M.gguf | 2048 | vicuna |
Mixture-of-Experts (MoE) Models | ||||
Hai's Mixtral 11Bx2 MoE 19B | 5-bit | mixtral_11bx2_moe_19b.Q5_K_M.gguf | 4096 | chatml |
Mixtral-8x7B v0.1 | 3-bit | Mixtral-8x7B-Instruct-v0.1-GGUF | 4096 | llama-2 |
Mixtral-8x7B v0.1 | 4-bit | Mixtral-8x7B-Instruct-v0.1-GGUF | 4096 | llama-2 |
Here are some suggested models that work well with vLLM.
LLM | Quantized | Link to Download | Context Length | License |
---|---|---|---|---|
Mistral v0.1 7B | None | mistralai/Mistral-7B-Instruct-v0.1 | 32k | Apache 2 |
Mistral v0.2 7B | None | mistralai/Mistral-7B-Instruct-v0.2 | 32k | Apache 2 |
Mistral v0.1 7B AWQ | AWQ | TheBloke/Mistral-7B-Instruct-v0.1-AWQ | 32k | Apache 2 |
Mixtral-8x7B | None | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32k | Apache 2 |
Pixtral-12B-2409 12B Vision | None | mistralai/Pixtral-12B-2409 | 128k | Apache 2 |
Meta Llama-3 8B | None | meta-llama/Meta-Llama-3-8B-Instruct | 8k | Meta |
Meta Llama-3.2 11B Vision | FP8 | neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic | 128k | Meta |
Qwen-2.5 7B | None | Qwen/Qwen2.5-7B-Instruct | 128k | Apache 2 |
Yi-1.5 9B | None | 01-ai/Yi-1.5-9B-Chat-16K | 16k | Apache 2 |
Phi-3 Small 7B | None | microsoft/Phi-3-small-8k-instruct | 16k | MIT |
Phi-3 Medium 14B | None | microsoft/Phi-3-medium-4k-instruct | 4k | MIT |
Phi-3.5 Vision 4B | None | microsoft/Phi-3.5-vision-instruct | 128k | MIT |
Phi-4 14B | None | microsoft/phi-4 | 16k | MIT |
A CLI utility (llm) and Python library for interacting with Large Language Models. To configure this tool to use your local LLM's OpenAI API:
# Install llm command line tool
pipx install llm
# Location to store configuration files:
dirname "$(llm logs path)"
You define the model in the extra-openai-models.yaml file. Create this file in the directory discovered above. Edit the model_name and api_base to match your LLM OpenAI API setup:
- model_id: tinyllm
  model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
  api_base: "http://localhost:8000/v1"
# Configure llm to use your local model
llm models default tinyllm
# Test
llm "What is love?"
- LLaMa.cpp - https://github.com/ggerganov/llama.cpp
- LLaMa-cpp-python - https://github.com/abetlen/llama-cpp-python
- vLLM - https://github.com/vllm-project/vllm
Alternative AI tools for TinyLLM
Similar Open Source Tools


ramalama
The Ramalama project simplifies working with AI by utilizing OCI containers. It automatically detects GPU support, pulls necessary software in a container, and runs AI models. Users can list, pull, run, and serve models easily. The tool aims to support various GPUs and platforms in the future, making AI setup hassle-free.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

RobustVLM
This repository contains code for the paper 'Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models'. It focuses on fine-tuning CLIP in an unsupervised manner to enhance its robustness against visual adversarial attacks. By replacing the vision encoder of large vision-language models with the fine-tuned CLIP models, it achieves state-of-the-art adversarial robustness on various vision-language tasks. The repository provides adversarially fine-tuned ViT-L/14 CLIP models and offers insights into zero-shot classification settings and clean accuracy improvements.

GenAIComps
GenAIComps is an initiative aimed at building enterprise-grade Generative AI applications using a microservice architecture. It simplifies the scaling and deployment process for production, abstracting away infrastructure complexities. GenAIComps provides a suite of containerized microservices that can be assembled into a mega-service tailored for real-world Enterprise AI applications. The modular approach of microservices allows for independent development, deployment, and scaling of individual components, promoting modularity, flexibility, and scalability. The mega-service orchestrates multiple microservices to deliver comprehensive solutions, encapsulating complex business logic and workflow orchestration. The gateway serves as the interface for users to access the mega-service, providing customized access based on user requirements.

star-vector
StarVector is a multimodal vision-language model for Scalable Vector Graphics (SVG) generation. It can be used to perform image2SVG and text2SVG generation. StarVector works directly in the SVG code space, leveraging visual understanding to apply accurate SVG primitives. It achieves state-of-the-art performance in producing compact and semantically rich SVGs. The tool provides Hugging Face model checkpoints for image2SVG vectorization, with models like StarVector-8B and StarVector-1B. It also offers datasets like SVG-Stack, SVG-Fonts, SVG-Icons, SVG-Emoji, and SVG-Diagrams for evaluation. StarVector can be trained using Deepspeed or FSDP for tasks like Image2SVG and Text2SVG generation. The tool provides a demo with options for HuggingFace generation or VLLM backend for faster generation speed.

AQLM
AQLM is the official PyTorch implementation for Extreme Compression of Large Language Models via Additive Quantization. It includes prequantized AQLM models without PV-Tuning and PV-Tuned models for LLaMA, Mistral, and Mixtral families. The repository provides inference examples, model details, and quantization setups. Users can run prequantized models using Google Colab examples, work with different model families, and install the necessary inference library. The repository also offers detailed instructions for quantization, fine-tuning, and model evaluation. AQLM quantization involves calibrating models for compression, and users can improve model accuracy through finetuning. Additionally, the repository includes information on preparing models for inference and contributing guidelines.

floneum
Floneum is a graph editor that makes it easy to develop your own AI workflows. It uses large language models (LLMs) to run AI models locally, without any external dependencies or even a GPU. This makes it easy to use LLMs with your own data, without worrying about privacy. Floneum also has a plugin system that allows you to improve the performance of LLMs and make them work better for your specific use case. Plugins can be used in any language that supports web assembly, and they can control the output of LLMs with a process similar to JSONformer or guidance.

AiOS
AiOS is a tool for human pose and shape estimation, performing human localization and SMPL-X estimation in a progressive manner. It consists of body localization, body refinement, and whole-body refinement stages. Users can download datasets for evaluation, SMPL-X body models, and AiOS checkpoint. Installation involves creating a conda virtual environment, installing PyTorch, torchvision, Pytorch3D, MMCV, and other dependencies. Inference requires placing the video for inference and pretrained models in specific directories. Test results are provided for NMVE, NMJE, MVE, and MPJPE on datasets like BEDLAM and AGORA. Users can run scripts for AGORA validation, AGORA test leaderboard, and BEDLAM leaderboard. The tool acknowledges codes from MMHuman3D, ED-Pose, and SMPLer-X.

clearml-fractional-gpu
ClearML Fractional GPU is a tool designed to optimize GPU resource utilization by allowing multiple containers to run on the same GPU with driver-level memory limitation and compute time-slicing. It supports CUDA 11.x & CUDA 12.x, preventing greedy processes from grabbing the entire GPU memory. The tool offers options like Dynamic GPU Slicing, Container-based Memory Limits, and Kubernetes-based Static MIG Slicing to enhance hardware utilization and workload performance for AI development.

llama-gpt
LlamaGPT is a self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. It is 100% private, with no data leaving your device. It supports various models, including Nous Hermes Llama 2 7B/13B/70B Chat and Code Llama 7B/13B/34B Chat. You can install LlamaGPT on your umbrelOS home server, M1/M2 Mac, or anywhere else with Docker. It also provides an OpenAI-compatible API for easy integration. LlamaGPT is still under development, with plans to add more features such as custom model loading and model switching.

optillm
optillm is an OpenAI API compatible optimizing inference proxy implementing state-of-the-art techniques to enhance accuracy and performance of LLMs, focusing on reasoning over coding, logical, and mathematical queries. By leveraging additional compute at inference time, it surpasses frontier models across diverse tasks.

dl_model_infer
This project is a c++ version of the AI reasoning library that supports the reasoning of tensorrt models. It provides accelerated deployment cases of deep learning CV popular models and supports dynamic-batch image processing, inference, decode, and NMS. The project has been updated with various models and provides tutorials for model exports. It also includes a producer-consumer inference model for specific tasks. The project directory includes implementations for model inference applications, backend reasoning classes, post-processing, pre-processing, and target detection and tracking. Speed tests have been conducted on various models, and onnx downloads are available for different models.

chatgpt-lite
ChatGPT Lite is a lightweight web interface developed using Next.js and the OpenAI Chat API. It allows users to deploy a custom ChatGPT interface supporting markdown, prompt storage, and multi-person chats. Users can create private web-based ChatGPT instances for friends without sharing API keys. The codebase is clear and expandable, making it an ideal starting point for AI projects.

SemanticFinder
SemanticFinder is a frontend-only live semantic search tool that calculates embeddings and cosine similarity client-side using transformers.js and SOTA embedding models from Huggingface. It allows users to search through large texts like books with pre-indexed examples, customize search parameters, and offers data privacy by keeping input text in the browser. The tool can be used for basic search tasks, analyzing texts for recurring themes, and has potential integrations with various applications like wikis, chat apps, and personal history search. It also provides options for building browser extensions and future ideas for further enhancements and integrations.

MooER
MooER (摩耳) is an LLM-based speech recognition and translation model developed by Moore Threads. It allows users to transcribe speech into text (ASR) and translate speech into other languages (AST) in an end-to-end manner. The model was trained using 5K hours of data and is now also available with an 80K hours version. MooER is the first LLM-based speech model trained and inferred using domestic GPUs. The repository includes pretrained models, inference code, and a Gradio demo for a better user experience.
For similar tasks


LLMinator
LLMinator is a Gradio-based tool with an integrated chatbot designed to locally run and test Language Model Models (LLMs) directly from HuggingFace. It provides an easy-to-use interface made with Gradio, LangChain, and Torch, offering features such as context-aware streaming chatbot, inbuilt code syntax highlighting, loading any LLM repo from HuggingFace, support for both CPU and CUDA modes, enabling LLM inference with llama.cpp, and model conversion capabilities.

ReEdgeGPT
ReEdgeGPT is a tool designed for reverse engineering the chat feature of the new version of Bing. It provides documentation and guidance on how to collect and use cookies to access the chat feature. The tool allows users to create a chatbot using the collected cookies and interact with the Bing GPT chatbot. It also offers support for different modes like Copilot and Bing, along with plugins for various tasks. The tool covers historical information about Rome, the Lazio region, and provides troubleshooting tips for common issues encountered while using the tool.

gemini_multipdf_chat
Gemini PDF Chatbot is a Streamlit-based application that allows users to chat with a conversational AI model trained on PDF documents. The chatbot extracts information from uploaded PDF files and answers user questions based on the provided context. It features PDF upload, text extraction, conversational AI using the Gemini model, and a chat interface. Users can deploy the application locally or to the cloud, and the project structure includes main application script, environment variable file, requirements, and documentation. Dependencies include PyPDF2, langchain, Streamlit, google.generativeai, and dotenv.

LLM_AppDev-HandsOn
This repository showcases how to build a simple LLM-based chatbot for answering questions based on documents using retrieval augmented generation (RAG) technique. It also provides guidance on deploying the chatbot using Podman or on the OpenShift Container Platform. The workshop associated with this repository introduces participants to LLMs & RAG concepts and demonstrates how to customize the chatbot for specific purposes. The software stack relies on open-source tools like streamlit, LlamaIndex, and local open LLMs via Ollama, making it accessible for GPU-constrained environments.

AI-Playground
AI Playground is an open-source project and AI PC starter app designed for AI image creation, image stylizing, and chatbot functionalities on a PC powered by an Intel Arc GPU. It leverages libraries from GitHub and Huggingface, providing users with the ability to create AI-generated content and interact with chatbots. The tool requires specific hardware specifications and offers packaged installers for ease of setup. Users can also develop the project environment, link it to the development environment, and utilize alternative models for different AI tasks.

Virtual_Avatar_ChatBot
Virtual_Avatar_ChatBot is a free AI Chatbot with visual movement that runs on your local computer with minimal GPU requirement. It supports various features like Oobabooga, betacharacter.ai, and local LLMs. The tool requires Windows 7 or above, Python, a C++ compiler, Git, and other dependencies. Users can contribute to the open-source project by reporting bugs, creating pull requests, or suggesting new features. The goal is to enhance Voicevox functionality, support local LLM inference, and give the waifu access to the internet. The project references various tools like desktop-waifu, CharacterAI, Whisper, PYVTS, COQUI-AI, VOICEVOX, and the VOICEVOX API.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.