onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
Stars: 134
ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference. It aims to offer simple, high-performance ML inference and a good developer experience. Users can expose inference APIs for ONNX models without writing additional code, simply by placing the models in the expected directory structure. Each session can run on CPU or CUDA; the server analyzes model inputs/outputs and provides built-in Swagger API documentation for easy testing. Ready-to-run Docker images are available, making it convenient to deploy the server.
README:
- ONNX: Open Neural Network Exchange
- The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
- ONNX Runtime Server aims to provide simple, high-performance ML inference and a good developer experience.
- If you have exported ML models trained in various environments as ONNX files, you can provide inference APIs without writing additional code or metadata. Just place the ONNX files into the directory structure.
- For each ONNX session, you can choose whether to use CPU or CUDA.
- Analyze the input/output of ONNX models to provide type/shape information for your collaborators.
- Built-in Swagger API documentation makes it easy for collaborators to test ML models through the API. (API example)
- Ready-to-run Docker images. No build required.
Requirements:
- ONNX Runtime
- Boost
- CMake, pkg-config
- CUDA (optional, for Nvidia GPU support)
- OpenSSL (optional, for HTTPS)
Install ONNX Runtime (Linux):
- Use the `download-onnxruntime-linux.sh` script.
  - The script downloads the latest release binary and installs it to `/usr/local/onnxruntime`.
  - It also adds `/usr/local/onnxruntime/lib` to `/etc/ld.so.conf.d/onnxruntime.conf` and runs `ldconfig`.
- Or manually download a binary from ONNX Runtime Releases (a sketch of the equivalent manual steps follows).
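If you go the manual route, the steps the script performs can be reproduced by hand. A minimal sketch, assuming an x64 Linux release archive downloaded from the ONNX Runtime Releases page (the archive name varies by version and is illustrative here):

```sh
# Extract the downloaded release and move it to the expected prefix (names are illustrative)
tar -xzf onnxruntime-linux-x64-1.20.1.tgz
sudo mv onnxruntime-linux-x64-1.20.1 /usr/local/onnxruntime

# Register the shared libraries with the dynamic linker, as the script does
echo "/usr/local/onnxruntime/lib" | sudo tee /etc/ld.so.conf.d/onnxruntime.conf
sudo ldconfig
```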
Install ONNX Runtime (macOS, Homebrew):
brew install onnxruntime
Install build dependencies (Ubuntu/Debian):
sudo apt install cmake pkg-config libboost-all-dev libssl-dev
CUDA (optional, for Nvidia GPU support):
- Follow the instructions below to install the CUDA Toolkit and cuDNN.
sudo apt install cuda-toolkit-12 libcudnn9-dev-cuda-12
# optional, for Nvidia GPU support with Docker
sudo apt install nvidia-container-toolkit
Install build dependencies (macOS, Homebrew):
brew install cmake boost openssl
Compile and install from source:
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
sudo cmake --install build --prefix /usr/local/onnxruntime-server
| OS | Method | Command |
|---|---|---|
| Arch Linux | AUR | `yay -S onnxruntime-server` |
- You must set the path option (`--model-dir`) to the directory where the models are located.
  - The ONNX model files must be located at one of the following paths:
    `${model_dir}/${model_name}/${model_version}/model.onnx` or `${model_dir}/${model_name}/${model_version}.onnx`
| Files in `--model-dir` | Create session request body | Get/Execute session API URL path (after created) |
|---|---|---|
| `model_name/model_version/model.onnx` or `model_name/model_version.onnx` | `{"model":"model_name", "version":"model_version"}` | `/api/sessions/model_name/model_version` |
| `sample/v1/model.onnx` or `sample/v1.onnx` | `{"model":"sample", "version":"v1"}` | `/api/sessions/sample/v1` |
| `sample/v2/model.onnx` or `sample/v2.onnx` | `{"model":"sample", "version":"v2"}` | `/api/sessions/sample/v2` |
| `other/20200101/model.onnx` or `other/20200101.onnx` | `{"model":"other", "version":"20200101"}` | `/api/sessions/other/20200101` |
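As a concrete example, serving the `sample` model at version `v1` from the table above only requires copying the exported file into the expected location. A minimal sketch, assuming the model directory is `/var/models` (the paths are illustrative):

```sh
# Place the exported ONNX file where the server expects it
mkdir -p /var/models/sample/v1
cp /path/to/exported/model.onnx /var/models/sample/v1/model.onnx

# The flat layout works as well: /var/models/sample/v1.onnx
```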
- You must enable at least one of the following backends: TCP, HTTP, or HTTPS (see the example launch command after this list).
  - To use TCP, specify the `--tcp-port` option.
  - To use HTTP, specify the `--http-port` option.
  - To use HTTPS, specify the `--https-port`, `--https-cert`, and `--https-key` options.
  - To use Swagger, specify the `--swagger-url-path` option.
- Use the `-h` or `--help` option to see the full list of options.
- All options can also be set as environment variables, which is useful when operating in a container such as Docker (an environment-variable example follows the option tables below).
  - Normally, command-line options take priority over environment variables, but if the `ONNX_SERVER_CONFIG_PRIORITY=env` environment variable is set, environment variables take priority. Within the Docker image, environment variables have higher priority.
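A sketch of a launch command that enables several backends at once, using the options documented in the tables below; the ports, certificate paths, and model directory are illustrative:

```sh
./onnxruntime_server \
  --model-dir=/var/models \
  --tcp-port=9090 \
  --http-port=8080 \
  --https-port=8443 --https-cert=/etc/ssl/certs/server.crt --https-key=/etc/ssl/private/server.key \
  --swagger-url-path=/api-docs/
```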
| Option | Environment | Description |
|---|---|---|
| `--workers` | `ONNX_SERVER_WORKERS` | Worker thread pool size. Default: `4` |
| `--request-payload-limit` | `ONNX_SERVER_REQUEST_PAYLOAD_LIMIT` | HTTP/HTTPS request payload size limit. Default: `1024 * 1024 * 10` (10 MB) |
| `--model-dir` | `ONNX_SERVER_MODEL_DIR` | Model directory path. The ONNX model files must be located at `${model_dir}/${model_name}/${model_version}/model.onnx` or `${model_dir}/${model_name}/${model_version}.onnx`. Default: `models` |
| `--prepare-model` | `ONNX_SERVER_PREPARE_MODEL` | Pre-create some model sessions at server startup. Format: a space-separated list of `model_name:model_version` or `model_name:model_version(session_options, ...)`. Available `session_options` are `cuda=device_id` (or `true`/`false`). e.g. `model1:v1 model2:v9 model1:v1(cuda=true) model2:v9(cuda=1)` |
| Option | Environment | Description |
|---|---|---|
| `--tcp-port` | `ONNX_SERVER_TCP_PORT` | Enable the TCP backend and set the port number to use. |
| `--http-port` | `ONNX_SERVER_HTTP_PORT` | Enable the HTTP backend and set the port number to use. |
| `--https-port` | `ONNX_SERVER_HTTPS_PORT` | Enable the HTTPS backend and set the port number to use. |
| `--https-cert` | `ONNX_SERVER_HTTPS_CERT` | SSL certificate file path for HTTPS. |
| `--https-key` | `ONNX_SERVER_HTTPS_KEY` | SSL private key file path for HTTPS. |
| `--swagger-url-path` | `ONNX_SERVER_SWAGGER_URL_PATH` | Enable the Swagger API document for the HTTP/HTTPS backend. This value cannot start with "/api/" or "/health". If not specified, the Swagger document is not provided. e.g. `/swagger` or `/api-docs` |
| Option | Environment | Description |
|---|---|---|
| `--log-level` | `ONNX_SERVER_LOG_LEVEL` | Log level (`debug`, `info`, `warn`, `error`, `fatal`) |
| `--log-file` | `ONNX_SERVER_LOG_FILE` | Log file path. If not specified, logs are printed to stdout. |
| `--access-log-file` | `ONNX_SERVER_ACCESS_LOG_FILE` | Access log file path. If not specified, logs are printed to stdout. |
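Because every option has an environment-variable counterpart, the same configuration can be expressed without command-line flags, which is convenient in containers. A minimal sketch using variables from the tables above (values are illustrative):

```sh
export ONNX_SERVER_MODEL_DIR=/var/models
export ONNX_SERVER_HTTP_PORT=8080
export ONNX_SERVER_SWAGGER_URL_PATH=/api-docs
export ONNX_SERVER_WORKERS=8
export ONNX_SERVER_LOG_LEVEL=info

# Give environment variables priority over any command-line options
export ONNX_SERVER_CONFIG_PRIORITY=env

./onnxruntime_server
```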
- Docker Hub: kibaes/onnxruntime-server
  - `1.20.1-linux-cuda12`: amd64 (CUDA 12.x, cuDNN 9.x)
  - `1.20.1-linux-cpu`: amd64, arm64
DOCKER_IMAGE=kibae/onnxruntime-server:1.20.1-linux-cuda12 # or kibae/onnxruntime-server:1.20.1-linux-cpu
docker pull ${DOCKER_IMAGE}
# simple http backend
docker run --name onnxruntime_server_container -d --rm --gpus all \
-p 80:80 \
-v "/your_model_dir:/app/models" \
-v "/your_log_dir:/app/logs" \
-e "ONNX_SERVER_SWAGGER_URL_PATH=/api-docs" \
${DOCKER_IMAGE}
- More information on using Docker images can be found here.
- docker-compose.yml example is available in the repository.
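Once the container is up, a quick smoke test can confirm the HTTP backend and Swagger UI are reachable. A sketch assuming the image serves HTTP on the published port 80 and that the `/health` path reserved in the `--swagger-url-path` notes above is exposed (both are assumptions):

```sh
# Liveness check (assumed endpoint; the --swagger-url-path notes reserve /health)
curl -s http://localhost/health

# Swagger UI at the path configured via ONNX_SERVER_SWAGGER_URL_PATH above
curl -sI http://localhost/api-docs/
```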
- HTTP/HTTPS REST API
  - API documentation (Swagger) is built in. If you want the server to serve Swagger, add the `--swagger-url-path=/swagger/` option at launch. This must be used together with the `--http-port` or `--https-port` option.
    ./onnxruntime_server --model-dir=YOUR_MODEL_DIR --http-port=8080 --swagger-url-path=/api-docs/
  - After running the server as above, the Swagger UI is available at http://localhost:8080/api-docs/.
  - Swagger Sample
- TCP API
- A few things have been left out to help you get a rough idea of the usage flow.
%%{init: {
'sequence': {'noteAlign': 'left', 'mirrorActors': true}
}}%%
sequenceDiagram
actor A as Administrator
box rgb(0, 0, 0, 0.1) "ONNX Runtime Server"
participant SD as Disk
participant SP as Process
end
actor C as Client
Note right of A: You have 3 models to serve.
A ->> SD: copy model files to disk.<br />"/var/models/model_A/v1/model.onnx"<br />"/var/models/model_A/v2/model.onnx"<br />"/var/models/model_B/20201101/model.onnx"
A ->> SP: Start server with --prepare-model option
activate SP
Note right of A: onnxruntime_server<br />--http-port=8080<br />--model-dir=/var/models<br />--prepare-model="model_A:v1(cuda=0) model_A:v2(cuda=0)"
SP -->> SD: Load model
Note over SD, SP: Load model from<br />"/var/models/model_A/v1/model.onnx"
SD -->> SP: Model binary
activate SP
SP -->> SP: Create<br />onnxruntime<br />session
deactivate SP
deactivate SP
rect rgb(100, 100, 100, 0.3)
Note over SD, C: Execute Session
C ->> SP: Execute session request
activate SP
Note over SP, C: POST /api/sessions/model_A/v1<br />{<br />"x": [[1], [2], [3]],<br />"y": [[2], [3], [4]],<br />"z": [[3], [4], [5]]<br />}
activate SP
SP -->> SP: Execute<br />onnxruntime<br />session
deactivate SP
SP ->> C: Execute session response
deactivate SP
Note over SP, C: {<br />"output": [<br />[0.6492120623588562],<br />[0.7610487341880798],<br />[0.8728854656219482]<br />]<br />}
end
%%{init: {
'sequence': {'noteAlign': 'left', 'mirrorActors': true}
}}%%
sequenceDiagram
actor A as Administrator
box rgb(0, 0, 0, 0.1) "ONNX Runtime Server"
participant SD as Disk
participant SP as Process
end
actor C as Client
Note right of A: You have 3 models to serve.
A ->> SD: copy model files to disk.<br />"/var/models/model_A/v1/model.onnx"<br />"/var/models/model_A/v2/model.onnx"<br />"/var/models/model_B/20201101/model.onnx"
A ->> SP: Start server
Note right of A: onnxruntime_server<br />--http-port=8080<br />--model-dir=/var/models
rect rgb(100, 100, 100, 0.3)
Note over SD, C: Create Session
C ->> SP: Create session request
activate SP
Note over SP, C: POST /api/sessions<br />{"model": "model_A", "version": "v1"}
SP -->> SD: Load model
Note over SD, SP: Load model from<br />"/var/models/model_A/v1/model.onnx"
SD -->> SP: Model binary
activate SP
SP -->> SP: Create<br />onnxruntime<br />session
deactivate SP
SP ->> C: Create session response
deactivate SP
Note over SP, C: {<br />"model": "model_A",<br />"version": "v1",<br />"created_at": 1694228106,<br />"execution_count": 0,<br />"last_executed_at": 0,<br />"inputs": {<br />"x": "float32[-1,1]",<br />"y": "float32[-1,1]",<br />"z": "float32[-1,1]"<br />},<br />"outputs": {<br />"output": "float32[-1,1]"<br />}<br />}
Note right of C: 👌 You can know the type and shape<br />of the input and output.
end
rect rgb(100, 100, 100, 0.3)
Note over SD, C: Execute Session
C ->> SP: Execute session request
activate SP
Note over SP, C: POST /api/sessions/model_A/v1<br />{<br />"x": [[1], [2], [3]],<br />"y": [[2], [3], [4]],<br />"z": [[3], [4], [5]]<br />}
activate SP
SP -->> SP: Execute<br />onnxruntime<br />session
deactivate SP
SP ->> C: Execute session response
deactivate SP
Note over SP, C: {<br />"output": [<br />[0.6492120623588562],<br />[0.7610487341880798],<br />[0.8728854656219482]<br />]<br />}
end
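The same create/execute flow shown in the diagrams can be driven from the command line. A minimal sketch with curl, assuming the server from the diagram is listening on HTTP port 8080 and accepts JSON request bodies:

```sh
# Create a session for model_A version v1 (mirrors the "Create Session" step above)
curl -s -X POST http://localhost:8080/api/sessions \
  -H 'Content-Type: application/json' \
  -d '{"model": "model_A", "version": "v1"}'

# Execute the session with the inputs from the diagram
curl -s -X POST http://localhost:8080/api/sessions/model_A/v1 \
  -H 'Content-Type: application/json' \
  -d '{"x": [[1], [2], [3]], "y": [[2], [3], [4]], "z": [[3], [4], [5]]}'
```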
Alternative AI tools for onnxruntime-server
Similar Open Source Tools
gpt-home
GPT Home is a project that allows users to build their own home assistant using Raspberry Pi and OpenAI API. It serves as a guide for setting up a smart home assistant similar to Google Nest Hub or Amazon Alexa. The project integrates various components like OpenAI, Spotify, Philips Hue, and OpenWeatherMap to provide a personalized home assistant experience. Users can follow the detailed instructions provided to build their own version of the home assistant on Raspberry Pi, with optional components for customization. The project also includes system configurations, dependencies installation, and setup scripts for easy deployment. Overall, GPT Home offers a DIY solution for creating a smart home assistant using Raspberry Pi and OpenAI technology.
evalscope
Eval-Scope is a framework designed to support the evaluation of large language models (LLMs) by providing pre-configured benchmark datasets, common evaluation metrics, model integration, automatic evaluation for objective questions, complex task evaluation using expert models, reports generation, visualization tools, and model inference performance evaluation. It is lightweight, easy to customize, supports new dataset integration, model hosting on ModelScope, deployment of locally hosted models, and rich evaluation metrics. Eval-Scope also supports various evaluation modes like single mode, pairwise-baseline mode, and pairwise (all) mode, making it suitable for assessing and improving LLMs.
vnc-lm
vnc-lm is a Discord bot designed for messaging with language models. Users can configure model parameters, branch conversations, and edit prompts to enhance responses. The bot supports various providers like OpenAI, Huggingface, and Cloudflare Workers AI. It integrates with ollama and LiteLLM, allowing users to access a wide range of language model APIs through a single interface. Users can manage models, switch between models, split long messages, and create conversation branches. LiteLLM integration enables support for OpenAI-compatible APIs and local LLM services. The bot requires Docker for installation and can be configured through environment variables. Troubleshooting tips are provided for common issues like context window problems, Discord API errors, and LiteLLM issues.
Noi
Noi is an AI-enhanced customizable browser designed to streamline digital experiences. It includes curated AI websites, allows adding any URL, offers prompts management, Noi Ask for batch messaging, various themes, Noi Cache Mode for quick link access, cookie data isolation, and more. Users can explore, extend, and empower their browsing experience with Noi.
readme-ai
README-AI is a developer tool that auto-generates README.md files using a combination of data extraction and generative AI. It streamlines documentation creation and maintenance, enhancing developer productivity. This project aims to enable all skill levels, across all domains, to better understand, use, and contribute to open-source software. It offers flexible README generation, supports multiple large language models (LLMs), provides customizable output options, works with various programming languages and project types, and includes an offline mode for generating boilerplate README files without external API calls.
evalplus
EvalPlus is a rigorous evaluation framework for LLM4Code, providing HumanEval+ and MBPP+ tests to evaluate large language models on code generation tasks. It offers precise evaluation and ranking, coding rigorousness analysis, and pre-generated code samples. Users can use EvalPlus to generate code solutions, post-process code, and evaluate code quality. The tool includes tools for code generation and test input generation using various backends.
obsei
Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.
wzry_ai
This is an open-source project for playing the game King of Glory with an artificial intelligence model. The first phase of the project has been completed, and future upgrades will be built upon this foundation. The second phase of the project has started, and progress is expected to proceed according to plan. For any questions, feel free to join the QQ exchange group: 687853827. The project aims to learn artificial intelligence and strictly prohibits cheating. Detailed installation instructions are available in the doc/README.md file. Environment installation video: (bilibili) Welcome to follow, like, tip, comment, and provide your suggestions.
dvc
DVC, or Data Version Control, is a command-line tool and VS Code extension that helps you develop reproducible machine learning projects. With DVC, you can version your data and models, iterate fast with lightweight pipelines, track experiments in your local Git repo, compare any data, code, parameters, model, or performance plots, and share experiments and automatically reproduce anyone's experiment.
TempCompass
TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.
gollama
Gollama is a delightful tool that brings Ollama, your offline conversational AI companion, directly into your terminal. It provides a fun and interactive way to generate responses from various models without needing internet connectivity. Whether you're brainstorming ideas, exploring creative writing, or just looking for inspiration, Gollama is here to assist you. The tool offers an interactive interface, customizable prompts, multiple models selection, and visual feedback to enhance user experience. It can be installed via different methods like downloading the latest release, using Go, running with Docker, or building from source. Users can interact with Gollama through various options like specifying a custom base URL, prompt, model, and enabling raw output mode. The tool supports different modes like interactive, piped, CLI with image, and TUI with image. Gollama relies on third-party packages like bubbletea, glamour, huh, and lipgloss. The roadmap includes implementing piped mode, support for extracting codeblocks, copying responses/codeblocks to clipboard, GitHub Actions for automated releases, and downloading models directly from Ollama using the rest API. Contributions are welcome, and the project is licensed under the MIT License.
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
auto-subs
Auto-subs is a tool designed to automatically transcribe editing timelines using OpenAI Whisper and Stable-TS for extreme accuracy. It generates subtitles in a custom style, is completely free, and runs locally within Davinci Resolve. It works on Mac, Linux, and Windows, supporting both Free and Studio versions of Resolve. Users can jump to positions on the timeline using the Subtitle Navigator and translate from any language to English. The tool provides a user-friendly interface for creating and customizing subtitles for video content.
agentscope
AgentScope is a multi-agent platform designed to empower developers to build multi-agent applications with large-scale models. It features three high-level capabilities: Easy-to-Use, High Robustness, and Actor-Based Distribution. AgentScope provides a list of `ModelWrapper` to support both local model services and third-party model APIs, including OpenAI API, DashScope API, Gemini API, and ollama. It also enables developers to rapidly deploy local model services using libraries such as ollama (CPU inference), Flask + Transformers, Flask + ModelScope, FastChat, and vllm. AgentScope supports various services, including Web Search, Data Query, Retrieval, Code Execution, File Operation, and Text Processing. Example applications include Conversation, Game, and Distribution. AgentScope is released under Apache License 2.0 and welcomes contributions.
GPTQModel
GPTQModel is an easy-to-use LLM quantization and inference toolkit based on the GPTQ algorithm. It provides support for weight-only quantization and offers features such as dynamic per layer/module flexible quantization, sharding support, and auto-heal quantization errors. The toolkit aims to ensure inference compatibility with HF Transformers, vLLM, and SGLang. It offers various model supports, faster quant inference, better quality quants, and security features like hash check of model weights. GPTQModel also focuses on faster quantization, improved quant quality as measured by PPL, and backports bug fixes from AutoGPTQ.
For similar tasks
rlhf_trojan_competition
This competition is organized by Javier Rando and Florian Tramèr from the ETH AI Center and SPY Lab at ETH Zurich. The goal of the competition is to create a method that can detect universal backdoors in aligned language models. A universal backdoor is a secret suffix that, when appended to any prompt, enables the model to answer harmful instructions. The competition provides a set of poisoned generation models, a reward model that measures how safe a completion is, and a dataset with prompts to run experiments. Participants are encouraged to use novel methods for red-teaming, automated approaches with low human oversight, and interpretability tools to find the trojans. The best submissions will be offered the chance to present their work at an event during the SaTML 2024 conference and may be invited to co-author a publication summarizing the competition results.
hallucination-index
LLM Hallucination Index - RAG Special is a comprehensive evaluation of large language models (LLMs) focusing on context length and open vs. closed-source attributes. The index explores the impact of context length on model performance and tests the assumption that closed-source LLMs outperform open-source ones. It also investigates the effectiveness of prompting techniques like Chain-of-Note across different context lengths. The evaluation includes 22 models from various brands, analyzing major trends and declaring overall winners based on short, medium, and long context insights. Methodologies involve rigorous testing with different context lengths and prompting techniques to assess models' abilities in handling extensive texts and detecting hallucinations.
lumigator
Lumigator is an open-source platform developed by Mozilla.ai to help users select the most suitable language model for their specific needs. It supports the evaluation of summarization tasks using sequence-to-sequence models such as BART and BERT, as well as causal models like GPT and Mistral. The platform aims to make model selection transparent, efficient, and empowering by providing a framework for comparing LLMs using task-specific metrics to evaluate how well a model fits a project's needs. Lumigator is in the early stages of development and plans to expand support to additional machine learning tasks and use cases in the future.
ai-on-gke
This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers: Infrastructure orchestration that support GPUs and TPUs for training and serving workloads at scale Flexible integration with distributed computing and data processing frameworks Support for multiple teams on the same infrastructure to maximize utilization of resources
ray
Ray is a unified framework for scaling AI and Python applications. It consists of a core distributed runtime and a set of AI libraries for simplifying ML compute, including Data, Train, Tune, RLlib, and Serve. Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations. With Ray, you can seamlessly scale the same code from a laptop to a cluster, making it easy to meet the compute-intensive demands of modern ML workloads.
labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.
djl
Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning. It is designed to be easy to get started with and simple to use for Java developers. DJL provides a native Java development experience and allows users to integrate machine learning and deep learning models with their Java applications. The framework is deep learning engine agnostic, enabling users to switch engines at any point for optimal performance. DJL's ergonomic API interface guides users with best practices to accomplish deep learning tasks, such as running inference and training neural networks.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.