onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
Stars: 134
ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference. It aims to offer simple, high-performance ML inference and a good developer experience. Users can expose inference APIs for ONNX models without writing additional code, simply by placing the models in the expected directory structure. Each session can run on CPU or CUDA; the server analyzes model inputs/outputs and provides built-in Swagger API documentation for easy testing. Ready-to-run Docker images are available, making it convenient to deploy the server.
README:
- ONNX: Open Neural Network Exchange
- The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
- ONNX Runtime Server aims to provide simple, high-performance ML inference and a good developer experience.
- If you have exported ML models trained in various environments as ONNX files, you can provide inference APIs without writing additional code or metadata. Just place the ONNX files into the directory structure.
- For each ONNX session, you can choose whether to use CPU or CUDA.
- Analyze the input/output of ONNX models to provide type/shape information for your collaborators.
- Built-in Swagger API documentation makes it easy for collaborators to test ML models through the API. (API example)
- Ready-to-run Docker images. No build required.
Requirements:
- ONNX Runtime
- Boost
- CMake, pkg-config
- CUDA (optional, for Nvidia GPU support)
- OpenSSL (optional, for HTTPS)
Install ONNX Runtime (Linux):
- Use the `download-onnxruntime-linux.sh` script.
  - The script downloads the latest release binary and installs it to `/usr/local/onnxruntime`.
  - It also adds `/usr/local/onnxruntime/lib` to `/etc/ld.so.conf.d/onnxruntime.conf` and runs `ldconfig`.
- Or manually download a binary from ONNX Runtime Releases (a sketch of the equivalent manual steps follows).
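If you go the manual route, the steps the script performs can be reproduced by hand. A minimal sketch, assuming an x64 Linux release archive downloaded from the ONNX Runtime Releases page (the archive name varies by version and is illustrative here):

```sh
# Extract the downloaded release and move it to the expected prefix (names are illustrative)
tar -xzf onnxruntime-linux-x64-1.20.1.tgz
sudo mv onnxruntime-linux-x64-1.20.1 /usr/local/onnxruntime

# Register the shared libraries with the dynamic linker, as the script does
echo "/usr/local/onnxruntime/lib" | sudo tee /etc/ld.so.conf.d/onnxruntime.conf
sudo ldconfig
```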
Install ONNX Runtime (macOS, Homebrew):
brew install onnxruntime
Install build dependencies (Ubuntu/Debian):
sudo apt install cmake pkg-config libboost-all-dev libssl-dev
CUDA (optional, for Nvidia GPU support):
- Follow the instructions below to install the CUDA Toolkit and cuDNN.
sudo apt install cuda-toolkit-12 libcudnn9-dev-cuda-12
# optional, for Nvidia GPU support with Docker
sudo apt install nvidia-container-toolkit
Install build dependencies (macOS, Homebrew):
brew install cmake boost openssl
Compile and install from source:
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
sudo cmake --install build --prefix /usr/local/onnxruntime-server
| OS | Method | Command |
|---|---|---|
| Arch Linux | AUR | `yay -S onnxruntime-server` |
- You must set the path option (`--model-dir`) to the directory where the models are located.
  - The ONNX model files must be located at one of the following paths:
    `${model_dir}/${model_name}/${model_version}/model.onnx` or `${model_dir}/${model_name}/${model_version}.onnx`
| Files in `--model-dir` | Create session request body | Get/Execute session API URL path (after created) |
|---|---|---|
| `model_name/model_version/model.onnx` or `model_name/model_version.onnx` | `{"model":"model_name", "version":"model_version"}` | `/api/sessions/model_name/model_version` |
| `sample/v1/model.onnx` or `sample/v1.onnx` | `{"model":"sample", "version":"v1"}` | `/api/sessions/sample/v1` |
| `sample/v2/model.onnx` or `sample/v2.onnx` | `{"model":"sample", "version":"v2"}` | `/api/sessions/sample/v2` |
| `other/20200101/model.onnx` or `other/20200101.onnx` | `{"model":"other", "version":"20200101"}` | `/api/sessions/other/20200101` |
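As a concrete example, serving the `sample` model at version `v1` from the table above only requires copying the exported file into the expected location. A minimal sketch, assuming the model directory is `/var/models` (the paths are illustrative):

```sh
# Place the exported ONNX file where the server expects it
mkdir -p /var/models/sample/v1
cp /path/to/exported/model.onnx /var/models/sample/v1/model.onnx

# The flat layout works as well: /var/models/sample/v1.onnx
```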
- You must enable at least one of the following backends: TCP, HTTP, or HTTPS (see the example launch command after this list).
  - To use TCP, specify the `--tcp-port` option.
  - To use HTTP, specify the `--http-port` option.
  - To use HTTPS, specify the `--https-port`, `--https-cert`, and `--https-key` options.
  - To use Swagger, specify the `--swagger-url-path` option.
- Use the `-h` or `--help` option to see the full list of options.
- All options can also be set as environment variables, which is useful when operating in a container such as Docker (an environment-variable example follows the option tables below).
  - Normally, command-line options take priority over environment variables, but if the `ONNX_SERVER_CONFIG_PRIORITY=env` environment variable is set, environment variables take priority. Within the Docker image, environment variables have higher priority.
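A sketch of a launch command that enables several backends at once, using the options documented in the tables below; the ports, certificate paths, and model directory are illustrative:

```sh
./onnxruntime_server \
  --model-dir=/var/models \
  --tcp-port=9090 \
  --http-port=8080 \
  --https-port=8443 --https-cert=/etc/ssl/certs/server.crt --https-key=/etc/ssl/private/server.key \
  --swagger-url-path=/api-docs/
```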
| Option | Environment | Description |
|---|---|---|
| `--workers` | `ONNX_SERVER_WORKERS` | Worker thread pool size. Default: `4` |
| `--request-payload-limit` | `ONNX_SERVER_REQUEST_PAYLOAD_LIMIT` | HTTP/HTTPS request payload size limit. Default: `1024 * 1024 * 10` (10 MB) |
| `--model-dir` | `ONNX_SERVER_MODEL_DIR` | Model directory path. The ONNX model files must be located at `${model_dir}/${model_name}/${model_version}/model.onnx` or `${model_dir}/${model_name}/${model_version}.onnx`. Default: `models` |
| `--prepare-model` | `ONNX_SERVER_PREPARE_MODEL` | Pre-create some model sessions at server startup. Format: a space-separated list of `model_name:model_version` or `model_name:model_version(session_options, ...)`. Available `session_options` are `cuda=device_id` (or `true`/`false`). e.g. `model1:v1 model2:v9 model1:v1(cuda=true) model2:v9(cuda=1)` |
| Option | Environment | Description |
|---|---|---|
| `--tcp-port` | `ONNX_SERVER_TCP_PORT` | Enable the TCP backend and set the port number to use. |
| `--http-port` | `ONNX_SERVER_HTTP_PORT` | Enable the HTTP backend and set the port number to use. |
| `--https-port` | `ONNX_SERVER_HTTPS_PORT` | Enable the HTTPS backend and set the port number to use. |
| `--https-cert` | `ONNX_SERVER_HTTPS_CERT` | SSL certificate file path for HTTPS. |
| `--https-key` | `ONNX_SERVER_HTTPS_KEY` | SSL private key file path for HTTPS. |
| `--swagger-url-path` | `ONNX_SERVER_SWAGGER_URL_PATH` | Enable the Swagger API document for the HTTP/HTTPS backend. This value cannot start with "/api/" or "/health". If not specified, the Swagger document is not provided. e.g. `/swagger` or `/api-docs` |
| Option | Environment | Description |
|---|---|---|
| `--log-level` | `ONNX_SERVER_LOG_LEVEL` | Log level (`debug`, `info`, `warn`, `error`, `fatal`) |
| `--log-file` | `ONNX_SERVER_LOG_FILE` | Log file path. If not specified, logs are printed to stdout. |
| `--access-log-file` | `ONNX_SERVER_ACCESS_LOG_FILE` | Access log file path. If not specified, logs are printed to stdout. |
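Because every option has an environment-variable counterpart, the same configuration can be expressed without command-line flags, which is convenient in containers. A minimal sketch using variables from the tables above (values are illustrative):

```sh
export ONNX_SERVER_MODEL_DIR=/var/models
export ONNX_SERVER_HTTP_PORT=8080
export ONNX_SERVER_SWAGGER_URL_PATH=/api-docs
export ONNX_SERVER_WORKERS=8
export ONNX_SERVER_LOG_LEVEL=info

# Give environment variables priority over any command-line options
export ONNX_SERVER_CONFIG_PRIORITY=env

./onnxruntime_server
```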
- Docker Hub: kibaes/onnxruntime-server
  - `1.20.1-linux-cuda12`: amd64 (CUDA 12.x, cuDNN 9.x)
  - `1.20.1-linux-cpu`: amd64, arm64
DOCKER_IMAGE=kibae/onnxruntime-server:1.20.1-linux-cuda12 # or kibae/onnxruntime-server:1.20.1-linux-cpu
docker pull ${DOCKER_IMAGE}
# simple http backend
docker run --name onnxruntime_server_container -d --rm --gpus all \
-p 80:80 \
-v "/your_model_dir:/app/models" \
-v "/your_log_dir:/app/logs" \
-e "ONNX_SERVER_SWAGGER_URL_PATH=/api-docs" \
${DOCKER_IMAGE}
- More information on using Docker images can be found here.
- docker-compose.yml example is available in the repository.
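Once the container is up, a quick smoke test can confirm the HTTP backend and Swagger UI are reachable. A sketch assuming the image serves HTTP on the published port 80 and that the `/health` path reserved in the `--swagger-url-path` notes above is exposed (both are assumptions):

```sh
# Liveness check (assumed endpoint; the --swagger-url-path notes reserve /health)
curl -s http://localhost/health

# Swagger UI at the path configured via ONNX_SERVER_SWAGGER_URL_PATH above
curl -sI http://localhost/api-docs/
```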
- HTTP/HTTPS REST API
  - API documentation (Swagger) is built in. If you want the server to serve Swagger, add the `--swagger-url-path=/swagger/` option at launch. This must be used together with the `--http-port` or `--https-port` option.
    ./onnxruntime_server --model-dir=YOUR_MODEL_DIR --http-port=8080 --swagger-url-path=/api-docs/
  - After running the server as above, the Swagger UI is available at http://localhost:8080/api-docs/.
  - Swagger Sample
- TCP API
- A few things have been left out to help you get a rough idea of the usage flow.
%%{init: {
'sequence': {'noteAlign': 'left', 'mirrorActors': true}
}}%%
sequenceDiagram
actor A as Administrator
box rgb(0, 0, 0, 0.1) "ONNX Runtime Server"
participant SD as Disk
participant SP as Process
end
actor C as Client
Note right of A: You have 3 models to serve.
A ->> SD: copy model files to disk.<br />"/var/models/model_A/v1/model.onnx"<br />"/var/models/model_A/v2/model.onnx"<br />"/var/models/model_B/20201101/model.onnx"
A ->> SP: Start server with --prepare-model option
activate SP
Note right of A: onnxruntime_server<br />--http-port=8080<br />--model-dir=/var/models<br />--prepare-model="model_A:v1(cuda=0) model_A:v2(cuda=0)"
SP -->> SD: Load model
Note over SD, SP: Load model from<br />"/var/models/model_A/v1/model.onnx"
SD -->> SP: Model binary
activate SP
SP -->> SP: Create<br />onnxruntime<br />session
deactivate SP
deactivate SP
rect rgb(100, 100, 100, 0.3)
Note over SD, C: Execute Session
C ->> SP: Execute session request
activate SP
Note over SP, C: POST /api/sessions/model_A/v1<br />{<br />"x": [[1], [2], [3]],<br />"y": [[2], [3], [4]],<br />"z": [[3], [4], [5]]<br />}
activate SP
SP -->> SP: Execute<br />onnxruntime<br />session
deactivate SP
SP ->> C: Execute session response
deactivate SP
Note over SP, C: {<br />"output": [<br />[0.6492120623588562],<br />[0.7610487341880798],<br />[0.8728854656219482]<br />]<br />}
end
%%{init: {
'sequence': {'noteAlign': 'left', 'mirrorActors': true}
}}%%
sequenceDiagram
actor A as Administrator
box rgb(0, 0, 0, 0.1) "ONNX Runtime Server"
participant SD as Disk
participant SP as Process
end
actor C as Client
Note right of A: You have 3 models to serve.
A ->> SD: copy model files to disk.<br />"/var/models/model_A/v1/model.onnx"<br />"/var/models/model_A/v2/model.onnx"<br />"/var/models/model_B/20201101/model.onnx"
A ->> SP: Start server
Note right of A: onnxruntime_server<br />--http-port=8080<br />--model-dir=/var/models
rect rgb(100, 100, 100, 0.3)
Note over SD, C: Create Session
C ->> SP: Create session request
activate SP
Note over SP, C: POST /api/sessions<br />{"model": "model_A", "version": "v1"}
SP -->> SD: Load model
Note over SD, SP: Load model from<br />"/var/models/model_A/v1/model.onnx"
SD -->> SP: Model binary
activate SP
SP -->> SP: Create<br />onnxruntime<br />session
deactivate SP
SP ->> C: Create session response
deactivate SP
Note over SP, C: {<br />"model": "model_A",<br />"version": "v1",<br />"created_at": 1694228106,<br />"execution_count": 0,<br />"last_executed_at": 0,<br />"inputs": {<br />"x": "float32[-1,1]",<br />"y": "float32[-1,1]",<br />"z": "float32[-1,1]"<br />},<br />"outputs": {<br />"output": "float32[-1,1]"<br />}<br />}
Note right of C: 👌 You can know the type and shape<br />of the input and output.
end
rect rgb(100, 100, 100, 0.3)
Note over SD, C: Execute Session
C ->> SP: Execute session request
activate SP
Note over SP, C: POST /api/sessions/model_A/v1<br />{<br />"x": [[1], [2], [3]],<br />"y": [[2], [3], [4]],<br />"z": [[3], [4], [5]]<br />}
activate SP
SP -->> SP: Execute<br />onnxruntime<br />session
deactivate SP
SP ->> C: Execute session response
deactivate SP
Note over SP, C: {<br />"output": [<br />[0.6492120623588562],<br />[0.7610487341880798],<br />[0.8728854656219482]<br />]<br />}
end
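The same create/execute flow shown in the diagrams can be driven from the command line. A minimal sketch with curl, assuming the server from the diagram is listening on HTTP port 8080 and accepts JSON request bodies:

```sh
# Create a session for model_A version v1 (mirrors the "Create Session" step above)
curl -s -X POST http://localhost:8080/api/sessions \
  -H 'Content-Type: application/json' \
  -d '{"model": "model_A", "version": "v1"}'

# Execute the session with the inputs from the diagram
curl -s -X POST http://localhost:8080/api/sessions/model_A/v1 \
  -H 'Content-Type: application/json' \
  -d '{"x": [[1], [2], [3]], "y": [[2], [3], [4]], "z": [[3], [4], [5]]}'
```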
Alternative AI tools for onnxruntime-server
Similar Open Source Tools
gpt-home
GPT Home is a project that allows users to build their own home assistant using Raspberry Pi and OpenAI API. It serves as a guide for setting up a smart home assistant similar to Google Nest Hub or Amazon Alexa. The project integrates various components like OpenAI, Spotify, Philips Hue, and OpenWeatherMap to provide a personalized home assistant experience. Users can follow the detailed instructions provided to build their own version of the home assistant on Raspberry Pi, with optional components for customization. The project also includes system configurations, dependencies installation, and setup scripts for easy deployment. Overall, GPT Home offers a DIY solution for creating a smart home assistant using Raspberry Pi and OpenAI technology.
evalscope
Eval-Scope is a framework designed to support the evaluation of large language models (LLMs) by providing pre-configured benchmark datasets, common evaluation metrics, model integration, automatic evaluation for objective questions, complex task evaluation using expert models, reports generation, visualization tools, and model inference performance evaluation. It is lightweight, easy to customize, supports new dataset integration, model hosting on ModelScope, deployment of locally hosted models, and rich evaluation metrics. Eval-Scope also supports various evaluation modes like single mode, pairwise-baseline mode, and pairwise (all) mode, making it suitable for assessing and improving LLMs.
vnc-lm
vnc-lm is a Discord bot designed for messaging with language models. Users can configure model parameters, branch conversations, and edit prompts to enhance responses. The bot supports various providers like OpenAI, Huggingface, and Cloudflare Workers AI. It integrates with ollama and LiteLLM, allowing users to access a wide range of language model APIs through a single interface. Users can manage models, switch between models, split long messages, and create conversation branches. LiteLLM integration enables support for OpenAI-compatible APIs and local LLM services. The bot requires Docker for installation and can be configured through environment variables. Troubleshooting tips are provided for common issues like context window problems, Discord API errors, and LiteLLM issues.
Noi
Noi is an AI-enhanced customizable browser designed to streamline digital experiences. It includes curated AI websites, allows adding any URL, offers prompts management, Noi Ask for batch messaging, various themes, Noi Cache Mode for quick link access, cookie data isolation, and more. Users can explore, extend, and empower their browsing experience with Noi.
readme-ai
README-AI is a developer tool that auto-generates README.md files using a combination of data extraction and generative AI. It streamlines documentation creation and maintenance, enhancing developer productivity. This project aims to enable all skill levels, across all domains, to better understand, use, and contribute to open-source software. It offers flexible README generation, supports multiple large language models (LLMs), provides customizable output options, works with various programming languages and project types, and includes an offline mode for generating boilerplate README files without external API calls.
evalplus
EvalPlus is a rigorous evaluation framework for LLM4Code, providing HumanEval+ and MBPP+ tests to evaluate large language models on code generation tasks. It offers precise evaluation and ranking, coding rigorousness analysis, and pre-generated code samples. Users can use EvalPlus to generate code solutions, post-process code, and evaluate code quality. The tool includes tools for code generation and test input generation using various backends.
obsei
Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.
wzry_ai
This is an open-source project for playing the game King of Glory with an artificial intelligence model. The first phase of the project has been completed, and future upgrades will be built upon this foundation. The second phase of the project has started, and progress is expected to proceed according to plan. For any questions, feel free to join the QQ exchange group: 687853827. The project aims to learn artificial intelligence and strictly prohibits cheating. Detailed installation instructions are available in the doc/README.md file. Environment installation video: (bilibili) Welcome to follow, like, tip, comment, and provide your suggestions.
dvc
DVC, or Data Version Control, is a command-line tool and VS Code extension that helps you develop reproducible machine learning projects. With DVC, you can version your data and models, iterate fast with lightweight pipelines, track experiments in your local Git repo, compare any data, code, parameters, model, or performance plots, and share experiments and automatically reproduce anyone's experiment.
TempCompass
TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.
gollama
Gollama is a delightful tool that brings Ollama, your offline conversational AI companion, directly into your terminal. It provides a fun and interactive way to generate responses from various models without needing internet connectivity. Whether you're brainstorming ideas, exploring creative writing, or just looking for inspiration, Gollama is here to assist you. The tool offers an interactive interface, customizable prompts, multiple models selection, and visual feedback to enhance user experience. It can be installed via different methods like downloading the latest release, using Go, running with Docker, or building from source. Users can interact with Gollama through various options like specifying a custom base URL, prompt, model, and enabling raw output mode. The tool supports different modes like interactive, piped, CLI with image, and TUI with image. Gollama relies on third-party packages like bubbletea, glamour, huh, and lipgloss. The roadmap includes implementing piped mode, support for extracting codeblocks, copying responses/codeblocks to clipboard, GitHub Actions for automated releases, and downloading models directly from Ollama using the rest API. Contributions are welcome, and the project is licensed under the MIT License.
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
auto-subs
Auto-subs is a tool designed to automatically transcribe editing timelines using OpenAI Whisper and Stable-TS for extreme accuracy. It generates subtitles in a custom style, is completely free, and runs locally within Davinci Resolve. It works on Mac, Linux, and Windows, supporting both Free and Studio versions of Resolve. Users can jump to positions on the timeline using the Subtitle Navigator and translate from any language to English. The tool provides a user-friendly interface for creating and customizing subtitles for video content.
agentscope
AgentScope is a multi-agent platform designed to empower developers to build multi-agent applications with large-scale models. It features three high-level capabilities: Easy-to-Use, High Robustness, and Actor-Based Distribution. AgentScope provides a list of `ModelWrapper` to support both local model services and third-party model APIs, including OpenAI API, DashScope API, Gemini API, and ollama. It also enables developers to rapidly deploy local model services using libraries such as ollama (CPU inference), Flask + Transformers, Flask + ModelScope, FastChat, and vllm. AgentScope supports various services, including Web Search, Data Query, Retrieval, Code Execution, File Operation, and Text Processing. Example applications include Conversation, Game, and Distribution. AgentScope is released under Apache License 2.0 and welcomes contributions.
GPTQModel
GPTQModel is an easy-to-use LLM quantization and inference toolkit based on the GPTQ algorithm. It provides support for weight-only quantization and offers features such as dynamic per layer/module flexible quantization, sharding support, and auto-heal quantization errors. The toolkit aims to ensure inference compatibility with HF Transformers, vLLM, and SGLang. It offers various model supports, faster quant inference, better quality quants, and security features like hash check of model weights. GPTQModel also focuses on faster quantization, improved quant quality as measured by PPL, and backports bug fixes from AutoGPTQ.
For similar tasks
rlhf_trojan_competition
This competition is organized by Javier Rando and Florian Tramèr from the ETH AI Center and SPY Lab at ETH Zurich. The goal of the competition is to create a method that can detect universal backdoors in aligned language models. A universal backdoor is a secret suffix that, when appended to any prompt, enables the model to answer harmful instructions. The competition provides a set of poisoned generation models, a reward model that measures how safe a completion is, and a dataset with prompts to run experiments. Participants are encouraged to use novel methods for red-teaming, automated approaches with low human oversight, and interpretability tools to find the trojans. The best submissions will be offered the chance to present their work at an event during the SaTML 2024 conference and may be invited to co-author a publication summarizing the competition results.
hallucination-index
LLM Hallucination Index - RAG Special is a comprehensive evaluation of large language models (LLMs) focusing on context length and open vs. closed-source attributes. The index explores the impact of context length on model performance and tests the assumption that closed-source LLMs outperform open-source ones. It also investigates the effectiveness of prompting techniques like Chain-of-Note across different context lengths. The evaluation includes 22 models from various brands, analyzing major trends and declaring overall winners based on short, medium, and long context insights. Methodologies involve rigorous testing with different context lengths and prompting techniques to assess models' abilities in handling extensive texts and detecting hallucinations.
lumigator
Lumigator is an open-source platform developed by Mozilla.ai to help users select the most suitable language model for their specific needs. It supports the evaluation of summarization tasks using sequence-to-sequence models such as BART and BERT, as well as causal models like GPT and Mistral. The platform aims to make model selection transparent, efficient, and empowering by providing a framework for comparing LLMs using task-specific metrics to evaluate how well a model fits a project's needs. Lumigator is in the early stages of development and plans to expand support to additional machine learning tasks and use cases in the future.
ai-on-gke
This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers: Infrastructure orchestration that support GPUs and TPUs for training and serving workloads at scale Flexible integration with distributed computing and data processing frameworks Support for multiple teams on the same infrastructure to maximize utilization of resources
ray
Ray is a unified framework for scaling AI and Python applications. It consists of a core distributed runtime and a set of AI libraries for simplifying ML compute, including Data, Train, Tune, RLlib, and Serve. Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations. With Ray, you can seamlessly scale the same code from a laptop to a cluster, making it easy to meet the compute-intensive demands of modern ML workloads.
labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.
djl
Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning. It is designed to be easy to get started with and simple to use for Java developers. DJL provides a native Java development experience and allows users to integrate machine learning and deep learning models with their Java applications. The framework is deep learning engine agnostic, enabling users to switch engines at any point for optimal performance. DJL's ergonomic API interface guides users with best practices to accomplish deep learning tasks, such as running inference and training neural networks.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.