BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

Stars: 953

Visit

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

README:

BricksLLM: AI Gateway For Putting LLMs In Production

[!TIP] A managed version of BricksLLM is also available! It is production ready, and comes with a dashboard to make interacting with BricksLLM easier. Try us out for free today!

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:

Set LLM usage limits for users on different pricing tiers
Track LLM usage on a per user and per organization basis
Block or redact requests containing PIIs
Improve LLM reliability with failovers, retries and caching
Distribute API keys with rate limits and cost limits for internal development/production use cases
Distribute API keys with rate limits and cost limits for students

Features

[x] PII detection and masking
[x] Rate limit
[x] Cost control
[x] Cost analytics
[x] Request analytics
[x] Caching
[x] Request Retries
[x] Failover
[x] Model access control
[x] Endpoint access control
[x] Native support for all OpenAI endpoints
[x] Native support for Anthropic
[x] Native support for Azure OpenAI
[x] Native support for vLLM
[x] Native support for Deepinfra
[x] Support for custom deployments
[x] Integration with custom models
[x] Datadog integration
[x] Logging with privacy control

Getting Started

The easiest way to get started with BricksLLM is through BricksLLM-Docker.

Step 1 - Clone BricksLLM-Docker repository

git clone https://github.com/bricks-cloud/BricksLLM-Docker

Step 2 - Change to BricksLLM-Docker directory

cd BricksLLM-Docker

Step 3 - Deploy BricksLLM locally with Postgresql and Redis

docker compose up

You can run this in detach mode use the -d flag: docker compose up -d

Step 4 - Create a provider setting

curl -X PUT http://localhost:8001/api/provider-settings \
   -H "Content-Type: application/json" \
   -d '{
          "provider":"openai",
          "setting": {
             "apikey": "YOUR_OPENAI_KEY"
          }
      }'

Copy the id from the response.

Step 5 - Create a Bricks API key

Use id from the previous step as settingId to create a key with a rate limit of 2 req/min and a spend limit of 25 cents.

curl -X PUT http://localhost:8001/api/key-management/keys \
   -H "Content-Type: application/json" \
   -d '{
	      "name": "My Secret Key",
	      "key": "my-secret-key",
	      "tags": ["mykey"],
        "settingIds": ["ID_FROM_STEP_FOUR"],
        "rateLimitOverTime": 2,
        "rateLimitUnit": "m",
        "costLimitInUsd": 0.25
      }'

Congratulations you are done!!!

Then, just redirect your requests to us and use OpenAI as you would normally. For example:

curl -X POST http://localhost:8002/api/providers/openai/v1/chat/completions \
   -H "Authorization: Bearer my-secret-key" \
   -H "Content-Type: application/json" \
   -d '{
          "model": "gpt-3.5-turbo",
          "messages": [
              {
                  "role": "system",
                  "content": "hi"
              }
          ]
      }'

Or if you're using an SDK, you could change its baseURL to point to us. For example:

// OpenAI Node SDK v4
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: "some-secret-key", // key created earlier
  baseURL: "http://localhost:8002/api/providers/openai/v1", // redirect to us
});

How to Update?

For updating to the latest version

docker pull luyuanxin1995/bricksllm:latest

For updating to a particular version

docker pull luyuanxin1995/bricksllm:1.4.0

Documentation

Environment variables

Name type description default

POSTGRESQL_HOSTS required Hosts for Postgresql DB. Separated by , localhost

POSTGRESQL_DB_NAME optional Name for Postgresql DB.

POSTGRESQL_USERNAME required Postgresql DB username

POSTGRESQL_PASSWORD required Postgresql DB password

POSTGRESQL_SSL_MODE optional Postgresql SSL mode disable

POSTGRESQL_PORT optional The port that Postgresql DB runs on 5432

POSTGRESQL_READ_TIME_OUT optional Timeout for Postgresql read operations 2m

POSTGRESQL_WRITE_TIME_OUT optional Timeout for Postgresql write operations 5s

REDIS_HOSTS required Host for Redis. Separated by , localhost

REDIS_PASSWORD optional Redis Password

REDIS_PORT optional The port that Redis DB runs on 6379

REDIS_READ_TIME_OUT optional Timeout for Redis read operations 1s

REDIS_WRITE_TIME_OUT optional Timeout for Redis write operations 500ms

IN_MEMORY_DB_UPDATE_INTERVAL optional The interval BricksLLM API gateway polls Postgresql DB for latest key configurations 1s

STATS_PROVIDER optional "datadog" or Host:Port(127.0.0.1:8125) for statsd.

PROXY_TIMEOUT optional Timeout for proxy HTTP requests. 600s

NUMBER_OF_EVENT_MESSAGE_CONSUMERS optional Number of event message consumers that help handle counting tokens and inserting event into db. 3

AWS_SECRET_ACCESS_KEY optional It is for PII detection feature. 5s

AWS_ACCESS_KEY_ID optional It is for using PII detection feature. 5s

AMAZON_REGION optional Region for AWS. us-west-2

AMAZON_REQUEST_TIMEOUT optional Timeout for amazon requests. 5s

AMAZON_CONNECTION_TIMEOUT optional Timeout for amazon connection. 10s

ADMIN_PASS optional Simple password for the admin server.

Name	type	description	default
`POSTGRESQL_HOSTS`	required	Hosts for Postgresql DB. Separated by ,	`localhost`
`POSTGRESQL_DB_NAME`	optional	Name for Postgresql DB.
`POSTGRESQL_USERNAME`	required	Postgresql DB username
`POSTGRESQL_PASSWORD`	required	Postgresql DB password
`POSTGRESQL_SSL_MODE`	optional	Postgresql SSL mode	`disable`
`POSTGRESQL_PORT`	optional	The port that Postgresql DB runs on	`5432`
`POSTGRESQL_READ_TIME_OUT`	optional	Timeout for Postgresql read operations	`2m`
`POSTGRESQL_WRITE_TIME_OUT`	optional	Timeout for Postgresql write operations	`5s`
`REDIS_HOSTS`	required	Host for Redis. Separated by ,	`localhost`
`REDIS_PASSWORD`	optional	Redis Password
`REDIS_PORT`	optional	The port that Redis DB runs on	`6379`
`REDIS_READ_TIME_OUT`	optional	Timeout for Redis read operations	`1s`
`REDIS_WRITE_TIME_OUT`	optional	Timeout for Redis write operations	`500ms`
`IN_MEMORY_DB_UPDATE_INTERVAL`	optional	The interval BricksLLM API gateway polls Postgresql DB for latest key configurations	`1s`
`STATS_PROVIDER`	optional	"datadog" or Host:Port(127.0.0.1:8125) for statsd.
`PROXY_TIMEOUT`	optional	Timeout for proxy HTTP requests.	`600s`
`NUMBER_OF_EVENT_MESSAGE_CONSUMERS`	optional	Number of event message consumers that help handle counting tokens and inserting event into db.	`3`
`AWS_SECRET_ACCESS_KEY`	optional	It is for PII detection feature.	`5s`
`AWS_ACCESS_KEY_ID`	optional	It is for using PII detection feature.	`5s`
`AMAZON_REGION`	optional	Region for AWS.	`us-west-2`
`AMAZON_REQUEST_TIMEOUT`	optional	Timeout for amazon requests.	`5s`
`AMAZON_CONNECTION_TIMEOUT`	optional	Timeout for amazon connection.	`10s`
`ADMIN_PASS`	optional	Simple password for the admin server.

Admin Server

Swagger Doc

Proxy Server

Swagger Doc

For Tasks:

Click tags to check more tools for each tasks

create api keys set rate limits set cost limits track api usage

For Jobs:

data scientist machine learning engineer ai engineer software engineer devops engineer

Alternative AI tools for BricksLLM

Similar Open Source Tools

BricksLLM

github

: 953

ovos-installer

The ovos-installer is a simple and multilingual tool designed to install Open Voice OS and HiveMind using Bash, Whiptail, and Ansible. It supports various Linux distributions and provides an automated installation process. Users can easily start and stop services, update their Open Voice OS instance, and uninstall the tool if needed. The installer also allows for non-interactive installation through scenario files. It offers a user-friendly way to set up Open Voice OS on different systems.

github

: 138

PDFMathTranslate

PDFMathTranslate is a tool designed for translating scientific papers and conducting bilingual comparisons. It preserves formulas, charts, table of contents, and annotations. The tool supports multiple languages and diverse translation services. It provides a command-line tool, interactive user interface, and Docker deployment. Users can try the application through online demos. The tool offers various installation methods including command-line, portable, graphic user interface, and Docker. Advanced options allow users to customize translation settings. Additionally, the tool supports secondary development through APIs for Python and HTTP. Future plans include parsing layout with DocLayNet based models, fixing page rotation and format issues, supporting non-PDF/A files, and integrating plugins for Zotero and Obsidian.

github

: 19.2k

gollama

Gollama is a delightful tool that brings Ollama, your offline conversational AI companion, directly into your terminal. It provides a fun and interactive way to generate responses from various models without needing internet connectivity. Whether you're brainstorming ideas, exploring creative writing, or just looking for inspiration, Gollama is here to assist you. The tool offers an interactive interface, customizable prompts, multiple models selection, and visual feedback to enhance user experience. It can be installed via different methods like downloading the latest release, using Go, running with Docker, or building from source. Users can interact with Gollama through various options like specifying a custom base URL, prompt, model, and enabling raw output mode. The tool supports different modes like interactive, piped, CLI with image, and TUI with image. Gollama relies on third-party packages like bubbletea, glamour, huh, and lipgloss. The roadmap includes implementing piped mode, support for extracting codeblocks, copying responses/codeblocks to clipboard, GitHub Actions for automated releases, and downloading models directly from Ollama using the rest API. Contributions are welcome, and the project is licensed under the MIT License.

github

: 80

worker-vllm

The worker-vLLM repository provides a serverless endpoint for deploying OpenAI-compatible vLLM models with blazing-fast performance. It supports deploying various model architectures, such as Aquila, Baichuan, BLOOM, ChatGLM, Command-R, DBRX, DeciLM, Falcon, Gemma, GPT-2, GPT BigCode, GPT-J, GPT-NeoX, InternLM, Jais, LLaMA, MiniCPM, Mistral, Mixtral, MPT, OLMo, OPT, Orion, Phi, Phi-3, Qwen, Qwen2, Qwen2MoE, StableLM, Starcoder2, Xverse, and Yi. Users can deploy models using pre-built Docker images or build custom images with specified arguments. The repository also supports OpenAI compatibility for chat completions, completions, and models, with customizable input parameters. Users can modify their OpenAI codebase to use the deployed vLLM worker and access a list of available models for deployment.

github

: 300

pr-pilot

PR Pilot is an AI-powered tool designed to assist users in their daily workflow by delegating routine work to AI with confidence and predictability. It integrates seamlessly with popular development tools and allows users to interact with it through a Command-Line Interface, Python SDK, REST API, and Smart Workflows. Users can automate tasks such as generating PR titles and descriptions, summarizing and posting issues, and formatting README files. The tool aims to save time and enhance productivity by providing AI-powered solutions for common development tasks.

github

: 149

aicommit2

AICommit2 is a Reactive CLI tool that streamlines interactions with various AI providers such as OpenAI, Anthropic Claude, Gemini, Mistral AI, Cohere, and unofficial providers like Huggingface and Clova X. Users can request multiple AI simultaneously to generate git commit messages without waiting for all AI responses. The tool runs 'git diff' to grab code changes, sends them to configured AI, and returns the AI-generated commit message. Users can set API keys or Cookies for different providers and configure options like locale, generate number of messages, commit type, proxy, timeout, max-length, and more. AICommit2 can be used both locally with Ollama and remotely with supported providers, offering flexibility and efficiency in generating commit messages.

github

: 242

dingo

github

: 109

fittencode.nvim

Fitten Code AI Programming Assistant for Neovim provides fast completion using AI, asynchronous I/O, and support for various actions like document code, edit code, explain code, find bugs, generate unit test, implement features, optimize code, refactor code, start chat, and more. It offers features like accepting suggestions with Tab, accepting line with Ctrl + Down, accepting word with Ctrl + Right, undoing accepted text, automatic scrolling, and multiple HTTP/REST backends. It can run as a coc.nvim source or nvim-cmp source.

github

: 108

scrape-it-now

Scrape It Now is a versatile tool for scraping websites with features like decoupled architecture, CLI functionality, idempotent operations, and content storage options. The tool includes a scraper component for efficient scraping, ad blocking, link detection, markdown extraction, dynamic content loading, and anonymity features. It also offers an indexer component for creating AI search indexes, chunking content, embedding chunks, and enabling semantic search. The tool supports various configurations for Azure services and local storage, providing flexibility and scalability for web scraping and indexing tasks.

github

: 452

atlas-mcp-server

ATLAS (Adaptive Task & Logic Automation System) is a high-performance Model Context Protocol server designed for LLMs to manage complex task hierarchies. Built with TypeScript, it features ACID-compliant storage, efficient task tracking, and intelligent template management. ATLAS provides LLM Agents task management through a clean, flexible tool interface. The server implements the Model Context Protocol (MCP) for standardized communication between LLMs and external systems, offering hierarchical task organization, task state management, smart templates, enterprise features, and performance optimization.

github

: 112

rpaframework

RPA Framework is an open-source collection of libraries and tools for Robotic Process Automation (RPA), designed to be used with Robot Framework and Python. It offers well-documented core libraries for Software Robot Developers, optimized for Robocorp Control Room and Developer Tools, and accepts external contributions. The project includes various libraries for tasks like archiving, browser automation, date/time manipulations, cloud services integration, encryption operations, database interactions, desktop automation, document processing, email operations, Excel manipulation, file system operations, FTP interactions, web API interactions, image manipulation, AI services, and more. The development of the repository is Python-based and requires Python version 3.8+, with tooling based on poetry and invoke for compiling, building, and running the package. The project is licensed under the Apache License 2.0.

github

: 1.1k

vicinity

Vicinity is a lightweight, low-dependency vector store that provides a unified interface for nearest neighbor search with support for different backends and evaluation. It simplifies the process of comparing and evaluating different nearest neighbors packages by offering a simple and intuitive API. Users can easily experiment with various indexing methods and distance metrics to choose the best one for their use case. Vicinity also allows for measuring performance metrics like queries per second and recall.

github

: 244

mistral.rs

Mistral.rs is a fast LLM inference platform written in Rust. We support inference on a variety of devices, quantization, and easy-to-use application with an Open-AI API compatible HTTP server and Python bindings.

github

: 5.4k

StableToolBench

StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features such as a Virtual API System with caching and API simulators, a new set of solvable queries determined by LLMs, and a Stable Evaluation System using GPT-4. The Virtual API Server can be set up either by building from source or using a prebuilt Docker image. Users can test the server using provided scripts and evaluate models with Solvable Pass Rate and Solvable Win Rate metrics. The tool also includes model experiments results comparing different models' performance.

github

: 59

aikit

AIKit is a one-stop shop to quickly get started to host, deploy, build and fine-tune large language models (LLMs). AIKit offers two main capabilities: Inference: AIKit uses LocalAI, which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI and many more, to send requests to open-source LLMs! Fine Tuning: AIKit offers an extensible fine tuning interface. It supports Unsloth for fast, memory efficient, and easy fine-tuning experience.

github

: 425

For similar tasks

BricksLLM

github

: 953

dify-plus

Dify-Plus is a project that extends and adds management center functionality to the original Dify project. It includes features such as user quota management, key quota settings, web page login authentication, and more. The project aims to address pain points in enterprise scenarios and is open for collaboration and discussion with the community.

github

: 804

openshield

OpenShield is a firewall designed for AI models to protect against various attacks such as prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency granting, overreliance, and model theft. It provides rate limiting, content filtering, and keyword filtering for AI models. The tool acts as a transparent proxy between AI models and clients, allowing users to set custom rate limits for OpenAI endpoints and perform tokenizer calculations for OpenAI models. OpenShield also supports Python and LLM based rules, with upcoming features including rate limiting per user and model, prompts manager, content filtering, keyword filtering based on LLM/Vector models, OpenMeter integration, and VectorDB integration. The tool requires an OpenAI API key, Postgres, and Redis for operation.

github

: 74

APIMyLlama

APIMyLlama is a server application that provides an interface to interact with the Ollama API, a powerful AI tool to run LLMs. It allows users to easily distribute API keys to create amazing things. The tool offers commands to generate, list, remove, add, change, activate, deactivate, and manage API keys, as well as functionalities to work with webhooks, set rate limits, and get detailed information about API keys. Users can install APIMyLlama packages with NPM, PIP, Jitpack Repo+Gradle or Maven, or from the Crates Repository. The tool supports Node.JS, Python, Java, and Rust for generating responses from the API. Additionally, it provides built-in health checking commands for monitoring API health status.

github

: 79

gpt-cli

gpt-cli is a command-line interface tool for interacting with various chat language models like ChatGPT, Claude, and others. It supports model customization, usage tracking, keyboard shortcuts, multi-line input, markdown support, predefined messages, and multiple assistants. Users can easily switch between different assistants, define custom assistants, and configure model parameters and API keys in a YAML file for easy customization and management.

github

: 580

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k