
tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
Stars: 10300

TensorZero is an open-source platform that helps LLM applications graduate from API wrappers into defensible AI products. It enables a data & learning flywheel for LLMs by unifying inference, observability, optimization, and experimentation. The platform includes a high-performance model gateway, structured schema-based inference, observability, experimentation, and a data warehouse for analytics. TensorZero Recipes optimize prompts and models, and the platform supports built-in experimentation and GitOps-friendly orchestration for deployment.
README:
TensorZero is an open-source stack for industrial-grade LLM applications:
- Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency)
- Observability: store inferences and feedback in your database, available programmatically or in the UI
- Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
- Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
- Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.
Take what you need, adopt incrementally, and complement with other tools.
Website · Docs · Twitter · Slack · Discord

Quick Start (5min) · Deployment Guide · API Reference · Configuration Reference
**What is TensorZero?** TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

**How is TensorZero different from other LLM frameworks?** 1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback. 2. TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc. 3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.

**Can I use TensorZero with ___?** Yes. Every major programming language is supported. You can use TensorZero with our Python client, any OpenAI SDK or OpenAI-compatible client, or our HTTP API.

**Is TensorZero production-ready?** Yes. Here's a case study: Automating Code Changelogs at a Large Bank with LLMs.

**How much does TensorZero cost?** Nothing. TensorZero is 100% self-hosted and open-source. There are no paid features.

**Who is building TensorZero?** Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our $7.3M seed round announcement and coverage from VentureBeat. We're hiring in NYC.

**How do I get started?** You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.
Integrate with TensorZero once and access every major LLM provider.
- [x] Access every major LLM provider (API or self-hosted) through a single unified API
- [x] Infer with streaming, tool use, structured generation (JSON mode), batch, embeddings, multimodal (VLMs), file inputs, caching, etc.
- [x] Define prompt templates and schemas to enforce a consistent, typed interface between your application and the LLMs (see the sketch after this list)
- [x] Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: <1ms p99 latency overhead at 10k+ QPS
- [x] Integrate using our Python client, any OpenAI SDK or OpenAI-compatible client, or our HTTP API (use any programming language)
- [x] Ensure high availability with routing, retries, fallbacks, load balancing, granular timeouts, etc.
- [ ] Soon: rate limits, spend tracking and budgeting, service accounts
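To make the typed-interface point concrete: instead of naming a model at the call site, your application can call a function defined in the TensorZero configuration, and the gateway resolves which prompt and model to use. A minimal sketch, assuming the embedded client from the usage examples below; the `generate_haiku` function name is hypothetical:

```python
from tensorzero import TensorZeroGateway

# Hedged sketch: call a function declared in tensorzero.toml by name, so
# application code never hard-codes a prompt or model. `generate_haiku`
# is a hypothetical function name defined in your configuration.
with TensorZeroGateway.build_embedded(clickhouse_url="...", config_file="...") as client:
    response = client.inference(
        function_name="generate_haiku",
        input={
            "messages": [
                {"role": "user", "content": "Write a haiku about artificial intelligence."}
            ]
        },
    )
```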
**Model Providers.** The TensorZero Gateway natively supports every major model provider. Need something else? Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama).

**Features.** The TensorZero Gateway supports advanced features like routing, retries, fallbacks, load balancing, and granular timeouts (see the list above). It is written in Rust 🦀 with performance in mind (<1ms p99 latency overhead @ 10k QPS; see Benchmarks).

You can run inference using the TensorZero client (recommended), the OpenAI client, or the HTTP API.
Usage: Python — TensorZero Client (Recommended)
You can access any provider using the TensorZero Python client.
- Install the client: `pip install tensorzero`
- Optional: Set up the TensorZero configuration.
- Run inference:
```python
from tensorzero import TensorZeroGateway  # or AsyncTensorZeroGateway

with TensorZeroGateway.build_embedded(clickhouse_url="...", config_file="...") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        # Try other providers easily: "anthropic::claude-3-7-sonnet-20250219"
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
```
See Quick Start for more information.
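The same call can also stream the response incrementally. A minimal sketch, assuming the `stream=True` flag from the client docs; we print whole chunks rather than assuming their exact shape:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_embedded(clickhouse_url="...", config_file="...") as client:
    # stream=True yields chunks as they arrive instead of one final response
    # (flag per the client docs; chunk internals are left unspecified here)
    stream = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {"role": "user", "content": "Write a haiku about artificial intelligence."}
            ]
        },
        stream=True,
    )
    for chunk in stream:
        print(chunk)
```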
Usage: Python — OpenAI Client
You can access any provider using the OpenAI Python client with TensorZero.
- Install the client: `pip install tensorzero`
- Optional: Set up the TensorZero configuration.
- Run inference:
```python
from openai import OpenAI  # or AsyncOpenAI
from tensorzero import patch_openai_client

client = OpenAI()

patch_openai_client(
    client,
    clickhouse_url="http://chuser:chpassword@localhost:8123/tensorzero",
    config_file="config/tensorzero.toml",
    async_setup=False,
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o-mini",
    # Try other providers easily: "tensorzero::model_name::anthropic::claude-3-7-sonnet-20250219"
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about artificial intelligence.",
        }
    ],
)
```
See Quick Start for more information.
Usage: JavaScript / TypeScript (Node) — OpenAI Client
You can access any provider using the OpenAI Node client with TensorZero.
- Deploy `tensorzero/gateway` using Docker (detailed instructions →).
- Set up the TensorZero configuration.
- Run inference:
```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-4o-mini",
  // Try other providers easily: "tensorzero::model_name::anthropic::claude-3-7-sonnet-20250219"
  messages: [
    {
      role: "user",
      content: "Write a haiku about artificial intelligence.",
    },
  ],
});
```
See Quick Start for more information.
Usage: Other Languages & Platforms — HTTP API
TensorZero supports virtually any programming language or platform via its HTTP API.
- Deploy `tensorzero/gateway` using Docker (detailed instructions →).
- Optional: Set up the TensorZero configuration.
- Run inference:
```bash
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
```
See Quick Start for more information.
Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.
- [x] Store inferences and feedback (metrics, human edits, etc.) in your own database
- [x] Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
- [x] Build datasets for optimization, evaluation, and other workflows
- [x] Replay historical inferences with new prompts, models, inference strategies, etc.
- [x] Export OpenTelemetry (OTLP) traces to your favorite general-purpose observability tool
- [ ] Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling
Explore observability data in the TensorZero UI or programmatically. For example:

```python
# Assumes `t0` is a TensorZero client and that BooleanMetricFilter and
# OrderBy are imported from the tensorzero package.
t0.experimental_list_inferences(
    function_name="sales_agent",
    variant_name="qwen3-promptv2",
    filters=BooleanMetricFilter(
        metric_name="converted_sale",
        value=True,
    ),
    order_by=[OrderBy(by="timestamp", direction="DESC")],
    limit=100_000,
    # ... and more ...
)
```
Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.
- [x] Optimize your models with supervised fine-tuning, RLHF, and other techniques
- [x] Optimize your prompts with automated prompt engineering algorithms like MIPROv2
- [x] Optimize your inference strategy with dynamic in-context learning, chain of thought, best/mixture-of-N sampling, etc.
- [x] Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models (see the sketch after this list)
- [ ] Soon: synthetic data generation
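To make the feedback loop concrete, here is a minimal sketch of attaching a metric to an inference with the Python client. It assumes a `feedback` method on the client and reuses the `converted_sale` boolean metric from the observability example above; metric names must be declared in your configuration:

```python
from tensorzero import TensorZeroGateway

# Hedged sketch: record a production outcome against an inference so it
# can later drive fine-tuning, prompt optimization, etc. Assumes a
# `feedback` method on the client and a `converted_sale` metric declared
# in tensorzero.toml (metric names must match your configuration).
with TensorZeroGateway.build_embedded(clickhouse_url="...", config_file="...") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": "Draft a follow-up email."}]},
    )

    client.feedback(
        metric_name="converted_sale",
        value=True,  # the outcome observed in production
        inference_id=response.inference_id,
    )
```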
Optimize closed-source and open-source models using supervised fine-tuning (SFT) and preference fine-tuning (DPO).
- Supervised Fine-tuning — UI
- Preference Fine-tuning (DPO) — Jupyter Notebook
Boost performance by dynamically updating your prompts with relevant examples, combining responses from multiple inferences, and more.
- Best-of-N Sampling (see the sketch below)
- Mixture-of-N Sampling
- Dynamic In-Context Learning (DICL)
- Chain-of-Thought (CoT)
More coming soon...
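In TensorZero, these strategies are built-in variant types that you enable in configuration rather than reimplement in code. Purely to illustrate the idea behind best-of-N sampling, here is a client-side sketch; the judge prompt and the response-field access are assumptions:

```python
# Illustrative only: TensorZero runs best-of-N server-side as a variant type.
# This sketch shows the underlying idea using plain inference calls.
def best_of_n(client, question: str, n: int = 3) -> str:
    candidates = []
    for _ in range(n):
        response = client.inference(
            model_name="openai::gpt-4o-mini",
            input={"messages": [{"role": "user", "content": question}]},
        )
        candidates.append(response.content[0].text)  # field access is an assumption

    # Ask a judge model to pick the strongest candidate by index.
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    judge = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": f"Pick the best answer to the question below. "
                    f"Reply with its index only.\n\nQuestion: {question}\n\n{numbered}",
                }
            ]
        },
    )
    return candidates[int(judge.content[0].text.strip())]
```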
Optimize your prompts programmatically using research-driven optimization techniques.
- MIPROv2
- DSPy Integration: TensorZero comes with several optimization recipes, but you can also easily create your own. This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy, a popular library for automated prompt engineering.
More coming soon...
Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.
- [x] Evaluate individual inferences with static evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs); see the sketch after this list
- [x] Evaluate end-to-end workflows with dynamic evaluations with complete flexibility (≈ integration tests for LLMs)
- [x] Optimize LLM judges just like any other TensorZero function to align them to human preferences
- [ ] Soon: more built-in evaluators; headless evaluations
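As a rough illustration of a heuristic evaluation, you could pull stored inferences with the programmatic API shown earlier and score them with a simple check. This is a sketch of the idea, not TensorZero's built-in evaluation runner; the function name and the output accessor are assumptions:

```python
import json

def is_valid_json(text: str) -> bool:
    """Heuristic evaluator: does the output parse as JSON?"""
    try:
        json.loads(text)
        return True
    except (ValueError, TypeError):
        return False

# Pull stored inferences (see the observability example above);
# `extract_entities` is a hypothetical function name.
inferences = t0.experimental_list_inferences(
    function_name="extract_entities",
    limit=1_000,
)

# `inf.output.raw` is an assumed accessor for the raw model output.
scores = [is_valid_json(inf.output.raw) for inf in inferences]
print(f"Valid-JSON rate: {sum(scores) / len(scores):.1%}")
```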
Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.
- [x] Ship with confidence with built-in A/B testing for models, prompts, providers, hyperparameters, etc.
- [x] Enforce principled experiments (RCTs) in complex workflows, including multi-turn and compound LLM systems; see the sketch after this list
- [ ] Soon: multi-armed bandits; AI-managed experiments
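One requirement for principled experiments in multi-turn systems is consistent assignment: every inference in an episode should see the same variant. The gateway handles assignment for you; the following is only a conceptual sketch of the idea, not TensorZero's implementation:

```python
import hashlib

# Conceptual sketch: deterministically map an episode to a variant so that
# all inferences in the episode receive the same treatment (an RCT
# requirement). Not TensorZero's actual code; it handles this server-side.
def assign_variant(episode_id: str, variants: list[str]) -> str:
    digest = hashlib.sha256(episode_id.encode()).digest()
    return variants[digest[0] % len(variants)]

# Example: the same episode always maps to the same variant.
print(assign_variant("episode-123", ["gpt-4o-mini-v1", "gpt-4o-mini-v2"]))
```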
Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.
- [x] Build simple applications or massive deployments with GitOps-friendly orchestration
- [x] Extend TensorZero with built-in escape hatches, programmatic-first usage, direct database access, and more
- [x] Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
- [x] Iterate quickly by experimenting with prompts interactively using the Playground UI
Watch LLMs get better at data extraction in real-time with TensorZero!
Dynamic in-context learning (DICL) is a powerful inference-time optimization available out of the box with TensorZero. It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb
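Conceptually, DICL embeds the incoming input, retrieves the most similar historical examples, and splices them into the message list before calling the model. TensorZero does this server-side as a variant type; the sketch below only illustrates the idea, and the embedding helper and example store are hypothetical:

```python
# Conceptual sketch of dynamic in-context learning (DICL). TensorZero's
# implementation is a built-in variant type; `embed` and `example_store`
# here are hypothetical stand-ins.
def dicl_messages(user_input: str, example_store, embed, k: int = 3) -> list[dict]:
    query_vec = embed(user_input)
    # Retrieve the k most similar historical (input, good output) pairs.
    examples = example_store.nearest(query_vec, k=k)
    messages = []
    for ex_input, ex_output in examples:
        messages.append({"role": "user", "content": ex_input})
        messages.append({"role": "assistant", "content": ex_output})
    # Finally, append the actual request.
    messages.append({"role": "user", "content": user_input})
    return messages
```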
Start building today. The Quick Start shows how easy it is to set up an LLM application with TensorZero.
Questions? Ask us on Slack or Discord.
Using TensorZero at work? Email us at [email protected] to set up a Slack or Teams channel with your team (free).
We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.
Optimizing Data Extraction (NER) with TensorZero
This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
Agentic RAG — Multi-Hop Question Answering with LLMs
This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
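The core loop behind such an agent is simple: search, accumulate notes, and let the model decide when it has enough to answer. A conceptual sketch, with a stubbed search tool and a simplified answer/query protocol (none of this is the example's actual code):

```python
# Conceptual sketch of a multi-hop retrieval loop; not the example's code.
def search_wikipedia(query: str) -> str:
    """Stub: replace with a real search tool."""
    raise NotImplementedError

def multi_hop_answer(client, question: str, max_hops: int = 5) -> str:
    notes: list[str] = []
    for _ in range(max_hops):
        response = client.inference(
            model_name="openai::gpt-4o-mini",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": f"Question: {question}\nNotes: {notes}\n"
                        "If you can answer, reply 'ANSWER: <answer>'; "
                        "otherwise reply 'QUERY: <search terms>'.",
                    }
                ]
            },
        )
        text = response.content[0].text  # field access is an assumption
        if text.startswith("ANSWER:"):
            return text.removeprefix("ANSWER:").strip()
        notes.append(search_wikipedia(text.removeprefix("QUERY:").strip()))
    return "Hop budget exhausted without an answer."
```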
Writing Haikus to Satisfy a Judge with Hidden Preferences
This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll watch the progress compound as you fine-tune the LLM multiple times.
Image Data Extraction — Multimodal (Vision) Fine-tuning
This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks. Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).
Improving LLM Chess Ability with Best-of-N Sampling
This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)
TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows. But you can also easily create your own recipes and workflows! This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.
& many more on the way!
Alternative AI tools for tensorzero
Similar Open Source Tools

abi
ABI (Agentic Brain Infrastructure) is a Python-based AI Operating System designed to serve as the core infrastructure for building an Agentic AI Ontology Engine. It empowers organizations to integrate, manage, and scale AI-driven operations with multiple AI models, focusing on ontology, agent-driven workflows, and analytics. ABI emphasizes modularity and customization, providing a customizable framework aligned with international standards and regulatory frameworks. It offers features such as configurable AI agents, ontology management, integrations with external data sources, data processing pipelines, workflow automation, analytics, and data handling capabilities.

ComfyUI-Copilot
ComfyUI-Copilot is an intelligent assistant built on the Comfy-UI framework that simplifies and enhances the AI algorithm debugging and deployment process through natural language interactions. It offers intuitive node recommendations, workflow building aids, and model querying services to streamline development processes. With features like interactive Q&A bot, natural language node suggestions, smart workflow assistance, and model querying, ComfyUI-Copilot aims to lower the barriers to entry for beginners, boost development efficiency with AI-driven suggestions, and provide real-time assistance for developers.

lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally alongside its recently released LLM data processing library datatrove and LLM training library nanotron. It is being released to the community in the spirit of building in the open. Note that it is still early, so don't expect 100% stability. In case of problems or questions, feel free to open an issue!

dspy.rb
DSPy.rb is a Ruby framework for building reliable LLM applications using composable, type-safe modules. It enables developers to define typed signatures and compose them into pipelines, offering a more structured approach compared to traditional prompting. The framework embraces Ruby conventions and adds innovations like CodeAct agents and enhanced production instrumentation, resulting in scalable LLM applications that are robust and efficient. DSPy.rb is actively developed, with a focus on stability and real-world feedback through the 0.x series before reaching a stable v1.0 API.

LynxHub
LynxHub is a platform that allows users to seamlessly install, configure, launch, and manage all their AI interfaces from a single, intuitive dashboard. It offers features like AI interface management, arguments manager, custom run commands, pre-launch actions, extension management, in-app tools like terminal and web browser, AI information dashboard, Discord integration, and additional features like theme options and favorite interface pinning. The platform supports modular design for custom AI modules and upcoming extensions system for complete customization. LynxHub aims to streamline AI workflow and enhance user experience with a user-friendly interface and comprehensive functionalities.

transformerlab-app
Transformer Lab is an app that allows users to experiment with Large Language Models by providing features such as one-click download of popular models, finetuning across different hardware, RLHF and Preference Optimization, working with LLMs across different operating systems, chatting with models, using different inference engines, evaluating models, building datasets for training, calculating embeddings, providing a full REST API, running in the cloud, converting models across platforms, supporting plugins, embedded Monaco code editor, prompt editing, inference logs, all through a simple cross-platform GUI.

PageTalk
PageTalk is a browser extension that enhances web browsing by integrating Google's Gemini API. It allows users to select text on any webpage for AI analysis, translation, contextual chat, and customization. The tool supports multi-agent system, image input, rich content rendering, PDF parsing, URL context extraction, personalized settings, chat export, text selection helper, and proxy support. Users can interact with web pages, chat contextually, manage AI agents, and perform various tasks seamlessly.

chatbox
Chatbox is a desktop client for ChatGPT, Claude, and other LLMs, providing features like local data storage, multiple LLM provider support, image generation, enhanced prompting, keyboard shortcuts, and more. It offers a user-friendly interface with dark theme, team collaboration, cross-platform availability, web version access, iOS & Android apps, multilingual support, and ongoing feature enhancements. Developed for prompt and API debugging, it has gained popularity for daily chatting and professional role-playing with AI assistance.

payload-ai
The Payload AI Plugin is an advanced extension that integrates modern AI capabilities into your Payload CMS, streamlining content creation and management. It offers features like text generation, voice and image generation, field-level prompt customization, prompt editor, document analyzer, fact checking, automated content workflows, internationalization support, editor AI suggestions, and AI chat support. Users can personalize and configure the plugin by setting environment variables. The plugin is actively developed and tested with Payload version v3.2.1, with regular updates expected.

cog
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.

summarize
The 'summarize' tool is designed to transcribe and summarize videos from various sources using AI models. It helps users efficiently summarize lengthy videos, take notes, and extract key insights by providing timestamps, original transcripts, and support for auto-generated captions. Users can utilize different AI models via Groq, OpenAI, or custom local models to generate grammatically correct video transcripts and extract wisdom from video content. The tool simplifies the process of summarizing video content, making it easier to remember and reference important information.

rigging
Rigging is a lightweight LLM framework designed to simplify the usage of language models in production code. It offers structured Pydantic models for text output, supports various models like LiteLLM and transformers, and provides features such as defining prompts as python functions, simple tool use, storing models as connection strings, async batching for large scale generation, and modern Python support with type hints and async capabilities. Rigging is developed by dreadnode and is suitable for tasks like building chat pipelines, running completions, tracking behavior with tracing, playing with generation parameters, and scaling up with iterating and batching.

parlant
Parlant is a structured approach to building and guiding customer-facing AI agents. It allows developers to create and manage robust AI agents, providing specific feedback on agent behavior and helping understand user intentions better. With features like guidelines, glossary, coherence checks, dynamic context, and guided tool use, Parlant offers control over agent responses and behavior. Developer-friendly aspects include instant changes, Git integration, clean architecture, and type safety. It enables confident deployment with scalability, effective debugging, and validation before deployment. Parlant works with major LLM providers and offers client SDKs for Python and TypeScript. The tool facilitates natural customer interactions through asynchronous communication and provides a chat UI for testing new behaviors before deployment.

GPTSwarm
GPTSwarm is a graph-based framework for LLM-based agents that enables the creation of LLM-based agents from graphs and facilitates the customized and automatic self-organization of agent swarms with self-improvement capabilities. The library includes components for domain-specific operations, graph-related functions, LLM backend selection, memory management, and optimization algorithms to enhance agent performance and swarm efficiency. Users can quickly run predefined swarms or utilize tools like the file analyzer. GPTSwarm supports local LM inference via LM Studio, allowing users to run with a local LLM model. The framework has been accepted by ICML2024 and offers advanced features for experimentation and customization.

chunkhound
ChunkHound is a modern tool for transforming your codebase into a searchable knowledge base for AI assistants. It utilizes semantic search via the cAST algorithm and regex search, integrating with AI assistants through the Model Context Protocol (MCP). With features like cAST Algorithm, Multi-Hop Semantic Search, Regex search, and support for 22 languages, ChunkHound offers a local-first approach to code analysis and discovery. It provides intelligent code discovery, universal language support, and real-time indexing capabilities, making it a powerful tool for developers looking to enhance their coding experience.
For similar tasks

Flowise
Flowise is a tool that allows users to build customized LLM flows with a drag-and-drop UI. It is open-source and self-hostable, and it supports various deployments, including AWS, Azure, Digital Ocean, GCP, Railway, Render, HuggingFace Spaces, Elestio, Sealos, and RepoCloud. Flowise has three different modules in a single mono repository: server, ui, and components. The server module is a Node backend that serves the API logic, the ui module is a React frontend, and the components module contains third-party node integrations. Flowise supports different environment variables to configure your instance, and you can specify these variables in the .env file inside the packages/server folder.

nlux
nlux is an open-source Javascript and React JS library that makes it super simple to integrate powerful large language models (LLMs) like ChatGPT into your web app or website. With just a few lines of code, you can add conversational AI capabilities and interact with your favourite LLM.

generative-ai-go
The Google AI Go SDK enables developers to use Google's state-of-the-art generative AI models (like Gemini) to build AI-powered features and applications. It supports use cases like generating text from text-only input, generating text from text-and-images input (multimodal), building multi-turn conversations (chat), and embedding.

awesome-langchain-zh
The awesome-langchain-zh repository is a collection of resources related to LangChain, a framework for building AI applications using large language models (LLMs). The repository includes sections on the LangChain framework itself, other language ports of LangChain, tools for low-code development, services, agents, templates, platforms, open-source projects related to knowledge management and chatbots, as well as learning resources such as notebooks, videos, and articles. It also covers other LLM frameworks and provides additional resources for exploring and working with LLMs. The repository serves as a comprehensive guide for developers and AI enthusiasts interested in leveraging LangChain and LLMs for various applications.

Large-Language-Model-Notebooks-Course
This practical free hands-on course focuses on Large Language models and their applications, providing a hands-on experience using models from OpenAI and the Hugging Face library. The course is divided into three major sections: Techniques and Libraries, Projects, and Enterprise Solutions. It covers topics such as Chatbots, Code Generation, Vector databases, LangChain, Fine Tuning, PEFT Fine Tuning, Soft Prompt tuning, LoRA, QLoRA, Evaluate Models, Knowledge Distillation, and more. Each section contains chapters with lessons supported by notebooks and articles. The course aims to help users build projects and explore enterprise solutions using Large Language Models.

ai-chatbot
Next.js AI Chatbot is an open-source app template for building AI chatbots using Next.js, Vercel AI SDK, OpenAI, and Vercel KV. It includes features like Next.js App Router, React Server Components, Vercel AI SDK for streaming chat UI, support for various AI models, Tailwind CSS styling, Radix UI for headless components, chat history management, rate limiting, session storage with Vercel KV, and authentication with NextAuth.js. The template allows easy deployment to Vercel and customization of AI model providers.

awesome-local-llms
The 'awesome-local-llms' repository is a curated list of open-source tools for local Large Language Model (LLM) inference, covering both proprietary and open weights LLMs. The repository categorizes these tools into LLM inference backend engines, LLM front end UIs, and all-in-one desktop applications. It collects GitHub repository metrics as proxies for popularity and active maintenance. Contributions are encouraged, and users can suggest additional open-source repositories through the Issues section or by running a provided script to update the README and make a pull request. The repository aims to provide a comprehensive resource for exploring and utilizing local LLM tools.

Awesome-AI-Data-Guided-Projects
A curated list of data science & AI guided projects to start building your portfolio. The repository contains guided projects covering various topics such as large language models, time series analysis, computer vision, natural language processing (NLP), and data science. Each project provides detailed instructions on how to implement specific tasks using different tools and technologies.
For similar jobs

responsible-ai-toolbox
Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment interfaces and libraries for understanding AI systems. It empowers developers and stakeholders to develop and monitor AI responsibly, enabling better data-driven actions. The toolbox includes visualization widgets for model assessment, error analysis, interpretability, fairness assessment, and mitigations library. It also offers a JupyterLab extension for managing machine learning experiments and a library for measuring gender bias in NLP datasets.

LLMLingua
LLMLingua is a tool that utilizes a compact, well-trained language model to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models, achieving up to 20x compression with minimal performance loss. The tool includes LLMLingua, LongLLMLingua, and LLMLingua-2, each offering different levels of prompt compression and performance improvements for tasks involving large language models.

llm-examples
Starter examples for building LLM apps with Streamlit. This repository showcases a growing collection of LLM minimum working examples, including a Chatbot, File Q&A, Chat with Internet search, LangChain Quickstart, LangChain PromptTemplate, and Chat with user feedback. Users can easily get their own OpenAI API key and set it as an environment variable in Streamlit apps to run the examples locally.

LMOps
LMOps is a research initiative focusing on fundamental research and technology for building AI products with foundation models, particularly enabling AI capabilities with Large Language Models (LLMs) and Generative AI models. The project explores various aspects such as prompt optimization, longer context handling, LLM alignment, acceleration of LLMs, LLM customization, and understanding in-context learning. It also includes tools like Promptist for automatic prompt optimization, Structured Prompting for efficient long-sequence prompts consumption, and X-Prompt for extensible prompts beyond natural language. Additionally, LLMA accelerators are developed to speed up LLM inference by referencing and copying text spans from documents. The project aims to advance technologies that facilitate prompting language models and enhance the performance of LLMs in various scenarios.

awesome-tool-llm
This repository focuses on exploring tools that enhance the performance of language models for various tasks. It provides a structured list of literature relevant to tool-augmented language models, covering topics such as tool basics, tool use paradigm, scenarios, advanced methods, and evaluation. The repository includes papers, preprints, and books that discuss the use of tools in conjunction with language models for tasks like reasoning, question answering, mathematical calculations, accessing knowledge, interacting with the world, and handling non-textual modalities.

gaianet-node
GaiaNet-node is a tool that allows users to run their own GaiaNet node, enabling them to interact with an AI agent. The tool provides functionalities to install the default node software stack, initialize the node with model files and vector database files, start the node, stop the node, and update configurations. Users can use pre-set configurations or pass a custom URL for initialization. The tool is designed to facilitate communication with the AI agent and access to node information via a browser. GaiaNet-node requires sudo privileges for installation by default, though it can also be installed without sudo using specific commands.

llmops-duke-aipi
LLMOps Duke AIPI is a course focused on operationalizing Large Language Models, teaching methodologies for developing applications using software development best practices with large language models. The course covers various topics such as generative AI concepts, setting up development environments, interacting with large language models, using local large language models, applied solutions with LLMs, extensibility using plugins and functions, retrieval augmented generation, introduction to Python web frameworks for APIs, DevOps principles, deploying machine learning APIs, LLM platforms, and final presentations. Students will learn to build, share, and present portfolios using Github, YouTube, and Linkedin, as well as develop non-linear life-long learning skills. Prerequisites include basic Linux and programming skills, with coursework available in Python or Rust. Additional resources and references are provided for further learning and exploration.

Awesome-AISourceHub
Awesome-AISourceHub is a repository that collects high-quality information sources in the field of AI technology. It serves as a synchronized source of information to avoid information gaps and information silos. The repository aims to provide valuable resources for individuals such as AI book authors, enterprise decision-makers, and tool developers who frequently use Twitter to share insights and updates related to AI advancements. The platform emphasizes the importance of accessing information closer to the source for better quality content. Users can contribute their own high-quality information sources to the repository by following specific steps outlined in the contribution guidelines. The repository covers various platforms such as Twitter, public accounts, knowledge planets, podcasts, blogs, websites, YouTube channels, and more, offering a comprehensive collection of AI-related resources for individuals interested in staying updated with the latest trends and developments in the AI field.