agentops
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
Stars: 1528
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
README:
๐ฆ Twitter ย ย โขย ย ๐ข Discord ย ย โขย ย ๐๏ธ Dashboard ย ย โขย ย ๐ Documentation
AgentOps helps developers build, evaluate, and monitor AI agents. From prototype to production.
๐ Replay Analytics and Debugging | Step-by-step agent execution graphs |
๐ธ LLM Cost Management | Track spend with LLM foundation model providers |
๐งช Agent Benchmarking | Test your agents against 1,000+ evals |
๐ Compliance and Security | Detect common prompt injection and data exfiltration exploits |
๐ค Framework Integrations | Native Integrations with CrewAI, AutoGen, & LangChain |
pip install agentops
Initialize the AgentOps client and automatically get analytics on all your LLM calls.
import agentops
# Beginning of your program (i.e. main.py, __init__.py)
agentops.init( < INSERT YOUR API KEY HERE >)
...
# End of program
agentops.end_session('Success')
All your sessions can be viewed on the AgentOps dashboard
Add powerful observability to your agents, tools, and functions with as little code as possible: one line at a time.
Refer to our documentation
# Automatically associate all Events with the agent that originated them
from agentops import track_agent
@track_agent(name='SomeCustomName')
class MyAgent:
...
# Automatically create ToolEvents for tools that agents will use
from agentops import record_tool
@record_tool('SampleToolName')
def sample_tool(...):
...
# Automatically create ActionEvents for other functions.
from agentops import record_action
@agentops.record_action('sample function being record')
def sample_function(...):
...
# Manually record any other Events
from agentops import record, ActionEvent
record(ActionEvent("received_user_input"))
Build Crew agents with observability with only 2 lines of code. Simply set an AGENTOPS_API_KEY
in your environment, and your crews will get automatic monitoring on the AgentOps dashboard.
pip install 'crewai[agentops]'
With only two lines of code, add full observability and monitoring to Autogen agents. Set an AGENTOPS_API_KEY
in your environment and call agentops.init()
AgentOps works seamlessly with applications built using Langchain. To use the handler, install Langchain as an optional dependency:
Installation
pip install agentops[langchain]
To use the handler, import and set
import os
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from agentops.partners.langchain_callback_handler import LangchainCallbackHandler
AGENTOPS_API_KEY = os.environ['AGENTOPS_API_KEY']
handler = LangchainCallbackHandler(api_key=AGENTOPS_API_KEY, tags=['Langchain Example'])
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY,
callbacks=[handler],
model='gpt-3.5-turbo')
agent = initialize_agent(tools,
llm,
agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
callbacks=[handler], # You must pass in a callback handler to record your agent
handle_parsing_errors=True)
Check out the Langchain Examples Notebook for more details including Async handlers.
First class support for Cohere(>=5.4.0). This is a living integration, should you need any added functionality please message us on Discord!
Installation
pip install cohere
import cohere
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
co = cohere.Client()
chat = co.chat(
message="Is it pronounced ceaux-hear or co-hehray?"
)
print(chat)
agentops.end_session('Success')
import cohere
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
co = cohere.Client()
stream = co.chat_stream(
message="Write me a haiku about the synergies between Cohere and AgentOps"
)
for event in stream:
if event.event_type == "text-generation":
print(event.text, end='')
agentops.end_session('Success')
Track agents built with the Anthropic Python SDK (>=0.32.0).
Installation
pip install anthropic
import anthropic
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
client = anthropic.Anthropic(
# This is the default and can be omitted
api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
message = client.messages.create(
max_tokens=1024,
messages=[
{
"role": "user",
"content": "Tell me a cool fact about AgentOps",
}
],
model="claude-3-opus-20240229",
)
print(message.content)
agentops.end_session('Success')
Streaming
import anthropic
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
client = anthropic.Anthropic(
# This is the default and can be omitted
api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
stream = client.messages.create(
max_tokens=1024,
model="claude-3-opus-20240229",
messages=[
{
"role": "user",
"content": "Tell me something cool about streaming agents",
}
],
stream=True,
)
response = ""
for event in stream:
if event.type == "content_block_delta":
response += event.delta.text
elif event.type == "message_stop":
print("\n")
print(response)
print("\n")
Async
import asyncio
from anthropic import AsyncAnthropic
client = AsyncAnthropic(
# This is the default and can be omitted
api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
async def main() -> None:
message = await client.messages.create(
max_tokens=1024,
messages=[
{
"role": "user",
"content": "Tell me something interesting about async agents",
}
],
model="claude-3-opus-20240229",
)
print(message.content)
await main()
AgentOps provides support for LiteLLM(>=1.3.1), allowing you to call 100+ LLMs using the same Input/Output Format.
Installation
pip install litellm
# Do not use LiteLLM like this
# from litellm import completion
# ...
# response = completion(model="claude-3", messages=messages)
# Use LiteLLM like this
import litellm
...
response = litellm.completion(model="claude-3", messages=messages)
# or
response = await litellm.acompletion(model="claude-3", messages=messages)
AgentOps works seamlessly with applications built using LlamaIndex, a framework for building context-augmented generative AI applications with LLMs.
Installation
pip install llama-index-instrumentation-agentops
To use the handler, import and set
from llama_index.core import set_global_handler
# NOTE: Feel free to set your AgentOps environment variables (e.g., 'AGENTOPS_API_KEY')
# as outlined in the AgentOps documentation, or pass the equivalent keyword arguments
# anticipated by AgentOps' AOClient as **eval_params in set_global_handler.
set_global_handler("agentops")
Check out the LlamaIndex docs for more details.
(coming soon!)
Platform | Dashboard | Evals |
---|---|---|
โ Python SDK | โ Multi-session and Cross-session metrics | โ Custom eval metrics |
๐ง Evaluation builder API | โ Custom event tag trackingย | ๐ Agent scorecards |
โ Javascript/Typescript SDK | โ Session replays | ๐ Evaluation playground + leaderboard |
Performance testing | Environments | LLM Testing | Reasoning and execution testing |
---|---|---|---|
โ Event latency analysis | ๐ Non-stationary environment testing | ๐ LLM non-deterministic function detection | ๐ง Infinite loops and recursive thought detection |
โ Agent workflow execution pricing | ๐ Multi-modal environments | ๐ง Token limit overflow flags | ๐ Faulty reasoning detection |
๐ง Success validators (external) | ๐ Execution containers | ๐ Context limit overflow flags | ๐ Generative code validators |
๐ Agent controllers/skill tests | โ Honeypot and prompt injection detection (PromptArmor) | ๐ API bill tracking | ๐ Error breakpoint analysis |
๐ Information context constraint testing | ๐ Anti-agent roadblocks (i.e. Captchas) | ๐ CI/CD integration checks | |
๐ Regression testing | ๐ Multi-agent framework visualization |
Without the right tools, AI agents are slow, expensive, and unreliable. Our mission is to bring your agent from prototype to production. Here's why AgentOps stands out:
- Comprehensive Observability: Track your AI agents' performance, user interactions, and API usage.
- Real-Time Monitoring: Get instant insights with session replays, metrics, and live monitoring tools.
- Cost Control: Monitor and manage your spend on LLM and API calls.
- Failure Detection: Quickly identify and respond to agent failures and multi-agent interaction issues.
- Tool Usage Statistics: Understand how your agents utilize external tools with detailed analytics.
- Session-Wide Metrics: Gain a holistic view of your agents' sessions with comprehensive statistics.
AgentOps is designed to make agent observability, testing, and monitoring easy.
Check out our growth in the community:
Repository | Stars |
---|---|
ย geekan / MetaGPT | 42787 |
ย run-llama / llama_index | 34446 |
ย crewAIInc / crewAI | 18287 |
ย camel-ai / camel | 5166 |
ย superagent-ai / superagent | 5050 |
ย iyaja / llama-fs | 4713 |
ย BasedHardware / Omi | 2723 |
ย MervinPraison / PraisonAI | 2007 |
ย AgentOps-AI / Jaiqu | 272 |
ย strnad / CrewAI-Studio | 134 |
ย alejandro-ao / exa-crewai | 55 |
ย tonykipkemboi / youtube_yapper_trapper | 47 |
ย sethcoast / cover-letter-builder | 27 |
ย bhancockio / chatgpt4o-analysis | 19 |
ย breakstring / Agentic_Story_Book_Workflow | 14 |
ย MULTI-ON / multion-python | 13 |
Generated using github-dependents-info, by Nicolas Vuillamy
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for agentops
Similar Open Source Tools
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
KwaiAgents
KwaiAgents is a series of Agent-related works open-sourced by the [KwaiKEG](https://github.com/KwaiKEG) from [Kuaishou Technology](https://www.kuaishou.com/en). The open-sourced content includes: 1. **KAgentSys-Lite**: a lite version of the KAgentSys in the paper. While retaining some of the original system's functionality, KAgentSys-Lite has certain differences and limitations when compared to its full-featured counterpart, such as: (1) a more limited set of tools; (2) a lack of memory mechanisms; (3) slightly reduced performance capabilities; and (4) a different codebase, as it evolves from open-source projects like BabyAGI and Auto-GPT. Despite these modifications, KAgentSys-Lite still delivers comparable performance among numerous open-source Agent systems available. 2. **KAgentLMs**: a series of large language models with agent capabilities such as planning, reflection, and tool-use, acquired through the Meta-agent tuning proposed in the paper. 3. **KAgentInstruct**: over 200k Agent-related instructions finetuning data (partially human-edited) proposed in the paper. 4. **KAgentBench**: over 3,000 human-edited, automated evaluation data for testing Agent capabilities, with evaluation dimensions including planning, tool-use, reflection, concluding, and profiling.
FalkorDB
FalkorDB is the first queryable Property Graph database to use sparse matrices to represent the adjacency matrix in graphs and linear algebra to query the graph. Primary features: * Adopting the Property Graph Model * Nodes (vertices) and Relationships (edges) that may have attributes * Nodes can have multiple labels * Relationships have a relationship type * Graphs represented as sparse adjacency matrices * OpenCypher with proprietary extensions as a query language * Queries are translated into linear algebra expressions
litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
rust-genai
genai is a multi-AI providers library for Rust that aims to provide a common and ergonomic single API to various generative AI providers such as OpenAI, Anthropic, Cohere, Ollama, and Gemini. It focuses on standardizing chat completion APIs across major AI services, prioritizing ergonomics and commonality. The library initially focuses on text chat APIs and plans to expand to support images, function calling, and more in the future versions. Version 0.1.x will have breaking changes in patches, while version 0.2.x will follow semver more strictly. genai does not provide a full representation of a given AI provider but aims to simplify the differences at a lower layer for ease of use.
airswap-protocols
AirSwap Protocols is a repository containing smart contracts for developers and traders on the AirSwap peer-to-peer trading network. It includes various packages for functionalities like server registry, atomic token swap, staking, rewards pool, batch token and order calls, libraries, and utils. The repository follows a branching and release process for contracts and tools, with steps for regular development process and individual package features or patches. Users can deploy and verify contracts using specific commands with network flags.
ScaleLLM
ScaleLLM is a cutting-edge inference system engineered for large language models (LLMs), meticulously designed to meet the demands of production environments. It extends its support to a wide range of popular open-source models, including Llama3, Gemma, Bloom, GPT-NeoX, and more. ScaleLLM is currently undergoing active development. We are fully committed to consistently enhancing its efficiency while also incorporating additional features. Feel free to explore our **_Roadmap_** for more details. ## Key Features * High Efficiency: Excels in high-performance LLM inference, leveraging state-of-the-art techniques and technologies like Flash Attention, Paged Attention, Continuous batching, and more. * Tensor Parallelism: Utilizes tensor parallelism for efficient model execution. * OpenAI-compatible API: An efficient golang rest api server that compatible with OpenAI. * Huggingface models: Seamless integration with most popular HF models, supporting safetensors. * Customizable: Offers flexibility for customization to meet your specific needs, and provides an easy way to add new models. * Production Ready: Engineered with production environments in mind, ScaleLLM is equipped with robust system monitoring and management features to ensure a seamless deployment experience.
langcheck
LangCheck is a Python library that provides a suite of metrics and tools for evaluating the quality of text generated by large language models (LLMs). It includes metrics for evaluating text fluency, sentiment, toxicity, factual consistency, and more. LangCheck also provides tools for visualizing metrics, augmenting data, and writing unit tests for LLM applications. With LangCheck, you can quickly and easily assess the quality of LLM-generated text and identify areas for improvement.
MooER
MooER (ๆฉ่ณ) is an LLM-based speech recognition and translation model developed by Moore Threads. It allows users to transcribe speech into text (ASR) and translate speech into other languages (AST) in an end-to-end manner. The model was trained using 5K hours of data and is now also available with an 80K hours version. MooER is the first LLM-based speech model trained and inferred using domestic GPUs. The repository includes pretrained models, inference code, and a Gradio demo for a better user experience.
AiOS
AiOS is a tool for human pose and shape estimation, performing human localization and SMPL-X estimation in a progressive manner. It consists of body localization, body refinement, and whole-body refinement stages. Users can download datasets for evaluation, SMPL-X body models, and AiOS checkpoint. Installation involves creating a conda virtual environment, installing PyTorch, torchvision, Pytorch3D, MMCV, and other dependencies. Inference requires placing the video for inference and pretrained models in specific directories. Test results are provided for NMVE, NMJE, MVE, and MPJPE on datasets like BEDLAM and AGORA. Users can run scripts for AGORA validation, AGORA test leaderboard, and BEDLAM leaderboard. The tool acknowledges codes from MMHuman3D, ED-Pose, and SMPLer-X.
polaris
Polaris establishes a novel, industryโcertified standard to foster the development of impactful methods in AI-based drug discovery. This library is a Python client to interact with the Polaris Hub. It allows you to download Polaris datasets and benchmarks, evaluate a custom method against a Polaris benchmark, and create and upload new datasets and benchmarks.
imodelsX
imodelsX is a Scikit-learn friendly library that provides tools for explaining, predicting, and steering text models/data. It also includes a collection of utilities for getting started with text data. **Explainable modeling/steering** | Model | Reference | Output | Description | |---|---|---|---| | Tree-Prompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/tree_prompt) | Explanation + Steering | Generates a tree of prompts to steer an LLM (_Official_) | | iPrompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/iprompt) | Explanation + Steering | Generates a prompt that explains patterns in data (_Official_) | | AutoPrompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/autoprompt) | Explanation + Steering | Find a natural-language prompt using input-gradients (โ In progress)| | D3 | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/d3) | Explanation | Explain the difference between two distributions | | SASC | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/sasc) | Explanation | Explain a black-box text module using an LLM (_Official_) | | Aug-Linear | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/aug_linear) | Linear model | Fit better linear model using an LLM to extract embeddings (_Official_) | | Aug-Tree | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/aug_tree) | Decision tree | Fit better decision tree using an LLM to expand features (_Official_) | **General utilities** | Model | Reference | |---|---| | LLM wrapper| [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/llm) | Easily call different LLMs | | | Dataset wrapper| [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/data) | Download minimially processed huggingface datasets | | | Bag of Ngrams | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/bag_of_ngrams) | Learn a linear model of ngrams | | | Linear Finetune | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/linear_finetune) | Finetune a single linear layer on top of LLM embeddings | | **Related work** * [imodels package](https://github.com/microsoft/interpretml/tree/main/imodels) (JOSS 2021) - interpretable ML package for concise, transparent, and accurate predictive modeling (sklearn-compatible). * [Adaptive wavelet distillation](https://arxiv.org/abs/2111.06185) (NeurIPS 2021) - distilling a neural network into a concise wavelet model * [Transformation importance](https://arxiv.org/abs/1912.04938) (ICLR 2020 workshop) - using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies) * [Hierarchical interpretations](https://arxiv.org/abs/1807.03343) (ICLR 2019) - extends CD to CNNs / arbitrary DNNs, and aggregates explanations into a hierarchy * [Interpretation regularization](https://arxiv.org/abs/2006.14340) (ICML 2020) - penalizes CD / ACD scores during training to make models generalize better * [PDR interpretability framework](https://www.pnas.org/doi/10.1073/pnas.1814225116) (PNAS 2019) - an overarching framewwork for guiding and framing interpretable machine learning
freeGPT
freeGPT provides free access to text and image generation models. It supports various models, including gpt3, gpt4, alpaca_7b, falcon_40b, prodia, and pollinations. The tool offers both asynchronous and non-asynchronous interfaces for text completion and image generation. It also features an interactive Discord bot that provides access to all the models in the repository. The tool is easy to use and can be integrated into various applications.
TempCompass
TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.
ExplainableAI.jl
ExplainableAI.jl is a Julia package that implements interpretability methods for black-box classifiers, focusing on local explanations and attribution maps in input space. The package requires models to be differentiable with Zygote.jl. It is similar to Captum and Zennit for PyTorch and iNNvestigate for Keras models. Users can analyze and visualize explanations for model predictions, with support for different XAI methods and customization. The package aims to provide transparency and insights into model decision-making processes, making it a valuable tool for understanding and validating machine learning models.
Noi
Noi is an AI-enhanced customizable browser designed to streamline digital experiences. It includes curated AI websites, allows adding any URL, offers prompts management, Noi Ask for batch messaging, various themes, Noi Cache Mode for quick link access, cookie data isolation, and more. Users can explore, extend, and empower their browsing experience with Noi.
For similar tasks
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
ai2apps
AI2Apps is a visual IDE for building LLM-based AI agent applications, enabling developers to efficiently create AI agents through drag-and-drop, with features like design-to-development for rapid prototyping, direct packaging of agents into apps, powerful debugging capabilities, enhanced user interaction, efficient team collaboration, flexible deployment, multilingual support, simplified product maintenance, and extensibility through plugins.
vircadia-native-core
Vircadiaโข is an open source agent-based metaverse ecosystem that excels in mass human and agent (AI) based immersive worlds. It offers mobile, desktop, and VR support through the web, allows hundreds of agents simultaneously, supports full-body (human or agents), scripting with JavaScript & TypeScript, visual scripting, full world editor, 4096kmยณ world space in a server, fully self-hosted, and more. Vircadia is sponsored by various companies, organizations, and governments. An 'agent' in Vircadia is an AI being that shares the same space as users, interacting, speaking, and experiencing the world, used for companionship, training, and gameplay opportunities. Vircadia excels at deploying agents en-masse for a full sandbox experience.
mo-ai-studio
Mo AI Studio is an enterprise-level AI agent running platform that enables the operation of customized intelligent AI agents with system-level capabilities. It supports various IDEs and programming languages, allows modification of multiple files with reasoning, cross-project context modifications, customizable agents, system-level file operations, document writing, question answering, knowledge sharing, and flexible output processors. The platform also offers various setters and a custom component publishing feature. Mo AI Studio is a fusion of artificial intelligence and human creativity, designed to bring unprecedented efficiency and innovation to enterprises.
Nothotdog
NotHotDog is an open-source testing framework for evaluating and validating voice and text-based AI agents. It offers a user-friendly interface for creating, managing, and executing tests against AI models. The framework supports WebSocket and REST API, test case management, automated evaluation of responses, and provides a seamless experience for test creation and execution.
AutoGPT
AutoGPT is a revolutionary tool that empowers everyone to harness the power of AI. With AutoGPT, you can effortlessly build, test, and delegate tasks to AI agents, unlocking a world of possibilities. Our mission is to provide the tools you need to focus on what truly matters: innovation and creativity.
Arcade-Learning-Environment
The Arcade Learning Environment (ALE) is a simple framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design. The ALE currently supports three different interfaces: C++, Python, and OpenAI Gym.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.