agentops
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Langchain, Autogen, AG2, and CamelAI
Stars: 2636
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
README:
AgentOps helps developers build, evaluate, and monitor AI agents. From prototype to production.
📊 Replay Analytics and Debugging | Step-by-step agent execution graphs |
💸 LLM Cost Management | Track spend with LLM foundation model providers |
🧪 Agent Benchmarking | Test your agents against 1,000+ evals |
🔐 Compliance and Security | Detect common prompt injection and data exfiltration exploits |
🤝 Framework Integrations | Native Integrations with CrewAI, AG2(AutoGen), Camel AI, & LangChain |
pip install agentops
Initialize the AgentOps client and automatically get analytics on all your LLM calls.
import agentops
# Beginning of your program (i.e. main.py, __init__.py)
agentops.init( < INSERT YOUR API KEY HERE >)
...
# End of program
agentops.end_session('Success')
All your sessions can be viewed on the AgentOps dashboard
Add powerful observability to your agents, tools, and functions with as little code as possible: one line at a time.
Refer to our documentation
# Automatically associate all Events with the agent that originated them
from agentops import track_agent
@track_agent(name='SomeCustomName')
class MyAgent:
...
# Automatically create ToolEvents for tools that agents will use
from agentops import record_tool
@record_tool('SampleToolName')
def sample_tool(...):
...
# Automatically create ActionEvents for other functions.
from agentops import record_action
@agentops.record_action('sample function being record')
def sample_function(...):
...
# Manually record any other Events
from agentops import record, ActionEvent
record(ActionEvent("received_user_input"))
Build Crew agents with observability with only 2 lines of code. Simply set an AGENTOPS_API_KEY
in your environment, and your crews will get automatic monitoring on the AgentOps dashboard.
pip install 'crewai[agentops]'
With only two lines of code, add full observability and monitoring to AG2 (formerly AutoGen) agents. Set an AGENTOPS_API_KEY
in your environment and call agentops.init()
Track and analyze CAMEL agents with full observability. Set an AGENTOPS_API_KEY
in your environment and initialize AgentOps to get started.
- Camel AI - Advanced agent communication framework
- AgentOps integration example
- Official Camel AI documentation
Installation
pip install "camel-ai[all]==0.2.11"
pip install agentops
import os
import agentops
from camel.agents import ChatAgent
from camel.messages import BaseMessage
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
# Initialize AgentOps
agentops.init(os.getenv("AGENTOPS_API_KEY"), default_tags=["CAMEL Example"])
# Import toolkits after AgentOps init for tracking
from camel.toolkits import SearchToolkit
# Set up the agent with search tools
sys_msg = BaseMessage.make_assistant_message(
role_name='Tools calling operator',
content='You are a helpful assistant'
)
# Configure tools and model
tools = [*SearchToolkit().get_tools()]
model = ModelFactory.create(
model_platform=ModelPlatformType.OPENAI,
model_type=ModelType.GPT_4O_MINI,
)
# Create and run the agent
camel_agent = ChatAgent(
system_message=sys_msg,
model=model,
tools=tools,
)
response = camel_agent.step("What is AgentOps?")
print(response)
agentops.end_session("Success")
Check out our Camel integration guide for more examples including multi-agent scenarios.
AgentOps works seamlessly with applications built using Langchain. To use the handler, install Langchain as an optional dependency:
Installation
pip install agentops[langchain]
To use the handler, import and set
import os
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from agentops.partners.langchain_callback_handler import LangchainCallbackHandler
AGENTOPS_API_KEY = os.environ['AGENTOPS_API_KEY']
handler = LangchainCallbackHandler(api_key=AGENTOPS_API_KEY, tags=['Langchain Example'])
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY,
callbacks=[handler],
model='gpt-3.5-turbo')
agent = initialize_agent(tools,
llm,
agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
callbacks=[handler], # You must pass in a callback handler to record your agent
handle_parsing_errors=True)
Check out the Langchain Examples Notebook for more details including Async handlers.
First class support for Cohere(>=5.4.0). This is a living integration, should you need any added functionality please message us on Discord!
Installation
pip install cohere
import cohere
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
co = cohere.Client()
chat = co.chat(
message="Is it pronounced ceaux-hear or co-hehray?"
)
print(chat)
agentops.end_session('Success')
import cohere
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
co = cohere.Client()
stream = co.chat_stream(
message="Write me a haiku about the synergies between Cohere and AgentOps"
)
for event in stream:
if event.event_type == "text-generation":
print(event.text, end='')
agentops.end_session('Success')
Track agents built with the Anthropic Python SDK (>=0.32.0).
Installation
pip install anthropic
import anthropic
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
client = anthropic.Anthropic(
# This is the default and can be omitted
api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
message = client.messages.create(
max_tokens=1024,
messages=[
{
"role": "user",
"content": "Tell me a cool fact about AgentOps",
}
],
model="claude-3-opus-20240229",
)
print(message.content)
agentops.end_session('Success')
Streaming
import anthropic
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
client = anthropic.Anthropic(
# This is the default and can be omitted
api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
stream = client.messages.create(
max_tokens=1024,
model="claude-3-opus-20240229",
messages=[
{
"role": "user",
"content": "Tell me something cool about streaming agents",
}
],
stream=True,
)
response = ""
for event in stream:
if event.type == "content_block_delta":
response += event.delta.text
elif event.type == "message_stop":
print("\n")
print(response)
print("\n")
Async
import asyncio
from anthropic import AsyncAnthropic
client = AsyncAnthropic(
# This is the default and can be omitted
api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
async def main() -> None:
message = await client.messages.create(
max_tokens=1024,
messages=[
{
"role": "user",
"content": "Tell me something interesting about async agents",
}
],
model="claude-3-opus-20240229",
)
print(message.content)
await main()
Track agents built with the Anthropic Python SDK (>=0.32.0).
Installation
pip install mistralai
Sync
from mistralai import Mistral
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
client = Mistral(
# This is the default and can be omitted
api_key=os.environ.get("MISTRAL_API_KEY"),
)
message = client.chat.complete(
messages=[
{
"role": "user",
"content": "Tell me a cool fact about AgentOps",
}
],
model="open-mistral-nemo",
)
print(message.choices[0].message.content)
agentops.end_session('Success')
Streaming
from mistralai import Mistral
import agentops
# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
client = Mistral(
# This is the default and can be omitted
api_key=os.environ.get("MISTRAL_API_KEY"),
)
message = client.chat.stream(
messages=[
{
"role": "user",
"content": "Tell me something cool about streaming agents",
}
],
model="open-mistral-nemo",
)
response = ""
for event in message:
if event.data.choices[0].finish_reason == "stop":
print("\n")
print(response)
print("\n")
else:
response += event.text
agentops.end_session('Success')
Async
import asyncio
from mistralai import Mistral
client = Mistral(
# This is the default and can be omitted
api_key=os.environ.get("MISTRAL_API_KEY"),
)
async def main() -> None:
message = await client.chat.complete_async(
messages=[
{
"role": "user",
"content": "Tell me something interesting about async agents",
}
],
model="open-mistral-nemo",
)
print(message.choices[0].message.content)
await main()
Async Streaming
import asyncio
from mistralai import Mistral
client = Mistral(
# This is the default and can be omitted
api_key=os.environ.get("MISTRAL_API_KEY"),
)
async def main() -> None:
message = await client.chat.stream_async(
messages=[
{
"role": "user",
"content": "Tell me something interesting about async streaming agents",
}
],
model="open-mistral-nemo",
)
response = ""
async for event in message:
if event.data.choices[0].finish_reason == "stop":
print("\n")
print(response)
print("\n")
else:
response += event.text
await main()
Track agents built with the CamelAI Python SDK (>=0.32.0).
Installation
pip install camel-ai[all]
pip install agentops
#Import Dependencies
import agentops
import os
from getpass import getpass
from dotenv import load_dotenv
#Set Keys
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY") or "<your openai key here>"
agentops_api_key = os.getenv("AGENTOPS_API_KEY") or "<your agentops key here>"
You can find usage examples here!.
AgentOps provides support for LiteLLM(>=1.3.1), allowing you to call 100+ LLMs using the same Input/Output Format.
Installation
pip install litellm
# Do not use LiteLLM like this
# from litellm import completion
# ...
# response = completion(model="claude-3", messages=messages)
# Use LiteLLM like this
import litellm
...
response = litellm.completion(model="claude-3", messages=messages)
# or
response = await litellm.acompletion(model="claude-3", messages=messages)
AgentOps works seamlessly with applications built using LlamaIndex, a framework for building context-augmented generative AI applications with LLMs.
Installation
pip install llama-index-instrumentation-agentops
To use the handler, import and set
from llama_index.core import set_global_handler
# NOTE: Feel free to set your AgentOps environment variables (e.g., 'AGENTOPS_API_KEY')
# as outlined in the AgentOps documentation, or pass the equivalent keyword arguments
# anticipated by AgentOps' AOClient as **eval_params in set_global_handler.
set_global_handler("agentops")
Check out the LlamaIndex docs for more details.
AgentOps provides support for Llama Stack Python Client(>=0.0.53), allowing you to monitor your Agentic applications.
Track and analyze SwarmZero agents with full observability. Set an AGENTOPS_API_KEY
in your environment and initialize AgentOps to get started.
- SwarmZero - Advanced multi-agent framework
- AgentOps integration example
- SwarmZero AI integration example
- SwarmZero AI - AgentOps documentation
- Official SwarmZero Python SDK
Installation
pip install swarmzero
pip install agentops
from dotenv import load_dotenv
load_dotenv()
import agentops
agentops.init(<INSERT YOUR API KEY HERE>)
from swarmzero import Agent, Swarm
# ...
(coming soon!)
Platform | Dashboard | Evals |
---|---|---|
✅ Python SDK | ✅ Multi-session and Cross-session metrics | ✅ Custom eval metrics |
🚧 Evaluation builder API | ✅ Custom event tag tracking | 🔜 Agent scorecards |
✅ Javascript/Typescript SDK | ✅ Session replays | 🔜 Evaluation playground + leaderboard |
Performance testing | Environments | LLM Testing | Reasoning and execution testing |
---|---|---|---|
✅ Event latency analysis | 🔜 Non-stationary environment testing | 🔜 LLM non-deterministic function detection | 🚧 Infinite loops and recursive thought detection |
✅ Agent workflow execution pricing | 🔜 Multi-modal environments | 🚧 Token limit overflow flags | 🔜 Faulty reasoning detection |
🚧 Success validators (external) | 🔜 Execution containers | 🔜 Context limit overflow flags | 🔜 Generative code validators |
🔜 Agent controllers/skill tests | ✅ Honeypot and prompt injection detection (PromptArmor) | 🔜 API bill tracking | 🔜 Error breakpoint analysis |
🔜 Information context constraint testing | 🔜 Anti-agent roadblocks (i.e. Captchas) | 🔜 CI/CD integration checks | |
🔜 Regression testing | 🔜 Multi-agent framework visualization |
Without the right tools, AI agents are slow, expensive, and unreliable. Our mission is to bring your agent from prototype to production. Here's why AgentOps stands out:
- Comprehensive Observability: Track your AI agents' performance, user interactions, and API usage.
- Real-Time Monitoring: Get instant insights with session replays, metrics, and live monitoring tools.
- Cost Control: Monitor and manage your spend on LLM and API calls.
- Failure Detection: Quickly identify and respond to agent failures and multi-agent interaction issues.
- Tool Usage Statistics: Understand how your agents utilize external tools with detailed analytics.
- Session-Wide Metrics: Gain a holistic view of your agents' sessions with comprehensive statistics.
AgentOps is designed to make agent observability, testing, and monitoring easy.
Check out our growth in the community:
Repository | Stars |
---|---|
geekan / MetaGPT | 42787 |
run-llama / llama_index | 34446 |
crewAIInc / crewAI | 18287 |
camel-ai / camel | 5166 |
superagent-ai / superagent | 5050 |
iyaja / llama-fs | 4713 |
BasedHardware / Omi | 2723 |
MervinPraison / PraisonAI | 2007 |
AgentOps-AI / Jaiqu | 272 |
swarmzero / swarmzero | 195 |
strnad / CrewAI-Studio | 134 |
alejandro-ao / exa-crewai | 55 |
tonykipkemboi / youtube_yapper_trapper | 47 |
sethcoast / cover-letter-builder | 27 |
bhancockio / chatgpt4o-analysis | 19 |
breakstring / Agentic_Story_Book_Workflow | 14 |
MULTI-ON / multion-python | 13 |
Generated using github-dependents-info, by Nicolas Vuillamy
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for agentops
Similar Open Source Tools
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
rust-genai
genai is a multi-AI providers library for Rust that aims to provide a common and ergonomic single API to various generative AI providers such as OpenAI, Anthropic, Cohere, Ollama, and Gemini. It focuses on standardizing chat completion APIs across major AI services, prioritizing ergonomics and commonality. The library initially focuses on text chat APIs and plans to expand to support images, function calling, and more in the future versions. Version 0.1.x will have breaking changes in patches, while version 0.2.x will follow semver more strictly. genai does not provide a full representation of a given AI provider but aims to simplify the differences at a lower layer for ease of use.
llama_ros
This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. By using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs.
langcheck
LangCheck is a Python library that provides a suite of metrics and tools for evaluating the quality of text generated by large language models (LLMs). It includes metrics for evaluating text fluency, sentiment, toxicity, factual consistency, and more. LangCheck also provides tools for visualizing metrics, augmenting data, and writing unit tests for LLM applications. With LangCheck, you can quickly and easily assess the quality of LLM-generated text and identify areas for improvement.
freeGPT
freeGPT provides free access to text and image generation models. It supports various models, including gpt3, gpt4, alpaca_7b, falcon_40b, prodia, and pollinations. The tool offers both asynchronous and non-asynchronous interfaces for text completion and image generation. It also features an interactive Discord bot that provides access to all the models in the repository. The tool is easy to use and can be integrated into various applications.
ScaleLLM
ScaleLLM is a cutting-edge inference system engineered for large language models (LLMs), meticulously designed to meet the demands of production environments. It extends its support to a wide range of popular open-source models, including Llama3, Gemma, Bloom, GPT-NeoX, and more. ScaleLLM is currently undergoing active development. We are fully committed to consistently enhancing its efficiency while also incorporating additional features. Feel free to explore our **_Roadmap_** for more details. ## Key Features * High Efficiency: Excels in high-performance LLM inference, leveraging state-of-the-art techniques and technologies like Flash Attention, Paged Attention, Continuous batching, and more. * Tensor Parallelism: Utilizes tensor parallelism for efficient model execution. * OpenAI-compatible API: An efficient golang rest api server that compatible with OpenAI. * Huggingface models: Seamless integration with most popular HF models, supporting safetensors. * Customizable: Offers flexibility for customization to meet your specific needs, and provides an easy way to add new models. * Production Ready: Engineered with production environments in mind, ScaleLLM is equipped with robust system monitoring and management features to ensure a seamless deployment experience.
openlrc
Open-Lyrics is a Python library that transcribes voice files using faster-whisper and translates/polishes the resulting text into `.lrc` files in the desired language using LLM, e.g. OpenAI-GPT, Anthropic-Claude. It offers well preprocessed audio to reduce hallucination and context-aware translation to improve translation quality. Users can install the library from PyPI or GitHub and follow the installation steps to set up the environment. The tool supports GUI usage and provides Python code examples for transcription and translation tasks. It also includes features like utilizing context and glossary for translation enhancement, pricing information for different models, and a list of todo tasks for future improvements.
vnc-lm
vnc-lm is a Discord bot designed for messaging with language models. Users can configure model parameters, branch conversations, and edit prompts to enhance responses. The bot supports various providers like OpenAI, Huggingface, and Cloudflare Workers AI. It integrates with ollama and LiteLLM, allowing users to access a wide range of language model APIs through a single interface. Users can manage models, switch between models, split long messages, and create conversation branches. LiteLLM integration enables support for OpenAI-compatible APIs and local LLM services. The bot requires Docker for installation and can be configured through environment variables. Troubleshooting tips are provided for common issues like context window problems, Discord API errors, and LiteLLM issues.
yomitoku
YomiToku is a Japanese-focused AI document image analysis engine that provides full-text OCR and layout analysis capabilities for images. It recognizes, extracts, and converts text information and figures in images. It includes 4 AI models trained on Japanese datasets for tasks such as detecting text positions, recognizing text strings, analyzing layouts, and recognizing table structures. The models are specialized for Japanese document images, supporting recognition of over 7000 Japanese characters and analyzing layout structures specific to Japanese documents. It offers features like layout analysis, table structure analysis, and reading order estimation to extract information from document images without disrupting their semantic structure. YomiToku supports various output formats such as HTML, markdown, JSON, and CSV, and can also extract figures, tables, and images from documents. It operates efficiently in GPU environments, enabling fast and effective analysis of document transcriptions without requiring high-end GPUs.
cellseg_models.pytorch
cellseg-models.pytorch is a Python library built upon PyTorch for 2D cell/nuclei instance segmentation models. It provides multi-task encoder-decoder architectures and post-processing methods for segmenting cell/nuclei instances. The library offers high-level API to define segmentation models, open-source datasets for training, flexibility to modify model components, sliding window inference, multi-GPU inference, benchmarking utilities, regularization techniques, and example notebooks for training and finetuning models with different backbones.
evalplus
EvalPlus is a rigorous evaluation framework for LLM4Code, providing HumanEval+ and MBPP+ tests to evaluate large language models on code generation tasks. It offers precise evaluation and ranking, coding rigorousness analysis, and pre-generated code samples. Users can use EvalPlus to generate code solutions, post-process code, and evaluate code quality. The tool includes tools for code generation and test input generation using various backends.
mediapipe-rs
MediaPipe-rs is a Rust library designed for MediaPipe tasks on WasmEdge WASI-NN. It offers easy-to-use low-code APIs similar to mediapipe-python, with low overhead and flexibility for custom media input. The library supports various tasks like object detection, image classification, gesture recognition, and more, including TfLite models, TF Hub models, and custom models. Users can create task instances, run sessions for pre-processing, inference, and post-processing, and speed up processing by reusing sessions. The library also provides support for audio tasks using audio data from symphonia, ffmpeg, or raw audio. Users can choose between CPU, GPU, or TPU devices for processing.
educhain
Educhain is a powerful Python package that leverages Generative AI to create engaging and personalized educational content. It enables users to generate multiple-choice questions, create lesson plans, and support various LLM models. Users can export questions to JSON, PDF, and CSV formats, customize prompt templates, and generate questions from text, PDF, URL files, youtube videos, and images. Educhain outperforms traditional methods in content generation speed and quality. It offers advanced configuration options and has a roadmap for future enhancements, including integration with popular Learning Management Systems and a mobile app for content generation on-the-go.
candle-vllm
Candle-vllm is an efficient and easy-to-use platform designed for inference and serving local LLMs, featuring an OpenAI compatible API server. It offers a highly extensible trait-based system for rapid implementation of new module pipelines, streaming support in generation, efficient management of key-value cache with PagedAttention, and continuous batching. The tool supports chat serving for various models and provides a seamless experience for users to interact with LLMs through different interfaces.
obsei
Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.
PraisonAI
Praison AI is a low-code, centralised framework that simplifies the creation and orchestration of multi-agent systems for various LLM applications. It emphasizes ease of use, customization, and human-agent interaction. The tool leverages AutoGen and CrewAI frameworks to facilitate the development of AI-generated scripts and movie concepts. Users can easily create, run, test, and deploy agents for scriptwriting and movie concept development. Praison AI also provides options for full automatic mode and integration with OpenAI models for enhanced AI capabilities.
For similar tasks
agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.
ai2apps
AI2Apps is a visual IDE for building LLM-based AI agent applications, enabling developers to efficiently create AI agents through drag-and-drop, with features like design-to-development for rapid prototyping, direct packaging of agents into apps, powerful debugging capabilities, enhanced user interaction, efficient team collaboration, flexible deployment, multilingual support, simplified product maintenance, and extensibility through plugins.
vircadia-native-core
Vircadia™ is an open source agent-based metaverse ecosystem that excels in mass human and agent (AI) based immersive worlds. It offers mobile, desktop, and VR support through the web, allows hundreds of agents simultaneously, supports full-body (human or agents), scripting with JavaScript & TypeScript, visual scripting, full world editor, 4096km³ world space in a server, fully self-hosted, and more. Vircadia is sponsored by various companies, organizations, and governments. An 'agent' in Vircadia is an AI being that shares the same space as users, interacting, speaking, and experiencing the world, used for companionship, training, and gameplay opportunities. Vircadia excels at deploying agents en-masse for a full sandbox experience.
mo-ai-studio
Mo AI Studio is an enterprise-level AI agent running platform that enables the operation of customized intelligent AI agents with system-level capabilities. It supports various IDEs and programming languages, allows modification of multiple files with reasoning, cross-project context modifications, customizable agents, system-level file operations, document writing, question answering, knowledge sharing, and flexible output processors. The platform also offers various setters and a custom component publishing feature. Mo AI Studio is a fusion of artificial intelligence and human creativity, designed to bring unprecedented efficiency and innovation to enterprises.
Nothotdog
NotHotDog is an open-source testing framework for evaluating and validating voice and text-based AI agents. It offers a user-friendly interface for creating, managing, and executing tests against AI models. The framework supports WebSocket and REST API, test case management, automated evaluation of responses, and provides a seamless experience for test creation and execution.
AutoGPT
AutoGPT is a revolutionary tool that empowers everyone to harness the power of AI. With AutoGPT, you can effortlessly build, test, and delegate tasks to AI agents, unlocking a world of possibilities. Our mission is to provide the tools you need to focus on what truly matters: innovation and creativity.
Arcade-Learning-Environment
The Arcade Learning Environment (ALE) is a simple framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design. The ALE currently supports three different interfaces: C++, Python, and OpenAI Gym.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.