embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Stars: 158
Embodied Agents is a toolkit for integrating large multi-modal models into existing robot stacks with just a few lines of code. It provides consistency, reliability, scalability, and is configurable to any observation and action space. The toolkit is designed to reduce complexities involved in setting up inference endpoints, converting between different model formats, and collecting/storing datasets. It aims to facilitate data collection and sharing among roboticists by providing Python-first abstractions that are modular, extensible, and applicable to a wide range of tasks. The toolkit supports asynchronous and remote thread-safe agent execution for maximal responsiveness and scalability, and is compatible with various APIs like HuggingFace Spaces, Datasets, Gymnasium Spaces, Ollama, and OpenAI. It also offers automatic dataset recording and optional uploads to the HuggingFace hub.
README:
📖 Docs: docs
🚀 Simple Robot Agent Example: 💻 Simulation Example with SimplerEnv: 🤖 Motor Agent using OpenVLA: ⏺️ Record Dataset on a Robot
Updates:
Aug 28 2024, embodied-agents v1.2
- New Doc site is up!
- Added the features to record dataset on robot natively.
- Add multiple new Sensory Agents, i.e. depth estimation, object detection, image segmentation with public API endpoints hosted. And a simple cli
mbodied
for trying them. - Added Auto Agent for dynamic agents selection.
June 30 2024, embodied-agents v1.0:
- Added Motor Agent supporting OpenVLA with free API endpoint hosted.
- Added Sensory Agent supporting i.e. 3D object pose detection.
- Improved automatic dataset recording.
- Agent now can make remote act calls to API servers i.e. Gradio, vLLM.
- Bug fixes and performance improvements have been made.
- PyPI project is renamed to
mbodied
.
embodied agents is a toolkit for integrating large multi-modal models into existing robot stacks with just a few lines of code. It provides consistency, reliability, scalability and is configurable to any observation and action space.
This repository is broken down into 3 main components: Agents, Data, and Hardware. Inspired by the efficiency of the central nervous system, each component is broken down into 3 meta-modalities: Language, Motion, and Sense. Each agent has an act
method that can be overridden and satisfies:
- Language Agents always return a string.
-
Motor Agents always return a
Motion
. -
Sensory Agents always return a
SensorReading
.
For convenience, we also provide AutoAgent which dynamically initializes the right agent for the specified task. See API Reference below for more.
A call to act
or async_act
can perform local or remote inference synchronously or asynchronously. Remote execution can be performed with Gradio, httpx, or different LLM clients. Validation is performed with Pydantic.
- Language Agents natively support OpenAI, Anthropic, Ollama, vLLM, Gradio, etc
- Motor Agents natively support OpenVLA, RT1(upcoming)
- Sensory Agents support Depth Anything, YOLO, Segment Anything 2
Jump to getting started to get up and running on real hardware or simulation. Be sure to join our Discord for 🥇-winning discussions :)
⭐ Give us a star on GitHub if you like us!
There is a signifcant barrier to entry for running SOTA models in robotics
It is currently unrealistic to run state-of-the-art AI models on edge devices for responsive, real-time applications. Furthermore, the complexity of integrating multiple models across different modalities is a significant barrier to entry for many researchers, hobbyists, and developers. This library aims to address these challenges by providing a simple, extensible, and efficient way to integrate large models into existing robot stacks.
Facillitate data-collection and sharing among roboticists.
This requires reducing much of the complexities involved with setting up inference endpoints, converting between different model formats, and collecting and storing new datasets for future availibility.
We aim to achieve this by:
- Providing simple, Python-first abstrations that are modular, extensible and applicable to a wide range of tasks.
- Providing endpoints, weights, and interactive Gradio playgrounds for easy access to state-of-the-art models.
- Ensuring that this library is observation and action-space agnostic, allowing it to be used with any robot stack.
Beyond just improved robustness and consistency, this architecture makes asynchronous and remote agent execution exceedingly simple. In particular we demonstrate how responsive natural language interactions can be achieved in under 30 lines of Python code.
Embodied Agents are not yet capable of learning from in-context experience:
- Frameworks for advanced RAG techniques are clumsy at best for OOD embodied applications however that may improve.
- Amount of data required for fine-tuning is still prohibitively large and expensive to collect.
- Online RL is still in its infancy and not yet practical for most applications.
- This library is intended to be used for research and prototyping.
- This library is still experimental and under active development. Breaking changes may occur although they will be avoided as much as possible. Feel free to report issues!
- Extensible, user-friendly python SDK with explicit typing and modularity
- Asynchronous and remote thread-safe agent execution for maximal responsiveness and scalability.
- Full-compatiblity with HuggingFace Spaces, Datasets, Gymnasium Spaces, Ollama, and any OpenAI-compatible api.
- Automatic dataset-recording and optionally uploads dataset to huggingface hub.
- Closed: OpenAI, Anthropic
- Open Weights: OpenVLA, Idefics2, Llava-1.6-Mistral, Phi-3-vision-128k-instruct
- All gradio endpoints hosted on HuggingFace spaces.
- [x] OpenVLA Motor Agent
- [x] Automatic dataset recording on Robot
- [x] Yolo, SAM2, DepthAnything Sensory Agents
- [x] Auto Agent
- [ ] ROS integration
- [ ] More Motor Agents, i.e. RT1
- [ ] More device support, i.e. OpenCV camera
- [ ] Fine-tuning Scripts
pip install mbodied
# With extra dependencies, i.e. torch, opencv-python, etc.
pip install mbodied[extras]
# For audio support
pip install mbodied[audio]
Or install from source:
pip install git+https://github.com/mbodiai/embodied-agents.git
from mbodied.types.motion.control import HandControl, FullJointControl
from mbodied.types.motion import AbsoluteMotionField, RelativeMotionField
class FineGrainedHandControl(HandControl):
comment: str = Field(None, description="A comment to voice aloud.")
index: FullJointControl = AbsoluteMotionField([0,0,0],bounds=[-3.14, 3.14], shape=(3,))
thumb: FullJointControl = RelativeMotionField([0,0,0],bounds=[-3.14, 3.14], shape=(3,))
import os
from mbodied.agents import LanguageAgent
from mbodied.agents.motion import OpenVlaAgent
from mbodied.agents.sense.audio import AudioAgent
from mbodied.robots import SimRobot
cognition = LanguageAgent(
context="You are an embodied planner that responds with a python list of strings and nothing else.",
api_key=os.getenv("OPENAI_API_KEY"),
model_src="openai",
recorder="auto",
)
audio = AudioAgent(use_pyaudio=False, api_key=os.getenv("OPENAI_API_KEY")) # pyaudio is buggy on mac
motion = OpenVlaAgent(model_src="https://api.mbodi.ai/community-models/")
# Subclass and override do() and capture() methods.
robot = SimRobot()
instruction = audio.listen()
plan = cognition.act(instruction, robot.capture())
for step in plan.strip('[]').strip().split(','):
print("\nMotor agent is executing step: ", step, "\n")
for _ in range(10):
hand_control = motion.act(step, robot.capture())
robot.do(hand_control)
Example Scripts:
- 1_simple_robot_agent.py: A very simple language based cognitive agent taking instruction from user and output voice and actions.
- 2_openvla_motor_agent_example.py: Run robotic transformers, i.e. OpenVLA, in several lines on the robot.
- 3_reason_plan_act_robot.py: Full example of language based cognitive agent and OpenVLA motor agent executing task.
- 4_language_reason_plan_act_robot.py: Full example of all languaged based cognitive and motor agent executing task.
- 5_teach_robot_record_dataset.py: Example of collecting dataset on robot's action at a specific frequency by just yelling at the robot!
Simulation with: SimplerEnv :
Run OpenVLA with embodied-agents in simulation:
The Sample Class
The Sample class is a base model for serializing, recording, and manipulating arbitrary data. It is designed to be extendable, flexible, and strongly typed. By wrapping your observation or action objects in the Sample class, you'll be able to convert to and from the following with ease:
- A Gym space for creating a new Gym environment.
- A flattened list, array, or tensor for plugging into an ML model.
- A HuggingFace dataset with semantic search capabilities.
- A Pydantic BaseModel for reliable and quick json serialization/deserialization.
To learn more about all of the possibilities with embodied agents, check out the documentation
- You can
pack
a list ofSample
s or Dicts into a singleSample
orDict
andunpack
accordingly? - You can
unflatten
any python structure into aSample
class so long you provide it with a valid json schema?
Creating a Sample requires just wrapping a python dictionary with the Sample
class. Additionally, they can be made from kwargs, Gym Spaces, and Tensors to name a few.
from mbodied.types.sample import Sample
# Creating a Sample instance
sample = Sample(observation=[1,2,3], action=[4,5,6])
# Flattening the Sample instance
flat_list = sample.flatten()
print(flat_list) # Output: [1, 2, 3, 4, 5, 6]
# Generating a simplified JSON schema
>>> schema = sample.schema()
{'type': 'object', 'properties': {'observation': {'type': 'array', 'items': {'type': 'integer'}}, 'action': {'type': 'array', 'items': {'type': 'integer'}}}}
# Unflattening a list into a Sample instance
Sample.unflatten(flat_list, schema)
>>> Sample(observation=[1, 2, 3], action=[4, 5, 6])
The Sample class leverages Pydantic's powerful features for serialization and deserialization, allowing you to easily convert between Sample instances and JSON.
# Serialize the Sample instance to JSON
sample = Sample(observation=[1,2,3], action=[4,5,6])
json_data = sample.model_dump_json()
print(json_data) # Output: '{"observation": [1, 2, 3], "action": [4, 5, 6]}'
# Deserialize the JSON data back into a Sample instance
json_data = '{"observation": [1, 2, 3], "action": [4, 5, 6]}'
sample = Sample.model_validate(from_json(json_data))
print(sample) # Output: Sample(observation=[1, 2, 3], action=[4, 5, 6])
# Converting to a dictionary
sample_dict = sample.to("dict")
print(sample_dict) # Output: {'observation': [1, 2, 3], 'action': [4, 5, 6]}
# Converting to a NumPy array
sample_np = sample.to("np")
print(sample_np) # Output: array([1, 2, 3, 4, 5, 6])
# Converting to a PyTorch tensor
sample_pt = sample.to("pt")
print(sample_pt) # Output: tensor([1, 2, 3, 4, 5, 6])
gym_space = sample.space()
print(gym_space)
# Output: Dict('action': Box(-inf, inf, (3,), float64), 'observation': Box(-inf, inf, (3,), float64))
See sample.py for more details.
The Message class represents a single completion sample space. It can be text, image, a list of text/images, Sample, or other modality. The Message class is designed to handle various types of content and supports different roles such as user, assistant, or system.
You can create a Message
in versatile ways. They can all be understood by mbodi's backend.
from mbodied.types.message import Message
Message(role="user", content="example text")
Message(role="user", content=["example text", Image("example.jpg"), Image("example2.jpg")])
Message(role="user", content=[Sample("Hello")])
The Backend class is an abstract base class for Backend implementations. It provides the basic structure and methods required for interacting with different backend services, such as API calls for generating completions based on given messages. See backend directory on how various backends are implemented.
Agent is the base class for various agents listed below. It provides a template for creating agents that can talk to a remote backend/server and optionally record their actions and observations.
The Language Agent can connect to different backends or transformers of your choice. It includes methods for recording conversations, managing context, looking up messages, forgetting messages, storing context, and acting based on an instruction and an image.
Natively supports API services: OpenAI, Anthropic, vLLM, Ollama, HTTPX, or any gradio endpoints. More upcoming!
To use OpenAI for your robot backend:
from mbodied.agents.language import LanguageAgent
agent = LanguageAgent(context="You are a robot agent.", model_src="openai")
To execute an instruction:
instruction = "pick up the fork"
response = robot_agent.act(instruction, image)
Language Agent can connect to vLLM as well. For example, suppose you are running a vLLM server Mistral-7B on 1.2.3.4:1234. All you need to do is:
agent = LanguageAgent(
context=context,
model_src="openai",
model_kwargs={"api_key": "EMPTY", "base_url": "http://1.2.3.4:1234/v1"},
)
response = agent.act("Hello, how are you?", model="mistralai/Mistral-7B-Instruct-v0.3")
Example using Ollama:
agent = LanguageAgent(
context="You are a robot agent.", model_src="ollama",
model_kwargs={"endpoint": "http://localhost:11434/api/chat"}
)
response = agent.act("Hello, how are you?", model="llama3.1")
Motor Agent is similar to Language Agent but instead of returning a string, it always returns a Motion
. Motor Agent is generally powered by robotic transformer models, i.e. OpenVLA, RT1, Octo, etc.
Some small model, like RT1, can run on edge devices. However, some, like OpenVLA, may be challenging to run without quantization. See OpenVLA Agent and an example OpenVLA server
These agents interact with the environment to collect sensor data. They always return a SensorReading
, which can be various forms of processed sensory input such as images, depth data, or audio signals.
Currently, we have:
agents that process robot's sensor information.
Auto Agent dynamically selects and initializes the correct agent based on the task and model.
from mbodied.agents.auto.auto_agent import AutoAgent
# This makes it a LanguageAgent
agent = AutoAgent(task="language", model_src="openai")
response = agent.act("What is the capital of France?")
# This makes it a motor agent: OpenVlaAgent
auto_agent = AutoAgent(task="motion-openvla", model_src="https://api.mbodi.ai/community-models/")
action = auto_agent.act("move hand forward", Image(size=(224, 224)))
# This makes it a sensory agent: DepthEstimationAgent
auto_agent = AutoAgent(task="sense-depth-estimation", model_src="https://api.mbodi.ai/sense/")
depth = auto_agent.act(image=Image(size=(224, 224)))
Alternatively, you can use get_agent
method in auto_agent as well.
language_agent = get_agent(task="language", model_src="openai")
The motion_controls module defines various motions to control a robot as Pydantic models. They are also subclassed from Sample
, thus possessing all the capability of Sample
as mentioned above. These controls cover a range of actions, from simple joint movements to complex poses and full robot control.
You can integrate your custom robot hardware by subclassing Robot quite easily. You only need to implement do()
function to perform actions (and some additional methods if you want to record dataset on the robot). In our examples, we use a mock robot. We also have an XArm robot as an example.
Recording a dataset on a robot is very easy! All you need to do is implement the get_observation()
, get_state()
, and prepare_action()
methods for your robot. After that, you can record a dataset on your robot anytime you want. See examples/5_teach_robot_record_dataset.py and this colab: for more details.
from mbodied.robots import SimRobot
from mbodied.types.motion.control import HandControl, Pose
robot = SimRobot()
robot.init_recorder(frequency_hz=5)
with robot.record("pick up the fork"):
motion = HandControl(pose=Pose(x=0.1, y=0.2, z=0.3, roll=0.1, pitch=0.2, yaw=0.3))
robot.do(motion)
Dataset Recorder is a lower level recorder to record your conversation and the robot's actions to a dataset as you interact with/teach the robot. You can define any observation space and action space for the Recorder. See gymnasium for more details about spaces.
from mbodied.data.recording import Recorder
from mbodied.types.motion.control import HandControl
from mbodied.types.sense.vision import Image
from gymnasium import spaces
observation_space = spaces.Dict({
'image': Image(size=(224, 224)).space(),
'instruction': spaces.Text(1000)
})
action_space = HandControl().space()
recorder = Recorder('example_recorder', out_dir='saved_datasets', observation_space=observation_space, action_space=action_space)
# Every time robot makes a conversation or performs an action:
recorder.record(observation={'image': image, 'instruction': instruction,}, action=hand_control)
The dataset is saved to ./saved_datasets
.
The Replayer class is designed to process and manage data stored in HDF5 files generated by Recorder
. It provides a variety of functionalities, including reading samples, generating statistics, extracting unique items, and converting datasets for use with HuggingFace. The Replayer also supports saving specific images during processing and offers a command-line interface for various operations.
Example for iterating through a dataset from Recorder with Replayer:
from mbodied.data.replaying import Replayer
replayer = Replayer(path=str("path/to/dataset.h5"))
for observation, action in replayer:
...
├─ assets/ ............. Images, icons, and other static assets
├─ examples/ ........... Example scripts and usage demonstrations
├─ resources/ .......... Additional resources for examples
├─ src/
│ └─ mbodied/
│ ├─ agents/ ....... Modules for robot agents
│ │ ├─ backends/ .. Backend implementations for different services for agents
│ │ ├─ language/ .. Language based agents modules
│ │ ├─ motion/ .... Motion based agents modules
│ │ └─ sense/ ..... Sensory, e.g. audio, processing modules
│ ├─ data/ ......... Data handling and processing
│ ├─ hardware/ ..... Hardware modules, i.e. camera
│ ├─ robot/ ........ Robot interface and interaction
│ └─ types/ ........ Common types and definitions
└─ tests/ .............. Unit tests
We welcome issues, questions and PRs. See the contributing guide for more information.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for embodied-agents
Similar Open Source Tools
embodied-agents
Embodied Agents is a toolkit for integrating large multi-modal models into existing robot stacks with just a few lines of code. It provides consistency, reliability, scalability, and is configurable to any observation and action space. The toolkit is designed to reduce complexities involved in setting up inference endpoints, converting between different model formats, and collecting/storing datasets. It aims to facilitate data collection and sharing among roboticists by providing Python-first abstractions that are modular, extensible, and applicable to a wide range of tasks. The toolkit supports asynchronous and remote thread-safe agent execution for maximal responsiveness and scalability, and is compatible with various APIs like HuggingFace Spaces, Datasets, Gymnasium Spaces, Ollama, and OpenAI. It also offers automatic dataset recording and optional uploads to the HuggingFace hub.
GraphRAG-SDK
Build fast and accurate GenAI applications with GraphRAG SDK, a specialized toolkit for building Graph Retrieval-Augmented Generation (GraphRAG) systems. It integrates knowledge graphs, ontology management, and state-of-the-art LLMs to deliver accurate, efficient, and customizable RAG workflows. The SDK simplifies the development process by automating ontology creation, knowledge graph agent creation, and query handling, enabling users to interact and query their knowledge graphs effectively. It supports multi-agent systems and orchestrates agents specialized in different domains. The SDK is optimized for FalkorDB, ensuring high performance and scalability for large-scale applications. By leveraging knowledge graphs, it enables semantic relationships and ontology-driven queries that go beyond standard vector similarity, enhancing retrieval-augmented generation capabilities.
IntelliNode
IntelliNode is a javascript module that integrates cutting-edge AI models like ChatGPT, LLaMA, WaveNet, Gemini, and Stable diffusion into projects. It offers functions for generating text, speech, and images, as well as semantic search, multi-model evaluation, and chatbot capabilities. The module provides a wrapper layer for low-level model access, a controller layer for unified input handling, and a function layer for abstract functionality tailored to various use cases.
Endia
Endia is a dynamic Array library for Scientific Computing, offering automatic differentiation of arbitrary order, complex number support, dual API with PyTorch-like imperative or JAX-like functional interface, and JIT Compilation for speeding up training and inference. It can handle complex valued functions, perform both forward and reverse-mode automatic differentiation, and has a builtin JIT compiler. Endia aims to advance AI & Scientific Computing by pushing boundaries with clear algorithms, providing high-performance open-source code that remains readable and pythonic, and prioritizing clarity and educational value over exhaustive features.
llmgraph
llmgraph is a tool that enables users to create knowledge graphs in GraphML, GEXF, and HTML formats by extracting world knowledge from large language models (LLMs) like ChatGPT. It supports various entity types and relationships, offers cache support for efficient graph growth, and provides insights into LLM costs. Users can customize the model used and interact with different LLM providers. The tool allows users to generate interactive graphs based on a specified entity type and Wikipedia link, making it a valuable resource for knowledge graph creation and exploration.
Vitron
Vitron is a unified pixel-level vision LLM designed for comprehensive understanding, generating, segmenting, and editing static images and dynamic videos. It addresses challenges in existing vision LLMs such as superficial instance-level understanding, lack of unified support for images and videos, and insufficient coverage across various vision tasks. The tool requires Python >= 3.8, Pytorch == 2.1.0, and CUDA Version >= 11.8 for installation. Users can deploy Gradio demo locally and fine-tune their models for specific tasks.
inspectus
Inspectus is a versatile visualization tool for large language models. It provides multiple views, including Attention Matrix, Query Token Heatmap, Key Token Heatmap, and Dimension Heatmap, to offer insights into language model behaviors. Users can interact with the tool in Jupyter notebooks through an easy-to-use Python API. Inspectus allows users to visualize attention scores between tokens, analyze how tokens focus on each other during processing, and explore the relationships between query and key tokens. The tool supports the visualization of attention maps from Huggingface transformers and custom attention maps, making it a valuable resource for researchers and developers working with language models.
hydraai
Generate React components on-the-fly at runtime using AI. Register your components, and let Hydra choose when to show them in your App. Hydra development is still early, and patterns for different types of components and apps are still being developed. Join the discord to chat with the developers. Expects to be used in a NextJS project. Components that have function props do not work.
qa-mdt
This repository provides an implementation of QA-MDT, integrating state-of-the-art models for music generation. It offers a Quality-Aware Masked Diffusion Transformer for enhanced music generation. The code is based on various repositories like AudioLDM, PixArt-alpha, MDT, AudioMAE, and Open-Sora. The implementation allows for training and fine-tuning the model with different strategies and datasets. The repository also includes instructions for preparing datasets in LMDB format and provides a script for creating a toy LMDB dataset. The model can be used for music generation tasks, with a focus on quality injection to enhance the musicality of generated music.
flow-prompt
Flow Prompt is a dynamic library for managing and optimizing prompts for large language models. It facilitates budget-aware operations, dynamic data integration, and efficient load distribution. Features include CI/CD testing, dynamic prompt development, multi-model support, real-time insights, and prompt testing and evolution.
BentoML
BentoML is an open-source model serving library for building performant and scalable AI applications with Python. It comes with everything you need for serving optimization, model packaging, and production deployment.
premsql
PremSQL is an open-source library designed to help developers create secure, fully local Text-to-SQL solutions using small language models. It provides essential tools for building and deploying end-to-end Text-to-SQL pipelines with customizable components, ideal for secure, autonomous AI-powered data analysis. The library offers features like Local-First approach, Customizable Datasets, Robust Executors and Evaluators, Advanced Generators, Error Handling and Self-Correction, Fine-Tuning Support, and End-to-End Pipelines. Users can fine-tune models, generate SQL queries from natural language inputs, handle errors, and evaluate model performance against predefined metrics. PremSQL is extendible for customization and private data usage.
SUPIR
SUPIR is an AI-based image processing and upscaling tool that leverages cutting-edge technology to enhance image quality and resolution. The tool provides users with the ability to upscale images with high generalization and quality, as well as specific settings for light degradation scenarios. It offers a range of models and checkpoints for different use cases, along with detailed instructions for installation and usage. SUPIR also includes features for color fixing, linear CFG adjustments, and various prompts for image enhancement. The tool is designed for non-commercial use only and comes with a contact email for inquiries and permission requests for commercial use.
OpenMusic
OpenMusic is a repository providing an implementation of QA-MDT, a Quality-Aware Masked Diffusion Transformer for music generation. The code integrates state-of-the-art models and offers training strategies for music generation. The repository includes implementations of AudioLDM, PixArt-alpha, MDT, AudioMAE, and Open-Sora. Users can train or fine-tune the model using different strategies and datasets. The model is well-pretrained and can be used for music generation tasks. The repository also includes instructions for preparing datasets, training the model, and performing inference. Contact information is provided for any questions or suggestions regarding the project.
OpenAdapt
OpenAdapt is an open-source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web Graphical User Interfaces (GUIs). It aims to automate repetitive GUI workflows by leveraging the power of LMMs. OpenAdapt records user input and screenshots, converts them into tokenized format, and generates synthetic input via transformer model completions. It also analyzes recordings to generate task trees and replay synthetic input to complete tasks. OpenAdapt is model agnostic and generates prompts automatically by learning from human demonstration, ensuring that agents are grounded in existing processes and mitigating hallucinations. It works with all types of desktop GUIs, including virtualized and web, and is open source under the MIT license.
KaibanJS
KaibanJS is a JavaScript-native framework for building multi-agent AI systems. It enables users to create specialized AI agents with distinct roles and goals, manage tasks, and coordinate teams efficiently. The framework supports role-based agent design, tool integration, multiple LLMs support, robust state management, observability and monitoring features, and a real-time agentic Kanban board for visualizing AI workflows. KaibanJS aims to empower JavaScript developers with a user-friendly AI framework tailored for the JavaScript ecosystem, bridging the gap in the AI race for non-Python developers.
For similar tasks
embodied-agents
Embodied Agents is a toolkit for integrating large multi-modal models into existing robot stacks with just a few lines of code. It provides consistency, reliability, scalability, and is configurable to any observation and action space. The toolkit is designed to reduce complexities involved in setting up inference endpoints, converting between different model formats, and collecting/storing datasets. It aims to facilitate data collection and sharing among roboticists by providing Python-first abstractions that are modular, extensible, and applicable to a wide range of tasks. The toolkit supports asynchronous and remote thread-safe agent execution for maximal responsiveness and scalability, and is compatible with various APIs like HuggingFace Spaces, Datasets, Gymnasium Spaces, Ollama, and OpenAI. It also offers automatic dataset recording and optional uploads to the HuggingFace hub.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.