llm-structured-output
This repository contains a library for constraining LLM generation to structured output, enforcing a JSON schema for precise data types and property names. It includes an acceptor/state machine framework, JSON acceptor, and JSON schema acceptor for guiding decoding in LLMs. The library provides reference implementations using Apple's MLX library and examples for function calling tasks. The tool aims to improve LLM output quality by ensuring adherence to a schema, reducing unnecessary output, and enhancing performance through pre-emptive decoding. Evaluations show performance benchmarks and comparisons with and without schema constraints.
README:
This repository contains a library to constrain LLM generation to structured output, such as function calling a.k.a. tool use.
We include examples of application implementations using the MLX library.
Differences with other approaches:
- "JSON mode": this library constrains output to be valid JSON, but goes beyond JSON mode by also enforcing a JSON schema. This enables much tighter steering: specifying data types, property names, etc.
- GBNF translation: rather than converting the JSON schema to a formal grammar, we steer the output directly using the schema, which enables more flexible and deeper control with lower overhead. For example, expressing minimum and maximum array or string lengths in GBNF can lead to a very large set of production rules, and certain JSON schema features are simply not possible to express.
- Fine-tuning: our approach is complementary to fine-tuning an LLM to produce structured output. While fine-tuning can currently enhance but not guarantee adherence to a schema, our system introduces strong guarantees on the output.
Without a schema, Mistral 7B Instruct 0.2 solves the data extraction task but, despite our instructions to the contrary, it adds a lot of unnecessary output that is hard to parse and wastes time.
With the schema, the generation is precisely the output we require.
You'll find:
- A framework and set of acceptors for constraining LLM output, which are application-independent.
- Reference implementations and examples using Apple's MLX library.
- An acceptor/state machine framework which progresses all valid states of a given graph simultaneously. This minimizes the need for backtracking, which is expensive for LLMs as it would require re-computing past tokens. In this sense, the concept is similar to a chart parser or Earley-style recognizer and shares a similar motivation. In practice, it's quite different because we're dealing with token-level input. We implemented several optimizations to minimize combinatorial explosion: we use a trie to traverse the token vocabulary in logarithmic time, and collapse the trie branches when multiple options are equivalent. We also prune the chart by removing equivalent states arrived at by different paths. See acceptor.py. (An illustrative sketch of the parallel-state idea follows this list.)
- A JSON acceptor based on the framework above that accepts valid JSON. See json_acceptor.py.
- A JSON schema acceptor based on both items above that accepts valid JSON conforming to a JSON schema. See json_schema_acceptor.py. Please note that most, but not all, JSON schema directives are implemented. Please open an issue if one that you need is missing.
- An example of using the acceptors above to guide decoding in an LLM using Apple's MLX framework. See llm_schema.py. This example includes several decoding techniques, including pre-emptive evaluation, which uses the acceptor to anticipate the tokens that can be generated according to the schema and evaluates two tokens at a time instead of one, sometimes leading to noticeable performance improvements.
- A server example that implements an OpenAI-compatible API including tools / function calling. Unlike OpenAI's, this implementation always generates valid JSON, and does not return hallucinated parameters not defined in your function schema (but it may still hallucinate their values). See server.py.
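The following is a minimal, illustrative sketch of the parallel-state idea only. It is not the library's API or data structures: the real acceptors work on token-level input and traverse the vocabulary as a trie (see acceptor.py).

```python
# Illustrative sketch (not the library's implementation): advance every live
# state of a toy character-level grammar in parallel, so no backtracking is
# ever needed. States reached by different paths collapse in the set, which
# is the same pruning idea described above. All names here are hypothetical.
def advance_all(states, char, transitions):
    next_states = set()
    for state in states:
        # A state may fan out into several successors; all of them stay live.
        next_states |= transitions.get((state, char), set())
    return next_states

# Toy grammar that accepts the strings "ab" and "abc".
transitions = {
    ("start", "a"): {"a"},
    ("a", "b"): {"accept", "b"},  # after "ab" we may stop or still expect "c"
    ("b", "c"): {"accept"},
}

states = {"start"}
for ch in "abc":
    states = advance_all(states, ch, transitions)
print("accepted" if "accept" in states else "rejected")  # prints: accepted
```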
Clone this repo:
git clone https://github.com/otriscon/llm-structured-output.git
cd llm-structured-output
Optional, but recommended: create and activate a virtual environment with your tool of choice, e.g.
python -m venv .venv
source .venv/bin/activate
Move into the examples folder and install the requirements, then move back:
cd src/examples
pip install -r requirements.txt
cd ..
Run the llm_schema example:
MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_PROMPT='[INST] Parse the following address into a JSON object: "27 Barrow St, New York, NY 10014". Your answer should be only a JSON object according to this schema: {"type": "object", "properties": {"streetNumber": {"type": "number"}, "streetName": {"type": "string"}, "city": {"type": "string"}, "state": {"type": "string"}, "zipCode": {"type": "number"}}}. Do not explain the result, just output it. Do not add any additional information. [/INST]'
LLM_SCHEMA='{"type": "object", "properties": {"streetNumber": {"type": "number"}, "streetName": {"type": "string"}, "city": {"type": "string"}, "state": {"type": "string"}, "zipCode": {"type": "number"}}}'
python3 -m examples.llm_schema --model-path $MODEL --prompt "$LLM_PROMPT" --schema "$LLM_SCHEMA" --max-tokens 1000 --repeat-prompt
Run the server example:
MODEL_PATH=mistralai/Mistral-7B-Instruct-v0.2 uvicorn examples.server:app --port 8080 --reload
Try calling the server with this example adapted from the OpenAI documentation (click on the example request titled Functions):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ignored",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather like in Boston today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
Install in your project with pip install llm-structured-output and use a JsonSchemaAcceptorDriver within your normal generation loop:
import json
import mlx.core as mx
from mlx_lm.utils import load  # Requires: pip install mlx_lm
from llm_structured_output import JsonSchemaAcceptorDriver, HuggingfaceTokenizerHelper, bias_logits
MODEL_PATH = "mistralai/Mistral-7B-Instruct-v0.2"
SCHEMA = {
"type": "object",
"properties": {
"streetNumber": {"type": "number"},
"streetName": {"type": "string"},
"city": {"type": "string"},
"state": {"type": "string"},
"zipCode": {"type": "number"},
},
}
PROMPT = f'''
[INST] Parse the following address into a JSON object: "27 Barrow St, New York, NY 10014".
Your answer should be only a JSON object according to this schema: {json.dumps(SCHEMA)}
Do not explain the result, just output it. Do not add any additional information. [/INST]
'''
# Load the model as usual.
model, tokenizer = load(MODEL_PATH)
# Instantiate a token acceptor
tokenizer_helper = HuggingfaceTokenizerHelper(tokenizer)
vocabulary, eos_id = tokenizer_helper.extract_vocabulary()
token_acceptor_factory = JsonSchemaAcceptorDriver.driver_factory_for_model(vocabulary, eos_id)
token_acceptor = token_acceptor_factory(SCHEMA)
cache = None
tokens = tokenizer_helper.encode_prompt(PROMPT)
while tokens[-1] != eos_id:
    # Evaluate the model as usual.
    logits, cache = model(mx.array(tokens)[None], cache)
    # Set probability to -inf for invalid tokens.
    accepted_token_bitmap = token_acceptor.select_valid_tokens()
    logits = bias_logits(mx, logits[0, -1, :], accepted_token_bitmap)
    # Sample as usual, e.g.:
    token = mx.argmax(logits, axis=-1).item()
    if token == eos_id:
        break
    # Store or use the generated token.
    tokens = [token]
    text = tokenizer_helper.no_strip_decode(tokens)
    print(text, end="")
    # Advance the acceptor to the next state.
    token_acceptor.advance_token(token)
Constraining the output of an LLM to follow a schema doesn't magically make the LLM great at producing output that solves a particular task.
If an LLM is not prompted or fine-tuned correctly for the task, it will produce syntactically valid output, but the values inside won't necessarily constitute a good solution. As with any other technique, proper LLM prompting and/or n-shot examples are crucial to avoid getting nice-looking, well-formatted, schema-compliant nonsense.
In particular, it's crucial to instruct the LLM regarding the desired output format, including making the desired schema part of the prompt. Here's an example of a prompt that includes the schema:
Parse the following address into a JSON object: "27 Barrow St, New York, NY 10014".
Your answer should be only a JSON object according to this schema: {"type": "object", "properties": {"streetNumber": {"type": "number"}, "streetName": {"type": "string"}, "city": {"type": "string"}, "state": {"type": "string"}, "zipCode": {"type": "number"}}}.
Do not explain the result, just output it. Do not add any additional information.
In order to give the LLM a scratch-pad prior to JSON generation, e.g. for chain-of-thought reasoning, we have included an option for the acceptor to kick in only on output within a section delimited by the lines ```json and ```, with the prior output treated as free text. This is enabled with the is_encapsulated_json option of the JsonSchemaAcceptorDriver constructor. Here's an example of a prompt that produces encapsulated JSON:
Your mission is to parse the following address into a JSON object: "27 Barrow St, New York, NY 10014".
Your answer should be a JSON object according to this schema: {"type": "object", "properties": {"streetNumber": {"type": "number"}, "streetName": {"type": "string"}, "city": {"type": "string"}, "state": {"type": "string"}, "zipCode": {"type": "number"}}}.
First, think through the task step by step, and then output a JSON object wrapped between the lines ```json and ```.
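Continuing the generation-loop example above, enabling encapsulated-JSON mode is a one-line change. Whether the factory forwards the option as a keyword argument exactly as shown is an assumption here; check the JsonSchemaAcceptorDriver constructor in json_schema_acceptor.py for the exact signature.

```python
# Sketch, continuing the generation-loop example above: only the output inside
# the delimited JSON section is constrained by the schema; the preceding
# chain-of-thought text is treated as free text. Passing the option through
# the factory as a keyword argument is an assumption.
token_acceptor = token_acceptor_factory(SCHEMA, is_encapsulated_json=True)
```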
In our OpenAI-compatible server example, when the request specifies tool_calls or a legacy function_call, we automatically prepend a system message to the prompt with the schema and instructions for the LLM to use the tools provided. If your prompt already includes these instructions (because e.g. you want to customize them), this can be disabled with a non-standard option in the request payload: "tool_options": { "no_prompt_steering": true }
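For example, a request with prompt steering disabled could look like the following sketch, assuming the server example is running locally on port 8080 as shown above (requests is a third-party package: pip install requests).

```python
# Sketch: calling the example server with the non-standard "tool_options"
# field to disable automatic prompt steering. Assumes the server from
# server.py is running locally on port 8080; the tool definition mirrors the
# curl example above.
import requests

payload = {
    "model": "ignored",
    "messages": [
        {"role": "user", "content": "What's the weather like in Boston today?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "auto",
    # Non-standard option: skip the automatically prepended system message.
    "tool_options": {"no_prompt_steering": True},
}

response = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(response.json())
```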
The library has been tested with the following datasets:
- ALU.AI's table extraction evaluation dataset (not yet open-source)
We're starting to perform evaluations to understand how well different LLMs perform in function calling tasks. The tools and data can be found in the src/tests folder.
Environment:
- llm_structured_output v0.0.15
- mlx 0.14.1
- 2023 Mac Studio M2 Ultra 24 cores (16 performance and 8 efficiency) 192 GB RAM running macOS Sonoma 14.5
- LLM: mlx-community/Meta-Llama-3-8B-Instruct-4bit
- Benchmarking LLM: gpt-4o-2024-05-13
Results:
Since we need to select the acceptable tokens prior to sampling, constraining the output according to a schema introduces a delay for every token, which depends on the complexity of the schema. On the other hand, since the output is guaranteed to be valid JSON and to conform to the schema, it can reduce the number of tokens generated and reduce or eliminate the number of retries required to solve the task.
As an experiment to improve performance, we implement the option to use pre-emptive decoding: when the range of tokens that can be accepted after the current one is small, as often happens with structured output, we submit to the LLM a batch of two-token continuations, where the first token is the one that was going to be evaluated anyway, and the second token in each item of the batch is one of the possible continuations predicted according to the schema. We can then sample two tokens instead of one. We find that this approach can occasionally produce considerable increases in token-generation speed, but in general it can also considerably slow it down, depending on the model and quantization. We found that it works better with fp16 models (no quantization), but batching performance degrades vastly with quantized models, making pre-emptive decoding not worth it for those.
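As an illustration of the batching step only (not the library's implementation), the two-token continuations can be built as in the sketch below; the names are hypothetical and the batched model evaluation itself is omitted.

```python
# Illustrative sketch of the pre-emptive batching step only. `next_token` is
# the token we were going to evaluate anyway; `predicted_followups` are the
# tokens the schema acceptor says may legally come after it. Each row of the
# batch is a two-token continuation that the model can evaluate in one batched
# pass, so two tokens can be sampled instead of one. Names are hypothetical.
def build_preemptive_batch(next_token, predicted_followups, max_batch=5):
    followups = predicted_followups[:max_batch]
    return [[next_token, followup] for followup in followups]

# E.g. if the schema only allows '"' (34) or a space (32) after token 123:
print(build_preemptive_batch(123, [34, 32]))  # [[123, 34], [123, 32]]
```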
- The following tests were performed on an Apple Mac Studio with an M2 Ultra (24 cores) and 192 GB of RAM, using MLX version 0.9.0, with models converted to MLX format.
- The results are the average of 5 runs on a simple data extraction task with a 127-token prompt.
- Pre-emptive decoding was tested in two different forms: with a constant batch size, where we always sent same-size matrices for evaluation, and with variable-size batching, where we made the batch larger or smaller depending on the number of possible follow-up tokens.
| Mistral-7B-v0.2-Instruct (fp16) | Prompt tps | Generation tps | Generation tokens |
|---|---|---|---|
| No schema | 305.82 | 34.76 | 321 |
| Schema | 307.00 | 31.70 | 42 |
| Pre-emptive constant batch =5 | 211.72 | 33.16 | 42 |
| Pre-emptive variable batch <=5 | 321.85 | 36.53 | 42 |
Notes:
- Pre-emptive decoding accelerates generation even over schemaless generation.
| Mistral-7B-v0.2-Instruct (q4) | Prompt tps | Generation tps | Generation tokens |
|---|---|---|---|
| No schema | 487.19 | 86.36 | 137 |
| Schema | 487.83 | 67.60 | 42 |
| Pre-emptive constant batch =5 | 139.61 | 27.16 | 42 |
| Pre-emptive variable batch <=5 | 488.88 | 36.25 | 42 |
Notes:
- Pre-emptive decoding is vastly slower, with the only change being quantization.
| Mixtral-8x7B-Instruct-v0.1 (fp16) | Prompt tps | Generation tps | Generation tokens |
|---|---|---|---|
| No schema | 3.48 | 2.23 | 50 |
| Schema | 3.49 | 2.21 | 50 |
| Pre-emptive constant batch =5 | 2.36 | 1.16 | 50 |
| Pre-emptive variable batch <=5 | 3.18 | 1.68 | 50 |
Notes:
- This is the only tested model that outputs schema-conforming output without a schema.
- Pre-emptive decoding is again a lot slower.
| Mixtral-8x7B-Instruct-v0.1 (q4) | Prompt tps | Generation tps | Generation tokens |
|---|---|---|---|
| No schema | 15.02 | 32.21 | 165 |
| Schema | 14.94 | 23.75 | 50 |
| Pre-emptive constant batch =5 | 9.29 | 11.28 | 50 |
| Pre-emptive variable batch <=5 | 15.02 | 17.94 | 50 |
- Extend JSON schema support as needed (see TODOs in code). Please feel free to open an issue if you need a feature that is not supported at the moment. We are also open to implementing additional schemas such as YAML, and reference implementations for other LLMs.
- Add formal test cases.
- Reference implementation for the Transformers library.
- Port to C++ and reference implementation for llama.cpp.