syncode
Efficient and general syntactical decoding for Large Language Models
Stars: 213
SynCode is a novel framework for the grammar-guided generation of Large Language Models (LLMs) that ensures syntactically valid output with respect to defined Context-Free Grammar (CFG) rules. It supports general-purpose programming languages like Python, Go, SQL, JSON, and more, allowing users to define custom grammars using EBNF syntax. The tool compares favorably to other constrained decoders and offers features like fast grammar-guided generation, compatibility with HuggingFace Language Models, and the ability to work with various decoding strategies.
README:
About | Features | More About SynCode | Quick Start | Example Usage | FAQs
- SynCode is a novel framework for the grammar-guided generation of Large Language Models (LLMs) that is scalable to general-purpose programming languages and has soundness and completeness guarantees.
- With SynCode, you can ensure that your Language Model is generating output that is syntactically valid with respect to the rules defined by a Context-Free Grammar (CFG).
- For example, SynCode achieves 99% accuracy on JSON generation with Gemma-2b and is 10-20% faster than standard unconstrained generation.
Check the grammars directory for supported grammars.
Define your own grammar using simple EBNF syntax; check out our notebooks directory for examples.
- Fast grammar-guided generation (as little as 10% generation overhead with Python and Go!)
- Seamlessly work with any HuggingFace Language Model, including Code, Chat, and Instruct models
- Pass in any CFG in the EBNF format (even large grammars for programming languages like Python and Go!)
- Built-in CFGs for Python, Go, SQL, Math, JSON, and more!
- Sample with any existing decoding strategy (e.g., greedy, beam search, nucleus sampling)
In the SynCode workflow, the LLM takes partial code Ck and generates a distribution for the next token tk+1. The incremental parser processes Ck to compute the accept sequences A, the sequences of terminals that can syntactically follow the partial code. Simultaneously, the incremental parser computes a remainder r from the partial code, representing the suffix that may change its terminal type in subsequent generations. The backbone of SynCode is the offline construction of a DFA mask store, a lookup table derived from the regular expressions representing the terminals of the language grammar. The DFA mask store facilitates efficient traversal of DFA states, enabling the retrieval of masks mapped to each state and accept sequence. SynCode walks over the DFA using the remainder and uses the mask store to compute the mask specific to each accept sequence. By unifying the masks for all accept sequences, SynCode obtains the set of syntactically valid tokens. The LLM iteratively generates a token tk+1 using the distribution and the mask, appending it to Ck to create the updated code Ck+1. The process continues until the LLM returns the final code Cn based on the defined stop condition.
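To make the masking step concrete, here is a minimal, self-contained toy sketch in Python. The tiny vocabulary, the hard-coded accept sequences, and the dictionary standing in for the DFA mask store are all illustrative assumptions, not part of the SynCode API; the real parser and mask store are far more involved.
import numpy as np
# Toy vocabulary and a fake "mask store"; in SynCode the mask store is built
# offline from the DFAs of the grammar's terminal regexes.
vocab = ["{", "}", '"name"', ":", "def", "import", "<eos>"]
mask_store = {
    ("LBRACE",): np.array([True, False, False, False, False, False, False]),
    ("STRING", "COLON"): np.array([False, False, True, True, False, False, False]),
}
def valid_token_mask(accept_sequences):
    # Union of the per-accept-sequence masks, mirroring how SynCode unifies masks.
    mask = np.zeros(len(vocab), dtype=bool)
    for seq in accept_sequences:
        mask |= mask_store[seq]
    return mask
# Pretend the LLM produced these next-token logits and the incremental parser
# reported these accept sequences for the current partial code.
logits = np.array([1.0, 0.2, 2.5, 0.1, 3.0, 2.8, 0.0])
accept_sequences = [("LBRACE",), ("STRING", "COLON")]
masked_logits = np.where(valid_token_mask(accept_sequences), logits, -np.inf)
print(vocab[int(np.argmax(masked_logits))])  # a grammar-valid token, here '"name"'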
Install SynCode directly from the GitHub repository using pip:
pip install git+https://github.com/uiuc-focal-lab/syncode.git
Note: SynCode depends on HuggingFace transformers:
SynCode version | Recommended transformers version |
---|---|
v0.1.4 (latest) | v4.44.0 |
v0.1.2 | v4.42.0 |
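If you run into version incompatibilities, you can pin the recommended transformers version from the table above when installing, for example:
pip install transformers==4.44.0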
SynCode can be used as a simple logits processor with the HuggingFace transformers library interface; check this notebook for an example.
Import SyncodeLogitsProcessor and initialize it with the appropriate grammar:
from syncode import SyncodeLogitsProcessor
The initialized processor can then be passed to the generate function via the logits_processor argument. For example,
output = model.generate(
inputs,
max_length=100,
pad_token_id=tokenizer.eos_token_id,
logits_processor=[syncode_logits_processor]
)
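Putting the pieces together, a minimal end-to-end sketch might look like the following. The constructor arguments shown for SyncodeLogitsProcessor (grammar, tokenizer, parse_output_only) are assumptions based on the notebook examples and may differ across versions; check the notebook linked above for the exact interface.
from transformers import AutoModelForCausalLM, AutoTokenizer
from syncode import SyncodeLogitsProcessor
model_name = "microsoft/phi-2"  # any HuggingFace causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Assumed constructor arguments; see the SynCode notebooks for the exact interface.
syncode_logits_processor = SyncodeLogitsProcessor(
    grammar='json', tokenizer=tokenizer, parse_output_only=True
)
prompt = "Return a JSON object describing the country India."
inputs = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(
    inputs,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,
    logits_processor=[syncode_logits_processor],
)
print(tokenizer.decode(output[0], skip_special_tokens=True))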
The other option is to use the Syncode object for inference (this comes with additional optimizations):
from syncode import Syncode
Refer to SynCode Arguments for the full list of arguments used to initialize the SynCode class. In Python, inference is performed using the infer() method of the SynCode class. infer() has the following arguments:
- prompt (str, optional): Prompt to the Language Model. Defaults to None.
- task_id (int, optional): Problem task id for selecting a problem from a Dataset. Defaults to None.
If both prompt and task_id are not specified, infer() reads user input via stdin.
The following example shows the benefit of SynCode:
In the example below, the unconstrained original Phi-2 model fails to generate a valid JSON object and instead generates Python code.
from syncode import Syncode
# Load the unconstrained original model
llm = Syncode(model="microsoft/phi-2", mode='original', max_new_tokens=50)
prompt = "Please return a JSON object to represent the country India with name, capital, and population?"
output = llm.infer(prompt)[0]
print(f"LLM output:\n{output}\n")
# LLM output:
#
# A:
#
# You can use the following code:
# import json
#
# def get_country_info(country_name):
# country_info = {
# 'name': country_name,
# 'capital':
When guided by the JSON grammar with SynCode, the model generates a syntactically valid JSON object.
from syncode import Syncode
# Load the Syncode augmented model
syn_llm = Syncode(model = "microsoft/phi-2", grammar='json', parse_output_only=True, max_new_tokens=50)
prompt = "Please return a JSON object to represent the country India with name, capital, and population?"
output = syn_llm.infer(prompt)[0]
print(f"SynCode output:\n{output}")
# SynCode output:
# {
# "name": "India",
# "capital": "New Delhi",
# "population": "1,366,417,754"
# }
Check out more examples of using Python, Go, and other grammars in the notebooks directory.
SynCode can also be used with instruct-tuned models. For example, the following code snippet shows how to pass a list of chat messages to infer() with an instruct-tuned LLM.
messages = [
{"role": "system", "content": "You are a chatbot who always returns a JSON object."},
{"role": "user", "content": "can you give me a JSON object describing University of Illinois at Urbana-Champaign?"},
]
out = syn_llm.infer(messages)
See the notebook for an example.
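For a self-contained sketch, the full call might look like the following. The particular instruct model name is an illustrative assumption, not one prescribed by SynCode.
from syncode import Syncode
# Any HuggingFace instruct/chat model works; this model name is only an illustrative choice.
syn_llm = Syncode(model="microsoft/Phi-3-mini-4k-instruct", grammar='json',
                  parse_output_only=True, max_new_tokens=100)
messages = [
    {"role": "system", "content": "You are a chatbot who always returns a JSON object."},
    {"role": "user", "content": "can you give me a JSON object describing University of Illinois at Urbana-Champaign?"},
]
out = syn_llm.infer(messages)
print(out[0])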
Optionally, you can set the directories for cache by exporting the following environment variables. Add the following lines to your .bashrc or .zshrc file:
export HF_CACHE="path_to_hf_cache"
export SYNCODE_CACHE="path_to_syncode_cache"
If these environment variables are not set, the tool will use the default cache directories.
To use gated models on HuggingFace, such as the Llama models, set the HF_ACCESS_TOKEN environment variable:
export HF_ACCESS_TOKEN="your_huggingface_api_key"
Click to Expand on the List of Arguments for SynCode
- mode (str, optional): Mode for inference. grammar_mask and grammar_strict are the modes that enable our tool; original is the mode for the original LLM without any grammar constraints, and grammar_strict is a stricter mode for grammar-constrained generation. Defaults to "grammar_strict".
- model (str): Model ID for the Hugging Face model hub, or model name if stored locally.
- quantize (bool, optional): Quantize the model to bfloat16. Defaults to True.
- device (str, optional): The device on which the model is run. Defaults to cuda.
- grammar (str, optional): Grammar in EBNF form (string or file path) or language for constrained generation. Defaults to None. You can use one of python, go, sql, json, java, calc, or pass in a custom grammar in EBNF format (check the notebooks for examples).
- num_samples (int, optional): Number of samples. Defaults to 1.
- dataset (str, optional): Dataset. Defaults to "input". "input" indicates that the user can provide input via the CLI or by passing in a prompt as a string.
- num_few_shot (int, optional): Number of examples for few-shot prompting. Defaults to 0.
- dev_mode (bool, optional): Development mode where we do not fail silently with parser errors. Defaults to False.
- log_level (int, optional): 0 for no logs, 1 for minimal logs, 2 for all logs including time. Defaults to 2.
- new_mask_store (bool, optional): Forces the creation of a new mask store; otherwise a cached mask store is used if available. Defaults to False.
- parser (str, optional): Choose between LR(1) and LALR(1) parsing. Defaults to 'lalr'.
- task_id (int, optional): Problem task id for selecting a problem from a Dataset.
- kwargs (void, optional): Currently supported kwargs are max_length, max_new_tokens, min_length, min_new_tokens, early_stopping, do_sample, num_beams, use_cache, temperature, top_k, top_p, num_return_sequences, pad_token_id, and eos_token_id. Refer to the HuggingFace Text Generation Documentation for more information.
Running SynCode via CLI
Clone this repository:
git clone https://github.com/uiuc-focal-lab/syncode.git
To run the tool with CLI, use the following command:
python3 syncode/infer.py
--mode [original, grammar_mask, grammar_strict]
--model [model_name]
--quantize [True, False]
--device ["cpu", "cuda", "cuda:1" etc.]
--num_samples [num_samples]
--dataset [mbxp, humaneval, mathqa-x, input]
--few_shot [True, False]
--num_fs_examples [num_fs_examples]
--dev_mode [True, False]
--log_level [0, 1, 2]
--new_mask_store [True, False]
--parser ["lr", "lalr"]
--task_id [task_id]
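For example, a minimal hypothetical invocation using only the flags listed above (remaining options left at their defaults; depending on your setup you may also need to specify a grammar) might look like:
python3 syncode/infer.py --mode grammar_strict --model microsoft/phi-2 --dataset input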
Check out our notebooks directory which contains various interactive examples that showcase different use cases of SynCode! The grammars for some common programming languages are defined in the grammars directory. We also allow users to define a grammar using a simple EBNF syntax adapted from Lark. Users can pass in a string of rules or a path to a .lark file.
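For instance, since the grammar argument accepts either a rule string or a file path, you can point it at your own grammar file. The path below is hypothetical; any .lark file with valid rules works.
from syncode import Syncode
# 'grammars/my_grammar.lark' is a hypothetical path to your own EBNF/Lark grammar file.
syn_llm = Syncode(model="microsoft/phi-2", grammar='grammars/my_grammar.lark', parse_output_only=True)
output = syn_llm.infer("Your prompt here")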
Large Language Models tend to struggle with generating Python code with correct indentation. Consider the example below. The unconstrained original
WizardCoder model fails to generate a code completion with the correct number of spaces. When executing this code, we get an Indentation Error.
from syncode import Syncode
model_name = "WizardLM/WizardCoder-1B-V1.0"
# Load the unconstrained original model
llm = Syncode(model = model_name, mode='original', max_new_tokens=200)
partial_code = "def is_prime(n):\n '''Return if prime'''\n "
#generate a completion to the input partial code
unconstrained_output = partial_code+llm.infer(partial_code)[0]
print(unconstrained_output)
# def is_prime(n):
# '''Return if prime'''
# if n < 2:
# return False
# for i in range(2, int(n**0.5)+1):
# if n % i == 0:
# return False
# return True
exec(unconstrained_output)
# IndentationError: unindent does not match any outer indentation level
SynCode can fix this problem! We simply switch the mode to grammar_mask/grammar_strict to load the SynCode augmented model. With the constrained decoding of SynCode, the LLM is able to generate a correct Python program.
from syncode import Syncode
model_name = "WizardLM/WizardCoder-1B-V1.0"
# Load the Syncode augmented model
syn_llm = Syncode(model=model_name, mode='grammar_strict', grammar='python', parse_output_only=False)
partial_code = "def is_prime(n):\n '''Return if prime'''\n "
#generate a completion to the input partial code
constrained_output = partial_code+ syn_llm.infer(partial_code)[0]
print(constrained_output)
# def is_prime(n):
# '''Return if prime'''
# if n < 2:
# return False
# for i in range(2, int(n**0.5) + 1):
# if n % i == 0:
# return False
# return True
exec(constrained_output)
# Correct Code :)
In the example below, the unconstrained original Phi-2 model fails to generate a valid JSON object and instead generates Python code.
from syncode import Syncode
# Load the unconstrained original model
llm = Syncode(model = "microsoft/phi-2", mode='original', max_new_tokens=50)
prompt = "Please return a json object to represent country India with name, capital and population?"
output = llm.infer(prompt)[0]
print(f"LLM output:\n{output}\n")
# LLM output:
#
# A:
#
# You can use the following code:
# import json
#
# def get_country_info(country_name):
# country_info = {
# 'name': country_name,
# 'capital':
When guided by the JSON grammar with SynCode, the model is able to generate a syntactically valid JSON object.
from syncode import Syncode
# Load the Syncode augmented model
syn_llm = Syncode(model="microsoft/phi-2", grammar='json', parse_output_only=True, max_new_tokens=50)
prompt = "Please return a json object to represent country India with name, capital and population?"
output = syn_llm.infer(prompt)[0]
print(f"SynCode output:\n{output}")
# SynCode output:
# {
# "name": "India",
# "capital": "New Delhi",
# "population": "1,366,417,754"
# }
Syncode allows users to define a grammar using a simple EBNF syntax adapted from Lark. One can also simply feed the grammar rules directly as a string, as shown below. Please refer to the notebooks directory for examples using custom grammars and for instructions on defining your own custom grammar.
In our example, we want our model to respond only in the format month day. Without constrained decoding, the Language Model may not generate output that follows this syntax. Consider the code snippet below.
from syncode import Syncode
model_name = "microsoft/phi-2"
# Load the unconstrained original model
llm = Syncode(model=model_name, mode='original', max_new_tokens=20)
inp = "When is the christmas day?"
output = llm.infer(inp)
print(f"LLM output:\n{repr(output)}\n")
# LLM output:
# 'Christmas Day is on December 25th.\n<|im_end|>\n<|im'
As shown above, the LLM generates a correct response but not in the format we want. We can pass in a grammar and leverage the ability of SynCode to guide the LLM generation with this grammar. As shown in the snippet below, the SynCode augmented LLM generates output in the correct month day format.
from syncode import Syncode
# Pass in a grammar as a string of rules in EBNF format
grammar = """ start: month day
day: /[1-9]/ | /[1-2][0-9]/ | /3[0-1]/
month: "January" | "February" | "March" | "April" | "May" | "June" | "July" | "August" | "September" | "October" | "November" | "December"
"""
model_name = "microsoft/phi-2"
# Load the Syncode augmented model
syn_llm = Syncode(model=model_name, grammar=grammar, parse_output_only=True)
inp = "When is the christmas day?"
output = syn_llm.infer(inp)
print(f"Syncode augmented LLM output:\n{output}")
# Syncode augmented LLM output:
# December 25
Comparison with other constrained decoding tools: SynCode is compared against LMQL, GUIDANCE, OUTLINES, PICARD, SYNCHROMESH, LLAMA.CPP, and GCD along four dimensions: Regex, CFG*, Pre-Computed*, and GPL*. SynCode supports all four.
CFG*: Guide generation with a Context Free Grammar (CFG)
Pre-Computed*: Precompute masks over the vocabulary to significantly improve generation speed
GPL*: Support general-purpose programming languages, which involve non-context-free fragments, such as indentation in Python and end-of-scope markers in Golang.
@misc{ugare2024syncode,
title={SynCode: LLM Generation with Grammar Augmentation},
author={Shubham Ugare and Tarun Suresh and Hangoo Kang and Sasa Misailovic and Gagandeep Singh},
year={2024},
eprint={2403.01632},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
For questions, please contact Shubham Ugare.