
blendsql
Query language for blending SQL and LLMs across structured + unstructured data, with type constraints.
Stars: 109

BlendSQL is a superset of SQLite designed for problem decomposition and hybrid question-answering with Large Language Models (LLMs). It allows users to blend operations over heterogeneous data sources like tables, text, and images, combining the structured and interpretable reasoning of SQL with the generalizable reasoning of LLMs. Users can oversee all calls (LLM + SQL) within a unified query language, enabling tasks such as building LLM chatbots for travel planning and answering complex questions by injecting 'ingredients' as callable functions.
README:

SQL 🤝 LLMs
Check out our online documentation for a more comprehensive overview.
Join our Discord server for more discussion!
pip install blendsql
import pandas as pd
from blendsql import BlendSQL
from blendsql.models import TransformersLLM, LiteLLM
USE_LOCAL_CONSTRAINED_MODEL = False
# Load model, either a local transformers model, or remote provider via LiteLLM
if USE_LOCAL_CONSTRAINED_MODEL:
    model = TransformersLLM(
        "meta-llama/Llama-3.2-3B-Instruct", config={"device_map": "auto"}
    )  # Local models enable BlendSQL's predicate-guided constrained decoding
else:
    model = LiteLLM("openai/gpt-4o-mini")
# Prepare our BlendSQL connection
bsql = BlendSQL(
    {
        "People": pd.DataFrame(
            {
                "Name": [
                    "George Washington",
                    "John Adams",
                    "Thomas Jefferson",
                    "James Madison",
                    "James Monroe",
                    "Alexander Hamilton",
                    "Sabrina Carpenter",
                    "Charli XCX",
                    "Elon Musk",
                    "Michelle Obama",
                    "Elvis Presley",
                ],
                "Known_For": [
                    "Established federal government, First U.S. President",
                    "XYZ Affair, Alien and Sedition Acts",
                    "Louisiana Purchase, Declaration of Independence",
                    "War of 1812, Constitution",
                    "Monroe Doctrine, Missouri Compromise",
                    "Created national bank, Federalist Papers",
                    "Nonsense, Emails I Cant Send, Mean Girls musical",
                    "Crash, How Im Feeling Now, Boom Clap",
                    "Tesla, SpaceX, Twitter/X acquisition",
                    "Lets Move campaign, Becoming memoir",
                    "14 Grammys, King of Rock n Roll",
                ],
            }
        ),
        "Eras": pd.DataFrame({"Years": ["1700-1800", "1800-1900", "1900-2000", "2000-Now"]}),
    },
    model=model,
    verbose=True,
)
smoothie = bsql.execute(
"""
SELECT * FROM People P
WHERE P.Name IN {{
LLMQA('First 3 presidents of the U.S?', quantifier='{3}')
}}
""",
infer_gen_constraints=True, # Is `True` by default
)
print(smoothie.df)
# ┌───────────────────┬───────────────────────────────────────────────────────┐
# │ Name │ Known_For │
# ├───────────────────┼───────────────────────────────────────────────────────┤
# │ George Washington │ Established federal government, First U.S. Preside... │
# │ John Adams │ XYZ Affair, Alien and Sedition Acts │
# │ Thomas Jefferson │ Louisiana Purchase, Declaration of Independence │
# └───────────────────┴───────────────────────────────────────────────────────┘
print(smoothie.summary())
# ┌────────────┬──────────────────────┬─────────────────┬─────────────────────┐
# │ Time (s) │ # Generation Calls │ Prompt Tokens │ Completion Tokens │
# ├────────────┼──────────────────────┼─────────────────┼─────────────────────┤
# │ 1.25158 │ 1 │ 296 │ 16 │
# └────────────┴──────────────────────┴─────────────────┴─────────────────────┘
smoothie = bsql.execute(
"""
SELECT GROUP_CONCAT(Name, ', ') AS 'Names',
{{
LLMMap(
'In which time period was this person born?',
p.Name,
options=Eras.Years
)
}} AS Born
FROM People p
GROUP BY Born
""",
)
print(smoothie.df)
# ┌───────────────────────────────────────────────────────┬───────────┐
# │ Names │ Born │
# ├───────────────────────────────────────────────────────┼───────────┤
# │ George Washington, John Adams, Thomas Jefferson, J... │ 1700-1800 │
# │ Sabrina Carpenter, Charli XCX, Elon Musk, Michelle... │ 2000-Now │
# │ Elvis Presley │ 1900-2000 │
# └───────────────────────────────────────────────────────┴───────────┘
print(smoothie.summary())
# ┌────────────┬──────────────────────┬─────────────────┬─────────────────────┐
# │ Time (s) │ # Generation Calls │ Prompt Tokens │ Completion Tokens │
# ├────────────┼──────────────────────┼─────────────────┼─────────────────────┤
# │ 1.03858 │ 2 │ 544 │ 75 │
# └────────────┴──────────────────────┴─────────────────┴─────────────────────┘
smoothie = bsql.execute("""
SELECT {{
LLMQA(
'Describe BlendSQL in 50 words.',
context=(
SELECT content[0:5000] AS "README"
FROM read_text('https://raw.githubusercontent.com/parkervg/blendsql/main/README.md')
)
)
}} AS answer
""")
print(smoothie.df)
# ┌─────────────────────────────────────────────────────┐
# │ answer │
# ├─────────────────────────────────────────────────────┤
# │ BlendSQL is a Python library that combines SQL a... │
# └─────────────────────────────────────────────────────┘
print(smoothie.summary())
# ┌────────────┬──────────────────────┬─────────────────┬─────────────────────┐
# │ Time (s) │ # Generation Calls │ Prompt Tokens │ Completion Tokens │
# ├────────────┼──────────────────────┼─────────────────┼─────────────────────┤
# │ 4.07617 │ 1 │ 1921 │ 50 │
# └────────────┴──────────────────────┴─────────────────┴─────────────────────┘
- (5/30/25) Created a Discord server
- (5/6/25): New blog post: Language Models, SQL, and Types, Oh My!
- (5/1/25) Single-page function documentation
- (3/16/25) Use BlendSQL with 100+ LLM APIs, using LiteLLM!
- (10/26/24) New tutorial! blendsql-by-example.ipynb
- (10/18/24) Concurrent async requests in 0.0.29! OpenAI and Anthropic LLMMap calls are speedy now.
- Customize max concurrent async calls via blendsql.config.set_async_limit(10)
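A cap on concurrent async calls, like the one configured above, is commonly built on a semaphore. The sketch below only illustrates that general pattern and is not BlendSQL's implementation; `fake_llm_call` and `run_with_limit` are hypothetical names, and the sleep stands in for network latency.

```python
import asyncio


async def fake_llm_call(value: str, semaphore: asyncio.Semaphore, state: dict) -> str:
    """Hypothetical stand-in for a remote LLM request."""
    async with semaphore:  # blocks once `limit` calls are already in flight
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # simulate network latency
        state["active"] -= 1
        return value.upper()


async def run_with_limit(values: list[str], limit: int) -> tuple[list[str], int]:
    """Run one call per value, with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_llm_call(v, semaphore, state) for v in values)
    )
    return list(results), state["peak"]


results, peak = asyncio.run(run_with_limit([f"row{i}" for i in range(25)], limit=10))
print(len(results), peak)
```

`asyncio.gather` returns results in input order, so the mapped values can be lined up with their source rows even though the calls finish out of order.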
BlendSQL is a superset of SQL for problem decomposition and hybrid question-answering with LLMs.
As a result, we can Blend together...
- 🥤 ...operations over heterogeneous data sources (e.g. tables, text, images)
- 🥤 ...the structured & interpretable reasoning of SQL with the generalizable reasoning of LLMs
This gives the user control to oversee all calls (LLM + SQL) within a unified query language.
- Supports many DBMS 💾
- SQLite, PostgreSQL, DuckDB, Pandas (aka duckdb in a trenchcoat)
- Supports local & remote models ✨
- Transformers, OpenAI, Anthropic, Ollama, and 100+ more!
- Easily extendable to multi-modal use cases 🖼
- Write your normal queries - smart parsing optimizes what is passed to external functions 🧠
- Traverses abstract syntax tree with sqlglot to minimize LLM function calls 🌳
- Constrained decoding with guidance 🚀
- When using local models, we only generate syntactically valid outputs according to query syntax + database contents
- LLM function caching, built on diskcache 🔑
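To make the caching idea concrete, here is a minimal sketch of caching a function call keyed on its name and arguments. It is an in-memory stand-in written for illustration, not BlendSQL's diskcache-backed implementation; `LLMCallCache` and `fake_llm` are hypothetical names.

```python
import hashlib
import json


class LLMCallCache:
    """Hypothetical in-memory cache keyed on (function name, keyword arguments)."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def _key(self, name: str, kwargs: dict) -> str:
        # Serialize deterministically so identical calls hash to the same key
        payload = json.dumps({"name": name, "kwargs": kwargs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, name: str, fn, **kwargs):
        key = self._key(name, kwargs)
        if key not in self._store:
            self.misses += 1
            self._store[key] = fn(**kwargs)
        return self._store[key]


def fake_llm(question: str, value: str) -> str:
    """Stand-in for an expensive model call."""
    return f"{question} -> {value}"


cache = LLMCallCache()
first = cache.get_or_call("LLMMap", fake_llm, question="How many states?", value="Alaska")
second = cache.get_or_call("LLMMap", fake_llm, question="How many states?", value="Alaska")
print(first == second, cache.misses)
```

Only the first call reaches `fake_llm`; the second is served from the cache. A disk-backed store would additionally let results survive across processes.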
For example, imagine we have the following table titled parks, containing info on national parks in the United States.
We can use BlendSQL to build a travel planning LLM chatbot to help us navigate the options below.
BlendSQL allows us to ask the following questions by injecting "ingredients", which are callable functions denoted by double curly brackets ({{, }}).
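To show what the double-curly-bracket syntax delimits, the sketch below pulls ingredient calls out of a query string with a regular expression. This is only an illustration: `find_ingredients` is a hypothetical helper, and a regex like this cannot handle nested braces or a `}}` inside a string literal, which is one reason a real parser is needed.

```python
import re

# Non-greedy match of everything between `{{` and `}}`, across newlines
INGREDIENT_PATTERN = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def find_ingredients(query: str) -> list[str]:
    """Return the text between each `{{` and `}}`, with whitespace collapsed."""
    return [" ".join(match.split()) for match in INGREDIENT_PATTERN.findall(query)]


query = """
SELECT COUNT(*) FROM parks
WHERE {{LLMMap('How many states?', Location)}} > 1
"""
print(find_ingredients(query))
# ["LLMMap('How many states?', Location)"]
```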
Which parks don't have park facilities?
SELECT "Name", "Description" FROM parks p
WHERE {{
LLMMap(
'Does this location have park facilities?',
p.Description
)
}} = FALSE
Name | Description |
---|---|
Everglades | The country's northernmost park protects an expanse of pure wilderness in Alaska's Brooks Range and has no park facilities. |
What does the largest park in Alaska look like?
SELECT "Name",
{{ImageCaption(Image)}} as "Image Description",
{{
LLMMap(
question='Size in km2?',
values=Area
)
}} as "Size in km" FROM parks p
WHERE "Location" = 'Alaska'
ORDER BY "Size in km" DESC LIMIT 1
Name | Image Description | Size in km |
---|---|---|
Everglades | A forest of tall trees with a sunset in the background. | 30448.1 |
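In a query like the one above, the plain SQL predicate can be evaluated before any model is called, so the ingredient only sees the rows that survive the filter. The sketch below illustrates that ordering with `sqlite3` and a stub function. It is not BlendSQL's planner; `fake_size_in_km2` is a hypothetical stand-in for the LLM call, and the three rows are placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (Name TEXT, Location TEXT, Area TEXT)")
conn.executemany(
    "INSERT INTO parks VALUES (?, ?, ?)",
    [
        # Placeholder rows, not the real parks table
        ("Park A", "Alaska", "100 km2"),
        ("Park B", "Florida", "200 km2"),
        ("Park C", "Alaska", "300 km2"),
    ],
)

calls = []


def fake_size_in_km2(area: str) -> float:
    """Hypothetical stand-in for an LLMMap call; records each invocation."""
    calls.append(area)
    return float(area.split()[0])


# Step 1: let the database apply the cheap SQL predicate
candidates = conn.execute(
    "SELECT Name, Area FROM parks WHERE Location = 'Alaska'"
).fetchall()

# Step 2: run the expensive function only on the surviving rows
sized = [(name, fake_size_in_km2(area)) for name, area in candidates]
largest = max(sized, key=lambda pair: pair[1])
print(largest, len(calls))
# ('Park C', 300.0) 2
```

The stub runs twice instead of three times; on a large table with a selective filter, the saving in model calls grows accordingly.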
Which state is the park in that protects an ash flow?
SELECT Location, Name AS "Park Protecting Ash Flow" FROM parks
WHERE Name = {{
LLMQA(
'Which park protects an ash flow?',
context=(SELECT Name, Description FROM parks),
options=Name
)
}}
Location | Park Protecting Ash Flow |
---|---|
Alaska | Katmai |
How many parks are located in more than 1 state?
SELECT COUNT(*) FROM parks
WHERE {{LLMMap('How many states?', Location)}} > 1
Count |
---|
1 |
Give me some info about the park in the state that Sarah Palin was governor of.
SELECT "Name", "Location", "Description" FROM parks
/* Will gather relevant context via WebSearch, and automatically constrain output
to one of the values in 'parks.Location'*/
WHERE Location = {{WebSearchQA('Which state was Sarah Palin governor of?')}}
Name | Location | Description |
---|---|---|
Everglades | Alaska | The country's northernmost park protects an expanse of pure wilderness in Alaska's Brooks Range and has no park facilities. |
Katmai | Alaska | This park on the Alaska Peninsula protects the Valley of Ten Thousand Smokes, an ash flow formed by the 1912 eruption of Novarupta. |
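The comment in the query above mentions constraining the output to values found in a column. As a rough illustration of that idea, the sketch below builds a regular expression from a list of allowed values and checks whether a candidate answer matches one of them exactly. It only validates after the fact, whereas constrained decoding restricts generation itself; `options_pattern` and `is_valid_answer` are hypothetical helpers, and the `locations` list is placeholder data.

```python
import re


def options_pattern(options: list[str]) -> re.Pattern:
    """Build a regex that matches exactly one of the given options."""
    unique = sorted(set(options), key=len, reverse=True)
    # re.escape keeps characters like "," or "." literal
    return re.compile("|".join(re.escape(option) for option in unique))


def is_valid_answer(answer: str, options: list[str]) -> bool:
    """True only if the whole answer is one of the allowed options."""
    return options_pattern(options).fullmatch(answer) is not None


# Placeholder values standing in for the distinct entries of parks.Location
locations = ["Alaska", "Florida", "California, Nevada", "West Virginia"]
print(is_valid_answer("Alaska", locations))               # True
print(is_valid_answer("The state of Alaska", locations))  # False
```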
What's the difference in visitors for those parks with a superlative in their description vs. those without?
SELECT SUM(CAST(REPLACE("Recreation Visitors (2022)", ',', '') AS integer)) AS "Total Visitors",
{{LLMMap('Contains a superlative?', Description, options=('t', 'f'))}} AS "Description Contains Superlative",
GROUP_CONCAT(Name, ', ') AS "Park Names"
FROM parks
GROUP BY "Description Contains Superlative"
Total Visitors | Description Contains Superlative | Park Names |
---|---|---|
43365 | 0 | Everglades, Katmai |
2722385 | 1 | Death Valley, New River Gorge |
Now, we have an intermediate representation for our LLM to use that is explainable, debuggable, and very effective at hybrid question-answering tasks.
For in-depth descriptions of the above queries, check out our documentation.
@article{glenn2024blendsql,
title={BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra},
author={Parker Glenn and Parag Pravin Dakle and Liang Wang and Preethi Raghavan},
year={2024},
eprint={2402.17882},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
For the LLM-based ingredients in BlendSQL, few-shot prompting can be vital. In LLMMap, LLMQA and LLMJoin, we provide an interface to pass custom few-shot examples and dynamically retrieve the top-k most relevant examples at runtime, given the current inference example.
from blendsql import BlendSQL
from blendsql.ingredients.builtin import LLMMap, DEFAULT_MAP_FEW_SHOT
ingredients = {
    LLMMap.from_args(
        few_shot_examples=[
            *DEFAULT_MAP_FEW_SHOT,
            {
                "question": "Is this a sport?",
                "mapping": {
                    "Soccer": True,
                    "Chair": False,
                    "Banana": False,
                    "Golf": True
                },
                # Below are optional
                "column_name": "Items",
                "table_name": "Table",
                "return_type": "boolean"
            }
        ],
        num_few_shot_examples=2,
        # How many inference values to pass to model at once
        batch_size=5,
    )
}
bsql = BlendSQL(db, ingredients=ingredients)
from blendsql import BlendSQL
from blendsql.ingredients.builtin import LLMQA, DEFAULT_QA_FEW_SHOT
ingredients = {
    LLMQA.from_args(
        few_shot_examples=[
            *DEFAULT_QA_FEW_SHOT,
            {
                "question": "Which weighs the most?",
                # Table context, given as a column -> values mapping
                "context": {
                    "Animal": ["Dog", "Gorilla", "Hamster"],
                    "Weight": ["20 pounds", "350 lbs", "100 grams"]
                },
                "answer": "Gorilla",
                # Below are optional
                "options": ["Dog", "Gorilla", "Hamster"]
            }
        ],
        # Will fetch `k` most relevant few-shot examples using embedding-based retriever
        num_few_shot_examples=2,
        # Lambda to turn the pd.DataFrame to a serialized string
        context_formatter=lambda df: df.to_markdown(
            index=False
        )
    )
}
bsql = BlendSQL(db, ingredients=ingredients)
from blendsql import BlendSQL
from blendsql.ingredients.builtin import LLMJoin, DEFAULT_JOIN_FEW_SHOT
ingredients = {
    LLMJoin.from_args(
        few_shot_examples=[
            *DEFAULT_JOIN_FEW_SHOT,
            {
                "join_criteria": "Join the state to its capital.",
                "left_values": ["California", "Massachusetts", "North Carolina"],
                "right_values": ["Sacramento", "Boston", "Chicago"],
                "mapping": {
                    "California": "Sacramento",
                    "Massachusetts": "Boston",
                    "North Carolina": "-"
                }
            }
        ],
        num_few_shot_examples=2
    )
}
bsql = BlendSQL(db, ingredients=ingredients)
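To illustrate what retrieving the top-k most relevant few-shot examples means, the sketch below ranks stored examples by similarity to the current question and keeps the best `k`. It swaps in a simple bag-of-words cosine similarity in place of an embedding-based retriever, so it is an approximation of the idea rather than BlendSQL's retriever; `bag_of_words`, `cosine`, and `top_k_examples` are hypothetical helpers.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts, with question marks stripped."""
    return Counter(text.lower().replace("?", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_examples(question: str, examples: list[dict], k: int) -> list[dict]:
    """Return the k examples whose questions are most similar to `question`."""
    query_vec = bag_of_words(question)
    ranked = sorted(
        examples,
        key=lambda ex: cosine(query_vec, bag_of_words(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


examples = [
    {"question": "Is this a sport?"},
    {"question": "Which weighs the most?"},
    {"question": "Is this a fruit?"},
]
picked = top_k_examples("Is this a team sport?", examples, k=2)
print([ex["question"] for ex in picked])
# ['Is this a sport?', 'Is this a fruit?']
```

Here `k` plays the role of `num_few_shot_examples` in the snippets above: only the closest examples are placed in the prompt, which keeps it short while staying relevant to the current input.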
Special thanks to those below for inspiring this project. Definitely recommend checking out the linked work below, and citing when applicable!
- The authors of Binding Language Models in Symbolic Languages
- This paper was the primary inspiration for BlendSQL.
- The authors of EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
- As far as I can tell, the first publication to propose unifying model calls within SQL
- Served as the inspiration for the vqa-ingredient.ipynb example
- The authors of Grammar Prompting for Domain-Specific Language Generation with Large Language Models
- The maintainers of the Guidance library for powering the constrained decoding capabilities of BlendSQL