gambit
Agent harness framework for building, running, and verifying LLM workflows
Stars: 196
Gambit is an open-source developer-first framework for building reliable LLM workflows. It helps compose small, typed 'decks' with clear inputs/outputs and guardrails. Users can run decks locally, stream traces, and debug with a built-in UI. The framework aims to improve orchestration by treating each step as a small deck, mixing LLM and compute tasks effortlessly, feeding models only necessary information, and providing built-in observability for debugging.
README:
Gambit is an open-source, developer-first framework that helps you build
reliable LLM workflows by composing small, typed “decks”
with clear inputs/outputs and guardrails. Run decks locally, stream traces, and
debug with a built-in UI.
Requirements: Node.js 18+ and OPENROUTER_API_KEY (set OPENROUTER_BASE_URL if
you proxy OpenRouter-style APIs).
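If you route requests through a proxy that speaks the OpenRouter API, set both variables (the base URL below is a placeholder):
export OPENROUTER_API_KEY=...
export OPENROUTER_BASE_URL=https://your-proxy.example.com/api/v1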
Run the CLI directly with npx (no install):
export OPENROUTER_API_KEY=...
npx @bolt-foundry/gambit demo
This downloads example files (hello decks plus the examples/ gallery) and sets up environment variables.
To start onboarding with the simulator, run:
npx @bolt-foundry/gambit serve gambit/hello.deck.md
open http://localhost:8000/debug
Use the Build tab to draft your own workspace decks and scenarios.
Run an example in the terminal (repl):
npx @bolt-foundry/gambit repl gambit/hello.deck.md
This example just says "hello" and repeats your message back to you.
Run an example in the browser (serve):
npx @bolt-foundry/gambit serve gambit/hello.deck.md
open http://localhost:8000/debug
- Most teams wire one long prompt to several tools and hope the model routes correctly.
- Context often arrives as a single giant fetch or RAG blob, so costs climb and hallucinations slip in.
- Inputs/outputs are rarely typed, which makes orchestration brittle and hard to test offline.
- Debugging leans on provider logs instead of local traces, so reproducing failures is slow.
- Treat each step as a small deck with explicit inputs/outputs and guardrails; model calls are just one kind of action.
- Mix LLM and compute tasks interchangeably inside the same deck tree.
- Feed models only what they need per step; inject references and cards instead of dumping every document.
- Keep orchestration logic local and testable; run decks offline with predictable traces.
- Ship with built-in observability (streaming, REPL, debug UI) so debugging feels like regular software, not guesswork.
Use the CLI to run decks locally, stream output, and capture traces/state.
Run with npx (no install):
npx @bolt-foundry/gambit <command>
Run a deck once:
npx @bolt-foundry/gambit run <deck> --context <json|string> --message <json|string>
--context replaces the old --init flag. The CLI still accepts --init as a deprecated alias for now, so existing scripts keep working.
Drop into a REPL (streams by default):
npx @bolt-foundry/gambit repl <deck>
Run a persona against a root deck (scenario):
npx @bolt-foundry/gambit scenario <root-deck> --test-deck <persona-deck>
Grade a saved session:
npx @bolt-foundry/gambit grade <grader-deck> --state <file>
Start the Debug UI server:
npx @bolt-foundry/gambit serve <deck> --port 8000
Tracing and state:
- --trace <file> for JSONL traces
- --verbose to print events
- --state <file> to persist a session
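For example, a single run that records a JSONL trace, persists the session, and prints events (combining the flags above; file names are arbitrary):
npx @bolt-foundry/gambit run <deck> --context '"hi"' --trace run.jsonl --state session.json --verbose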
- Deck-executing CLI surfaces default to worker sandbox execution.
- Use --no-worker-sandbox (or --legacy-exec) to force legacy in-process execution.
- --worker-sandbox explicitly forces worker execution on.
- --sandbox / --no-sandbox are deprecated aliases.
- gambit.toml equivalent:
[execution]
worker_sandbox = false # same as --no-worker-sandbox
# legacy_exec = true   # equivalent rollback toggle
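For example, to force the legacy in-process path for a single run (using the flag above):
npx @bolt-foundry/gambit run <deck> --no-worker-sandbox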
The npm launcher (npx @bolt-foundry/gambit ...) runs the Gambit CLI binary for
your platform, so these defaults and flags apply there as well.
The simulator is the local Debug UI that streams runs and renders traces.
Run with npx (no install):
npx @bolt-foundry/gambit <command>
Start it:
npx @bolt-foundry/gambit serve <deck> --port 8000
Then open:
http://localhost:8000/
It also serves:
http://localhost:8000/test
http://localhost:8000/grade
The Debug UI shows transcript lanes plus a trace/tools feed. If the deck has a contextSchema, the UI renders a schema-driven form with defaults and a raw JSON tab. Local-first state is stored under .gambit/ (sessions, traces, notes).
Use the library when you want TypeScript decks/cards or custom compute steps.
Import the helpers from JSR:
import { defineDeck, defineCard } from "jsr:@bolt-foundry/gambit";
Define contextSchema/responseSchema with Zod to validate IO, and implement
run/execute for compute decks. To call a child deck from code, use
ctx.spawnAndWait({ path, input }). Emit structured trace events with
ctx.log(...).
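A minimal sketch of how these pieces fit together; the ctx call shapes follow the description above, but the result handling and the ./child.deck.md path are assumptions:
// orchestrator.deck.ts (illustrative sketch, not the canonical API)
import { defineDeck } from "jsr:@bolt-foundry/gambit";
import { z } from "zod";
export default defineDeck({
  label: "orchestrator",
  contextSchema: z.object({ text: z.string() }),
  responseSchema: z.object({ result: z.string() }),
  async run(ctx) {
    // Emit a structured trace event before doing any work.
    ctx.log({ event: "start", chars: ctx.input.text.length });
    // Call a child deck and wait for its output (path is hypothetical,
    // and the result shape depends on the child's responseSchema).
    const child = await ctx.spawnAndWait({
      path: "./child.deck.md",
      input: { text: ctx.input.text },
    });
    return { result: JSON.stringify(child) };
  },
});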
runDeck from @bolt-foundry/gambit now uses CLI-equivalent provider/model
defaults (alias expansion, provider routing, fallback behavior).
Before (direct-provider setup in each caller):
import { createOpenRouterProvider, runDeck } from "jsr:@bolt-foundry/gambit";
const provider = createOpenRouterProvider({
apiKey: Deno.env.get("OPENROUTER_API_KEY")!,
});
await runDeck({
path: "./root.deck.md",
input: { message: "hi" },
modelProvider: provider,
});
After (defaulted wrapper):
import { runDeck } from "jsr:@bolt-foundry/gambit";
await runDeck({
path: "./root.deck.md",
input: { message: "hi" },
});
Per-runtime override (shared runtime object):
import { createDefaultedRuntime, runDeck } from "jsr:@bolt-foundry/gambit";
const runtime = await createDefaultedRuntime({
fallbackProvider: "codex-cli",
});
await runDeck({
runtime,
path: "./root.deck.md",
input: { message: "hi" },
});
Replacement mapping:
- Legacy direct core passthrough export: runDeck -> runDeckCore
- Defaulted wrapper export: runDeck
- Runtime builder: createDefaultedRuntime
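If you still need the old direct-provider behavior, the mapping points at runDeckCore; a sketch assuming it keeps the signature from the "Before" block:
import { createOpenRouterProvider, runDeckCore } from "jsr:@bolt-foundry/gambit";
const provider = createOpenRouterProvider({
  apiKey: Deno.env.get("OPENROUTER_API_KEY")!,
});
await runDeckCore({
  path: "./root.deck.md",
  input: { message: "hi" },
  modelProvider: provider,
});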
A minimal Markdown deck, hello_world.deck.md:
+++
label = "hello_world"
[modelParams]
model = "openai/gpt-4o-mini"
temperature = 0
+++
You are a concise assistant. Greet the user and echo the input.
Run it:
npx @bolt-foundry/gambit run ./hello_world.deck.md --context '"Gambit"' --stream
A TypeScript compute deck, echo.deck.ts:
// echo.deck.ts
import { defineDeck } from "jsr:@bolt-foundry/gambit";
import { z } from "zod";
export default defineDeck({
label: "echo",
contextSchema: z.object({ text: z.string() }),
responseSchema: z.object({ text: z.string(), length: z.number() }),
run(ctx) {
return { text: ctx.input.text, length: ctx.input.text.length };
},
});
Run it:
npx @bolt-foundry/gambit run ./echo.deck.ts --context '{"text":"ping"}'
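Given the responseSchema above, the run should emit a validated payload along the lines of {"text":"ping","length":4}.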
A Markdown deck that wires in a compute action, agent_with_time.deck.md:
+++
label = "agent_with_time"
modelParams = { model = "openai/gpt-4o-mini", temperature = 0 }
[[actions]]
name = "get_time"
path = "./get_time.deck.ts"
description = "Return the current ISO timestamp."
+++
A tiny agent that calls get_time, then replies with the timestamp and the input.
And the child action: get_time.deck.ts
// get_time.deck.ts
import { defineDeck } from "jsr:@bolt-foundry/gambit";
import { z } from "zod";
export default defineDeck({
label: "get_time",
contextSchema: z.object({}), // no args
responseSchema: z.object({ iso: z.string() }),
run() {
return { iso: new Date().toISOString() };
},
});
Run it:
npx @bolt-foundry/gambit run ./agent_with_time.deck.md --context '"hello"' --stream
Need a turnkey scenario that hits personas → init → non-root gambit_respond
payloads → graders? Use the example in packages/gambit/examples/respond_flow/.
cd packages/gambit
npx @bolt-foundry/gambit serve ./examples/respond_flow/decks/root.deck.ts --port 8000
Then:
- Open http://localhost:8000/test, pick the Escalation persona, and run it. Leave the “Use scenario deck input for init” toggle on to see persona data seed the init form automatically.
- Switch to the Debug tab to inspect the session; the child deck emits a gambit_respond payload that now shows up as a structured assistant turn.
- Head to the Calibrate tab and run the Respond payload grader to exercise grading on the non-root respond output.
If you prefer Deno, use the Deno commands below.
Quickstart:
export OPENROUTER_API_KEY=...
deno run -A jsr:@bolt-foundry/gambit/cli demo
Run a deck:
deno run -A jsr:@bolt-foundry/gambit/cli run <deck> --context <json|string> --message <json|string>
Start the Debug UI:
deno run -A jsr:@bolt-foundry/gambit/cli serve <deck> --port 8000
Similar Open Source Tools
python-tgpt
Python-tgpt is a Python package that enables seamless interaction with over 45 free LLM providers without requiring an API key. It also provides image generation capabilities. The name _python-tgpt_ draws inspiration from its parent project tgpt, which operates on Golang. Through this Python adaptation, users can effortlessly engage with a number of free LLMs available, fostering a smoother AI interaction experience.
instructor
Instructor is a popular Python library for managing structured outputs from large language models (LLMs). It offers a user-friendly API for validation, retries, and streaming responses. With support for various LLM providers and multiple languages, Instructor simplifies working with LLM outputs. The library includes features like response models, retry management, validation, streaming support, and flexible backends. It also provides hooks for logging and monitoring LLM interactions, and supports integration with Anthropic, Cohere, Gemini, Litellm, and Google AI models. Instructor facilitates tasks such as extracting user data from natural language, creating fine-tuned models, managing uploaded files, and monitoring usage of OpenAI models.
mediasoup-client-aiortc
mediasoup-client-aiortc is a handler for the aiortc Python library, allowing Node.js applications to connect to a mediasoup server using WebRTC for real-time audio, video, and DataChannel communication. It facilitates the creation of Worker instances to manage Python subprocesses, obtain audio/video tracks, and create mediasoup-client handlers. The tool supports features like getUserMedia, handlerFactory creation, and event handling for subprocess closure and unexpected termination. It provides custom classes for media stream and track constraints, enabling diverse audio/video sources like devices, files, or URLs. The tool enhances WebRTC capabilities in Node.js applications through seamless Python subprocess communication.
model.nvim
model.nvim is a tool designed for Neovim users who want to utilize AI models for completions or chat within their text editor. It allows users to build prompts programmatically with Lua, customize prompts, experiment with multiple providers, and use both hosted and local models. The tool supports features like provider agnosticism, programmatic prompts in Lua, async and multistep prompts, streaming completions, and chat functionality in 'mchat' filetype buffer. Users can customize prompts, manage responses, and context, and utilize various providers like OpenAI ChatGPT, Google PaLM, llama.cpp, ollama, and more. The tool also supports treesitter highlights and folds for chat buffers.
aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
ChatDBG
ChatDBG is an AI-based debugging assistant for C/C++/Python/Rust code that integrates large language models into a standard debugger (`pdb`, `lldb`, `gdb`, and `windbg`) to help debug your code. With ChatDBG, you can engage in a dialog with your debugger, asking open-ended questions about your program, like `why is x null?`. ChatDBG will _take the wheel_ and steer the debugger to answer your queries. ChatDBG can provide error diagnoses and suggest fixes. As far as we are aware, ChatDBG is the _first_ debugger to automatically perform root cause analysis and to provide suggested fixes.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
js-genai
The Google Gen AI JavaScript SDK is an experimental SDK for TypeScript and JavaScript developers to build applications powered by Gemini. It supports both the Gemini Developer API and Vertex AI. The SDK is designed to work with Gemini 2.0 features. Users can access API features through the GoogleGenAI classes, which provide submodules for querying models, managing caches, creating chats, uploading files, and starting live sessions. The SDK also allows for function calling to interact with external systems. Users can find more samples in the GitHub samples directory.
shell-pilot
Shell-pilot is a simple, lightweight shell script designed to interact with various AI models such as OpenAI, Ollama, Mistral AI, LocalAI, ZhipuAI, Anthropic, Moonshot, and Novita AI from the terminal. It enhances intelligent system management without any dependencies, offering features like setting up a local LLM repository, using official models and APIs, viewing history and session persistence, passing input prompts with pipe/redirector, listing available models, setting request parameters, generating and running commands in the terminal, easy configuration setup, system package version checking, and managing system aliases.
matchlock
Matchlock is a CLI tool designed for running AI agents in isolated and disposable microVMs with network allowlisting and secret injection capabilities. It ensures that your secrets never enter the VM, providing a secure environment for AI agents to execute code without risking access to your machine. The tool offers features such as sealing the network to only allow traffic to specified hosts, injecting real credentials in-flight by the host, and providing a full Linux environment for the agent's operations while maintaining isolation from the host machine. Matchlock supports quick booting of Linux environments, sandbox lifecycle management, image building, and SDKs for Go and Python for embedding sandboxes in applications.
swarmzero
SwarmZero SDK is a library that simplifies the creation and execution of AI Agents and Swarms of Agents. It supports various LLM Providers such as OpenAI, Azure OpenAI, Anthropic, MistralAI, Gemini, Nebius, and Ollama. Users can easily install the library using pip or poetry, set up the environment and configuration, create and run Agents, collaborate with Swarms, add tools for complex tasks, and utilize retriever tools for semantic information retrieval. Sample prompts are provided to help users explore the capabilities of the agents and swarms. The SDK also includes detailed examples and documentation for reference.
shortest
Shortest is an AI-powered natural language end-to-end testing framework built on Playwright. It provides a seamless testing experience by allowing users to write tests in natural language and execute them using Anthropic Claude API. The framework also offers GitHub integration with 2FA support, making it suitable for testing web applications with complex authentication flows. Shortest simplifies the testing process by enabling users to run tests locally or in CI/CD pipelines, ensuring the reliability and efficiency of web applications.
generative-ai-python
The Google AI Python SDK is the easiest way for Python developers to build with the Gemini API. The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, and code.
flapi
flAPI is a powerful service that automatically generates read-only APIs for datasets by utilizing SQL templates. Built on top of DuckDB, it offers features like automatic API generation, support for Model Context Protocol (MCP), connecting to multiple data sources, caching, security implementation, and easy deployment. The tool allows users to create APIs without coding and enables the creation of AI tools alongside REST endpoints using SQL templates. It supports unified configuration for REST endpoints and MCP tools/resources, concurrent servers for REST API and MCP server, and automatic tool discovery. The tool also provides DuckLake-backed caching for modern, snapshot-based caching with features like full refresh, incremental sync, retention, compaction, and audit logs.
For similar jobs
promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aim to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our overview of the field, affectionately titled "Everything I know about machine learning and camera traps".
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.
carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.
AI-YinMei
AI-YinMei is an AI virtual anchor (VTuber) development tool (NVIDIA GPU version). It supports knowledge-base chat through a full LLM stack ([fastgpt] + [one-api] + [Xinference]); Bilibili live-stream integration with danmaku replies and welcome messages; speech synthesis via Microsoft edge-tts, Bert-VITS2, and GPT-SoVITS; expression control through VTube Studio; image generation with stable-diffusion-webui streamed to an OBS live room; NSFW image filtering; image search via DuckDuckGo (requires proxy access) and Baidu image search (no proxy required); an AI reply chat box and playlist as HTML plugins; AI singing via Auto-Convert-Music; dancing, expression video playback, head-pat and gift-reaction animations, and automatic dancing when singing starts; idle swaying during chat and song loops; multi-scene switching with background-music changes and automatic day/night scene switching; and an open singing/painting mode where the AI decides the content automatically.
