stagehand
An AI web browsing framework focused on simplicity and extensibility.
Stagehand is an AI web browsing framework that simplifies and extends web automation using three simple APIs: act, extract, and observe. It aims to provide a lightweight, configurable framework without complex abstractions, allowing users to automate web tasks reliably. The tool generates Playwright code based on atomic instructions provided by the user, enabling natural language-driven web automation. Stagehand is open source, maintained by the Browserbase team, and supports different models and model providers for flexibility in automation tasks.
README:
- Intro
- Getting Started
- API Reference
- Model Support
- How It Works
- Stagehand vs Playwright
- Prompting Tips
- Roadmap
- Contributing
- Acknowledgements
- License
[!NOTE]
Stagehand is currently available as an early release, and we're actively seeking feedback from the community. Please join our Slack community to stay updated on the latest developments and provide feedback.
Stagehand is the AI-powered successor to Playwright, offering three simple APIs (act, extract, and observe) that provide the building blocks for natural language driven web automation.
The goal of Stagehand is to provide a lightweight, configurable framework, without overly complex abstractions, as well as modular support for different models and model providers. It's not going to order you a pizza, but it will help you reliably automate the web.
Each Stagehand function takes in an atomic instruction, such as act("click the login button") or extract("find the red shoes"), generates the appropriate Playwright code to accomplish that instruction, and executes it.
Instructions should be atomic to increase reliability, and step planning should be handled by the higher-level agent. You can use observe() to get a suggested list of actions that can be taken on the current page, and then use those to ground your step planning prompts.
Stagehand is open source and maintained by the Browserbase team. We believe that by enabling more developers to build reliable web automations, we'll expand the market of developers who benefit from our headless browser infrastructure. This is the framework that we wished we had while tinkering on our own applications, and we're excited to share it with you.
To get started, install Stagehand (we also install zod to power typed extraction):
npm install @browserbasehq/stagehand zod
You'll need to provide your API Key for the model provider you'd like to use. The default model provider is OpenAI, but you can also use Anthropic or others. More information on supported models can be found in the API Reference.
Ensure that an OpenAI API key or Anthropic API key is accessible in your local environment.
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-...
If you plan to run the browser locally, you'll also need to install Playwright's browser dependencies.
npm exec playwright install
Then you can create a Stagehand instance like so:
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
const stagehand = new Stagehand({
env: "LOCAL",
});
If you plan to run the browser remotely, you'll need to set a Browserbase API Key and Project ID.
export BROWSERBASE_API_KEY=...
export BROWSERBASE_PROJECT_ID=...
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
const stagehand = new Stagehand({
env: "BROWSERBASE",
enableCaching: true,
});
await stagehand.init();
const page = stagehand.page;
await page.goto("https://github.com/browserbase/stagehand");
await page.act({ action: "click on the contributors" });
const contributor = await page.extract({
instruction: "extract the top contributor",
schema: z.object({
username: z.string(),
url: z.string(),
}),
});
await stagehand.close();
console.log(`Our favorite contributor is ${contributor.username}`);
This simple snippet will open a browser, navigate to the Stagehand repo, and log the top contributor.
The Stagehand constructor is used to create an instance of Stagehand.
- Arguments:
  - env: 'LOCAL' or 'BROWSERBASE'. Defaults to 'BROWSERBASE'.
  - modelName: (optional) an AvailableModel string to specify the default model to use.
  - modelClientOptions: (optional) configuration options for the model client.
  - enableCaching: a boolean that enables caching of LLM responses. When set to true, LLM requests will be cached on disk and reused for identical requests. Defaults to false.
  - headless: a boolean that determines if the browser runs in headless mode. Defaults to false. When the env is set to BROWSERBASE, this will be ignored.
  - domSettleTimeoutMs: an integer that specifies the timeout in milliseconds for waiting for the DOM to settle. Defaults to 30000 (30 seconds).
  - apiKey: (optional) your Browserbase API key. Defaults to the BROWSERBASE_API_KEY environment variable.
  - projectId: (optional) your Browserbase project ID. Defaults to the BROWSERBASE_PROJECT_ID environment variable.
  - browserbaseSessionCreateParams: configuration options for creating new Browserbase sessions.
  - browserbaseSessionID: ID of an existing live Browserbase session. Overrides browserbaseSessionCreateParams.
  - logger: a function that handles log messages. Useful for custom logging implementations.
  - verbose: an integer that enables several levels of logging during automation:
    - 0: limited to no logging
    - 1: SDK-level logging
    - 2: LLM-client level logging (most granular)
  - debugDom: a boolean that draws bounding boxes around elements presented to the LLM during automation.
  - llmClient: (optional) a custom LLMClient implementation.
- Returns:
  - An instance of the Stagehand class configured with the specified options.
- Example:
// Basic usage
const stagehand = new Stagehand();

// Custom configuration
const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 1,
  headless: true,
  enableCaching: true,
  logger: (logLine) => {
    console.log(`[${logLine.category}] ${logLine.message}`);
  },
});

// Resume existing Browserbase session
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  browserbaseSessionID: "existing-session-id",
});
init()
init() asynchronously initializes the Stagehand instance. It should be called before any other methods.
[!WARNING]
Passing parameters to init() is deprecated and will be removed in the next major version. Use the constructor options instead.
- Arguments:
  - modelName: (deprecated, optional) an AvailableModel string to specify the model to use. This will be used for all other methods unless overridden.
  - modelClientOptions: (deprecated, optional) configuration options for the model client.
  - domSettleTimeoutMs: (deprecated, optional) timeout in milliseconds for waiting for the DOM to settle.
- Returns:
  - A Promise that resolves to an object containing:
    - debugUrl: a string representing the URL for live debugging. This is only available when using a Browserbase browser.
    - sessionUrl: a string representing the session URL. This is only available when using a Browserbase browser.
    - sessionId: a string representing the session ID. This is only available when using a Browserbase browser.
- Example:
await stagehand.init();
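When running on a Browserbase browser, the return value carries the session links listed above, which you can log for live debugging. A small usage sketch based on that return shape:

// Capture the Browserbase session links returned by init()
const { debugUrl, sessionUrl } = await stagehand.init();
console.log(`Watch the session live at ${debugUrl}`);
console.log(`Session replay: ${sessionUrl}`);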
act()
act() allows Stagehand to interact with a web page. Provide an action like "search for 'x'" or "select the cheapest flight presented" (small atomic goals perform the best).
[!WARNING]
act() on the Stagehand instance is deprecated and will be removed in the next major version. Use stagehand.page.act() instead.
- Arguments:
  - action: a string describing the action to perform
  - modelName: (optional) an AvailableModel string to specify the model to use
  - modelClientOptions: (optional) configuration options for the model client
  - useVision: (optional) a boolean or "fallback" to determine if vision-based processing should be used. Defaults to "fallback"
  - variables: (optional) a Record<string, string> of variables to use in the action. Variables in the action string are referenced using %variable_name%
  - domSettleTimeoutMs: (optional) timeout in milliseconds for waiting for the DOM to settle
- Returns:
  - A Promise that resolves to an object containing:
    - success: a boolean indicating if the action was completed successfully.
    - message: a string providing details about the action's execution.
    - action: a string describing the action performed.
- Example:
// Basic usage
await stagehand.page.act({ action: "click on add to cart" });

// Using variables
await stagehand.page.act({
  action: "enter %username% into the username field",
  variables: {
    username: "[email protected]",
  },
});

// Multiple variables
await stagehand.page.act({
  action: "fill in the form with %username% and %password%",
  variables: {
    username: "john.doe",
    password: "secretpass123",
  },
});
extract()
extract() grabs structured text from the current page using zod. Given instructions and a schema, you will receive structured data. Unlike some extraction libraries, Stagehand can extract any information on a page, not just the main article contents.
[!WARNING]
extract() on the Stagehand instance is deprecated and will be removed in the next major version. Use stagehand.page.extract() instead.
- Arguments:
  - instruction: a string providing instructions for extraction
  - schema: a z.AnyZodObject defining the structure of the data to extract
  - modelName: (optional) an AvailableModel string to specify the model to use
  - modelClientOptions: (optional) configuration options for the model client
  - domSettleTimeoutMs: (optional) timeout in milliseconds for waiting for the DOM to settle
  - useTextExtract: (optional) a boolean to determine if text-based extraction should be used. Defaults to false
- Returns:
  - A Promise that resolves to the structured data as defined by the provided schema.
- Example:
const price = await stagehand.page.extract({
  instruction: "extract the price of the item",
  schema: z.object({
    price: z.number(),
  }),
});
observe()
observe() is used to get a list of actions that can be taken on the current page. It's useful for adding context to your planning step, or if you're unsure of what page you're on.
If you are looking for a specific element, you can also pass in an instruction to observe via: observe({ instruction: "{your instruction}" }).
[!WARNING]
observe() on the Stagehand instance is deprecated and will be removed in the next major version. Use stagehand.page.observe() instead.
[!NOTE]
observe() currently only evaluates the first chunk in the page.
- Arguments:
  - instruction: (optional) a string providing instructions for the observation. Defaults to "Find actions that can be performed on this page."
  - modelName: (optional) an AvailableModel string to specify the model to use
  - modelClientOptions: (optional) configuration options for the model client
  - useVision: (optional) a boolean to determine if vision-based processing should be used. Defaults to false
  - domSettleTimeoutMs: (optional) timeout in milliseconds for waiting for the DOM to settle
- Returns:
  - A Promise that resolves to an array of objects containing:
    - selector: a string representing the element selector
    - description: a string describing the possible action
- Example:
const actions = await stagehand.page.observe();
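To look for something specific, you can pass an instruction, as described above:

// Targeted observation using the instruction argument
const loginActions = await stagehand.page.observe({
  instruction: "find the login and signup actions on this page",
});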
close()
close() is a cleanup method to remove the temporary files created by Stagehand. It's highly recommended that you call this when you're done with your automation.
- Example:
await stagehand.close();
page and context
page and context are instances of Playwright's Page and BrowserContext respectively. Use these to interact with the Playwright instance that Stagehand is using. Most commonly, you'll use page.goto() to navigate to a URL.
- Example:
await stagehand.page.goto("https://github.com/browserbase/stagehand");
log()
log() is used to print a message to the browser console. These messages will be persisted in the Browserbase session logs, and can be used to debug sessions after they've completed.
Make sure the log level is above the verbose level you set when initializing the Stagehand instance.
- Example:
stagehand.log("Hello, world!");
Stagehand leverages a generic LLM client architecture to support various language models from different providers. This design allows for flexibility, enabling the integration of new models with minimal changes to the core system. Different models work better for different tasks, so you can choose the model that best suits your needs.
Stagehand currently supports the following models from OpenAI and Anthropic:
- OpenAI Models:
  - gpt-4o
  - gpt-4o-mini
  - gpt-4o-2024-08-06
- Anthropic Models:
  - claude-3-5-sonnet-latest
  - claude-3-5-sonnet-20240620
  - claude-3-5-sonnet-20241022
These models can be specified when initializing the Stagehand instance or when calling methods like act() and extract().
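For example, using the modelName option documented above, you can set a default model on the instance and override it for a single call (a minimal sketch):

const stagehand = new Stagehand({
  env: "LOCAL",
  modelName: "claude-3-5-sonnet-latest", // default model for all calls
});
await stagehand.init();

// Override the default for one action only
await stagehand.page.act({
  action: "click the login button",
  modelName: "gpt-4o-mini",
});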
The SDK has two major phases:
- Processing the DOM (including chunking - see below).
- Taking LLM powered actions based on the current state of the DOM.
Stagehand uses a combination of techniques to prepare the DOM.
The DOM Processing steps look as follows:
- Via Playwright, inject a script into the DOM accessible by the SDK that can run processing.
- Crawl the DOM and create a list of candidate elements.
  - Candidate elements are either leaf elements (DOM elements that contain actual user-facing substance) or interactive elements.
  - Interactive elements are determined by a combination of roles and HTML tags.
- Candidate elements that are not active, visible, or at the top of the DOM are discarded.
  - The LLM should only receive elements it can faithfully act on, on behalf of the agent/user.
- For each candidate element, an XPath is generated. This guarantees that if this element is picked by the LLM, we'll be able to reliably target it.
- Return both the list of candidate elements and the map of elements to XPath selectors from the browser back to the SDK, to be analyzed by the LLM. A condensed sketch of this pass follows.
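The TypeScript sketch below condenses this pass; the helper logic and the XPath builder are simplified stand-ins for the real injected script:

// Collect candidate elements and map them to XPath selectors (simplified sketch).
function collectCandidates(root: Element) {
  const interactiveTags = new Set(["a", "button", "input", "select", "textarea"]);
  // Interactive elements are determined by a combination of roles and HTML tags.
  const isInteractive = (el: Element) =>
    interactiveTags.has(el.tagName.toLowerCase()) || el.hasAttribute("role");
  // Leaf elements carry user-facing substance and have no element children.
  const isLeaf = (el: Element) =>
    el.children.length === 0 && (el.textContent ?? "").trim().length > 0;

  const candidates: Element[] = [];
  const xpaths = new Map<Element, string>();
  for (const el of Array.from(root.querySelectorAll("*"))) {
    if (!isInteractive(el) && !isLeaf(el)) continue;
    // Crude visibility check; the real script also filters inactive and covered elements.
    if (!(el as HTMLElement).offsetParent) continue;
    candidates.push(el);
    xpaths.set(el, buildXPath(el)); // stable handle if the LLM picks this element
  }
  return { candidates, xpaths };
}

// Naive positional XPath builder, for illustration only.
function buildXPath(el: Element): string {
  const parts: string[] = [];
  for (let node: Element | null = el; node; node = node.parentElement) {
    let index = 1;
    for (let sib = node.previousElementSibling; sib; sib = sib.previousElementSibling) {
      if (sib.tagName === node.tagName) index++;
    }
    parts.unshift(`${node.tagName.toLowerCase()}[${index}]`);
  }
  return "/" + parts.join("/");
}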
While LLMs will continue to increase context window length and reduce latency, giving any reasoning system less to think about should make it more reliable. As a result, DOM processing is done in chunks in order to keep the context small per inference call. To chunk, the SDK considers a candidate element that starts in a section of the viewport to be a part of that chunk. In the future, padding will be added to ensure that an individual chunk does not lack relevant context.
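In code terms, the chunk-assignment rule above amounts to something like this (names assumed):

// A candidate element belongs to the viewport-sized section it starts in.
const CHUNK_HEIGHT = window.innerHeight;
function chunkIndexFor(el: Element): number {
  const absoluteTop = el.getBoundingClientRect().top + window.scrollY;
  return Math.floor(absoluteTop / CHUNK_HEIGHT);
}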
The act() and observe() methods can take a useVision flag. If this is set to true, the LLM will be provided with an annotated screenshot of the current page to identify which elements to act on. This is useful for complex DOMs that the LLM has a hard time reasoning about, even after processing and chunking. By default, this flag is set to "fallback", which means that if the LLM fails to successfully identify a single element, Stagehand will retry the attempt using vision.
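For example, you can skip the "fallback" heuristic and force vision-based processing when you know a page is visually dense (a usage sketch of the documented flag):

// Go straight to the annotated screenshot instead of falling back to it
await stagehand.page.act({
  action: "click the third image in the carousel",
  useVision: true,
});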
Now we have a list of candidate elements and a way to select them. We can present those elements with additional context to the LLM for extraction or action. While untested on a large scale, presenting a "numbered list of elements" guides the model to not treat the context as a full DOM, but as a list of related but independent elements to operate on.
In the case of action, we ask the LLM to write a Playwright method in order to do the correct thing. In our limited testing, Playwright syntax is much more effective than relying on built-in JavaScript APIs, possibly due to tokenization.
Lastly, we use the LLM to write future instructions to itself to help manage its progress and goals when operating across chunks.
Below is an example of how to extract a list of companies from the AI Grant website. With Stagehand you describe the data you want; with plain Playwright you would hand-write selectors against the page's current markup.
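A minimal sketch of the Stagehand side (the URL and field names here are assumptions):

// Describe the data instead of writing selectors.
await stagehand.page.goto("https://aigrant.com"); // assumed URL
const { companies } = await stagehand.page.extract({
  instruction: "extract the list of companies and the batch they belong to",
  schema: z.object({
    companies: z.array(
      z.object({
        name: z.string(),
        batch: z.string(),
      }),
    ),
  }),
});
console.log(companies);

The equivalent Playwright version would query concrete selectors (for example with page.$$eval and CSS classes), which breaks whenever the page's markup changes.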
Prompting Stagehand is more literal and atomic than prompting other higher-level frameworks, including agentic frameworks. Here are some guidelines to help you craft effective prompts:
- Use specific and concise actions
await stagehand.page.act({ action: "click the login button" });
const productInfo = await stagehand.page.extract({
instruction: "find the red shoes",
schema: z.object({
productName: z.string(),
price: z.number(),
}),
});
- Break down complex tasks into smaller, atomic steps
Instead of combining actions:
// Avoid this
await stagehand.page.act({ action: "log in and purchase the first item" });
Split them into individual steps:
await stagehand.page.act({ action: "click the login button" });
// ...additional steps to log in...
await stagehand.page.act({ action: "click on the first item" });
await stagehand.page.act({ action: "click the purchase button" });
- Use observe() to get actionable suggestions from the current page
const actions = await stagehand.page.observe();
console.log("Possible actions:", actions);
- Avoid broad or ambiguous instructions
// Too vague
await stagehand.page.act({ action: "find something interesting on the page" });
- Avoid combining multiple actions into one instruction
// Avoid combining actions
await stagehand.page.act({ action: "fill out the form and submit it" });
- Don't expect Stagehand to perform high-level planning or reasoning
// Outside Stagehand's scope
await stagehand.page.act({ action: "book the cheapest flight available" });
By following these guidelines, you'll increase the reliability and effectiveness of your web automations with Stagehand. Remember, Stagehand excels at executing precise, well-defined actions, so keeping your instructions atomic will lead to the best outcomes.
We leave the agentic behaviour to higher-level agentic systems which can use Stagehand as a tool.
At a high level, we're focused on improving reliability, speed, and cost in that order of priority.
You can see the roadmap here. Looking to contribute? Read on!
[!NOTE]
We highly value contributions to Stagehand! For support or code review, please join our Slack community.
First, clone the repo
git clone [email protected]:browserbase/stagehand.git
Then install dependencies
npm install
Ensure you have the .env file as documented above in the Getting Started section.
Then, run the example script with npm run example.
A good development loop is:
- Try things in the example file
- Use that to make changes to the SDK
- Write evals that help validate your changes
- Make sure you don't break existing evals!
- Open a PR and get it reviewed by the team.
You'll need a Braintrust API key to run evals:
BRAINTRUST_API_KEY=""
After that, you can run all evals at once using npm run evals, or run an individual eval using npm run evals -- your_eval_name.
Running all evals can take some time. We have a convenience script, example.ts, where you can develop your new single eval before adding it to the set of all evals. You can run npm run example to execute and iterate on the eval you are currently developing.
To add a new model to Stagehand, follow these steps:
1. Define the Model: Add the new model name to the AvailableModel type in the LLMProvider.ts file. This ensures that the model is recognized by the system.
2. Map the Model to a Provider: Update the modelToProviderMap in the LLMProvider class to associate the new model with its corresponding provider. This mapping is crucial for determining which client to use.
3. Implement the Client: If the new model requires a new client, implement a class that adheres to the LLMClient interface. This class should define all necessary methods, such as createChatCompletion. A sketch follows this list.
4. Update the getClient Method: Modify the getClient method in the LLMProvider class to return an instance of the new client when the new model is requested.
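A minimal sketch of step 3, assuming the LLMClient interface centers on createChatCompletion (the exact signature in the codebase may differ; the endpoint and types below are illustrative):

// Hypothetical client for a new provider; would implement the LLMClient interface.
class MyProviderClient {
  constructor(private readonly apiKey: string) {}

  async createChatCompletion(options: {
    messages: { role: string; content: string }[];
  }): Promise<unknown> {
    const response = await fetch("https://api.myprovider.example/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(options),
    });
    return response.json();
  }
}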
Stagehand uses tsup to build the SDK and vanilla esbuild to build the scripts that run in the DOM.
- Run npm run build
- Run npm pack to get a tarball for distribution
This project heavily relies on Playwright as a resilient backbone to automate the web. It also would not be possible without the awesome techniques and discoveries made by tarsier, and fuji-web.
Jeremy Press wrote the original MVP of Stagehand and continues to be a major ally to the project.
Licensed under the MIT License.
Copyright 2024 Browserbase, Inc.