empirical

Test and evaluate LLMs and model configurations, across all the scenarios that matter for your application

Stars: 134

Empirical is a tool for testing different LLMs, prompts, and other model configurations across all the scenarios that matter for your application. With Empirical, you can run your test datasets locally against off-the-shelf models; test your own custom models and RAG applications; view, compare, and analyze outputs in a web UI; score your outputs with scoring functions; and run tests in CI/CD.

README:

Empirical

Empirical is the fastest way to test different LLMs and model configurations, across all the scenarios that matter for your application.

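With Empirical, you can run test datasets locally against off-the-shelf models, test custom models and RAG applications, compare outputs in a web UI, score outputs with scoring functions, and run tests in CI/CD.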

https://github.com/empirical-run/empirical/assets/284612/65d96ecc-12a2-474d-a81e-bbddb71106b6

Usage

See all docs →

Empirical bundles together a test runner and a web app. These can be used through the CLI in your terminal window.

Empirical relies on a configuration file, typically located at empiricalrc.js, which describes the test to run.

Start with a basic example

In this example, we will ask an LLM to extract entities from user messages and give us a structured JSON output. For example, "I'm Alice from Maryland" will become {name: 'Alice', location: 'Maryland'}.

Our test will succeed if the model outputs valid JSON.
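
To make the pass criterion concrete, here is a minimal sketch of a "valid JSON" check. The helper name isValidJson is hypothetical and is not Empirical's own scorer; it only illustrates the idea that an output passes when it parses as JSON. Note that the shorthand {name: 'Alice', location: 'Maryland'} shown above is JavaScript object notation, so a model would need to emit the double-quoted form to pass a strict check like this one.

```javascript
// Hypothetical helper for illustration; not part of Empirical's API.
// Returns true when `output` can be parsed as JSON, false otherwise.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

// Strict JSON: double-quoted keys and string values.
const strictOutput = '{"name": "Alice", "location": "Maryland"}';
// JavaScript object-literal shorthand: unquoted keys, single quotes.
const looseOutput = "{name: 'Alice', location: 'Maryland'}";

console.log(isValidJson(strictOutput)); // true
console.log(isValidJson(looseOutput)); // false
```

A check like this only confirms that the output parses; it says nothing about whether the extracted name and location are correct.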

  1. Use the CLI to create a sample configuration file called empiricalrc.js.

    npm init empiricalrun
    
    # For TypeScript
    npm init empiricalrun -- --using-ts

  2. Run the example dataset against the selected models.

    npx empiricalrun

    This step requires the OPENAI_API_KEY environment variable to authenticate with OpenAI. This execution will cost $0.0026, based on the selected models.

  3. Use the ui command to open the reporter web app and see side-by-side results.

    npx empiricalrun ui
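
Because step 2 depends on the OPENAI_API_KEY environment variable, it can help to confirm the variable is set before running anything that costs money. The following is a small optional sketch for Node.js; the function name missingEnvVars is an assumption made for illustration, and Empirical itself does not require this script.

```javascript
// Hypothetical pre-flight check; not part of Empirical.
// Returns the names from `names` that are unset or blank in `env`.
function missingEnvVars(names, env = process.env) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingEnvVars(["OPENAI_API_KEY"]);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
} else {
  console.log("OPENAI_API_KEY is set.");
}
```

The check only tests for presence; it does not verify that the key is valid or has quota.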

Make it yours

Edit the empiricalrc.js file to make Empirical work for your use-case.

Contribution guide

See development docs.
