athina-evals

Python SDK for running evaluations on LLM-generated responses

Stars: 209


Athina is an open-source library that helps engineers improve the reliability and performance of Large Language Models (LLMs) through eval-driven development. It offers plug-and-play preset evals for catching and preventing bad outputs, measuring model performance, running experiments, A/B testing models, detecting regressions, and monitoring production data. Athina addresses common gaps in LLM developer workflows with rapid experimentation, customizable evaluators, an integrated dashboard, consistent metrics, historical record tracking, and easy setup. It includes preset evaluators for RAG applications and summarization accuracy, as well as the ability to write custom evals. Athina's evals run in both development and production environments, providing consistent metrics without manual infrastructure setup.

README:

Overview

Athina is an Observability and Experimentation platform for AI teams.

This SDK is an open-source repository of 50+ preset evals. You can also use custom evals.
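For the custom-eval path, here is a minimal sketch of an LLM-graded eval built from a plain-language grading criterion. The `GradingCriteria` name and `run` signature mirror the pattern in Athina's docs but should be verified against the current SDK.

```python
# Minimal sketch of a custom LLM-graded eval defined by a grading
# criterion. Class name and signature follow the pattern in Athina's
# docs; verify against the current SDK before relying on this.
from athina.evals import GradingCriteria

result = GradingCriteria(
    grading_criteria="Fail if the response recommends a competitor's product."
).run(response="You could upgrade to our Pro plan for higher rate limits.")
print(result)
```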

This SDK also serves as a companion to Athina IDE where you can prototype pipelines, run experiments and evaluations, and compare datasets.


Quick Start

Follow this notebook for a quick start guide.

To get an Athina API key, sign up at https://app.athina.ai
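As a minimal setup sketch (assuming the package installs from PyPI as `athina`; verify the package name and key helpers against the quick start notebook):

```python
# Minimal setup sketch. Assumes the package installs from PyPI as
# `athina` (pip install athina). An OpenAI key is needed for the
# LLM-graded evals; the Athina key enables logging to the platform.
import os

from athina.keys import AthinaApiKey, OpenAiApiKey

AthinaApiKey.set_key(os.getenv("ATHINA_API_KEY"))
OpenAiApiKey.set_key(os.getenv("OPENAI_API_KEY"))
```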


Run Evals

These evals can be run programmatically, as sketched below, or via the UI in Athina IDE.

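A sketch of the programmatic path, following the pattern in the SDK's examples; `RagLoader`, `load_dict`, `run_batch`, and `to_df` are taken from those examples and worth confirming against the current version:

```python
# Sketch of running a preset eval over a dataset, following the
# pattern in the SDK's examples (RagLoader, run_batch, to_df are
# from those examples; confirm against the current SDK).
from athina.loaders import RagLoader
from athina.evals import DoesResponseAnswerQuery

raw_data = [
    {
        "query": "What is the capital of France?",
        "context": ["France's capital city is Paris."],
        "response": "The capital of France is Paris.",
    },
]

dataset = RagLoader().load_dict(raw_data)  # assumed loader method
batch_result = DoesResponseAnswerQuery(model="gpt-4o").run_batch(data=dataset)
print(batch_result.to_df())
```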


Compare datasets side-by-side (Docs)

Once a dataset is logged to Athina IDE, you can also compare it against another dataset.


Once you run evals using Athina, the results appear in Athina IDE, where you can run experiments and compare datasets side-by-side.
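A sketch of running several evals as a suite so the results land in one Athina IDE batch. `EvalRunner.run_suite` and the `max_parallel_evals` parameter follow the SDK's documented pattern; verify the import path and parameters against the current docs.

```python
# Sketch of running several evals as a suite. EvalRunner.run_suite
# and max_parallel_evals follow the SDK's documented pattern; verify
# against the current docs.
from athina.runner.run import EvalRunner
from athina.loaders import RagLoader
from athina.evals import DoesResponseAnswerQuery, Faithfulness

dataset = RagLoader().load_dict([
    {
        "query": "What is the capital of France?",
        "context": ["France's capital city is Paris."],
        "response": "The capital of France is Paris.",
    },
])

EvalRunner.run_suite(
    evals=[
        DoesResponseAnswerQuery(model="gpt-4o"),
        Faithfulness(model="gpt-4o"),
    ],
    data=dataset,
    max_parallel_evals=5,
)
```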


Preset Evals
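As noted above, the SDK includes preset evaluators for RAG applications and summarization accuracy, among others. A few of the RAG preset names as they appear in `athina.evals` (an illustrative, non-exhaustive selection):

```python
# A few of the RAG presets exposed by athina.evals. Illustrative only:
# the SDK advertises 50+ presets; see the Athina docs for the full list.
from athina.evals import (
    ContextContainsEnoughInformation,  # is the retrieved context sufficient?
    DoesResponseAnswerQuery,           # does the response address the query?
    Faithfulness,                      # is the response grounded in the context?
)
```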
