lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Stars: 730

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally alongside the recently released LLM data-processing library datatrove and LLM training library nanotron. We're releasing it to the community in the spirit of building in the open. Note that it is still very much in its early days, so don't expect 100% stability ^^' In case of problems or questions, feel free to open an issue!

README:


lighteval library logo

Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.

Documentation: Lighteval's Wiki


Unlock the Power of LLM Evaluation with Lighteval 🚀

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends (transformers, tgi, vllm, or nanotron) with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Customization at your fingertips: browse all of our existing tasks and metrics, or effortlessly create your own, tailored to your needs.

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

🔑 Key Features

โšก๏ธ Installation

pip install lighteval[accelerate]

Lighteval supports a number of optional extras at install time; see the documentation for the complete list.
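
For example, to target another backend you can install the matching extra instead of (or alongside) accelerate. The exact extra names are listed in the documentation; the vllm extra below is an assumption used for illustration:

pip install lighteval[vllm]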

If you want to push results to the Hugging Face Hub, make your access token available by logging in:

huggingface-cli login
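
Alternatively, in non-interactive environments you can export the token directly (HF_TOKEN is the environment variable read by the Hugging Face Hub client; the value below is a placeholder):

export HF_TOKEN=<your_access_token>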

🚀 Quickstart

Lighteval offers two main entry points for model evaluation: lighteval accelerate, for evaluating models on one or more GPUs with Hugging Face Accelerate, and lighteval nanotron, for evaluating models in distributed settings with Nanotron.

Here's a quick command to evaluate using the Accelerate backend:

lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "leaderboard|truthfulqa:mc|0|0" \
    --override_batch_size 1 \
    --output_dir="./evals/"
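
The four |-separated fields in the --tasks value are, in order, the suite, the task name, the number of few-shot examples, and a 0/1 flag that lets Lighteval automatically reduce the few-shot count when the prompt gets too long (this reading of the task string follows the docs). A sketch of a variant run on a 25-shot task from the same suite, with arc:challenge used as an assumed example task name:

lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "leaderboard|arc:challenge|25|0" \
    --override_batch_size 1 \
    --output_dir="./evals/"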

๐Ÿ™ Acknowledgements

Lighteval started as an extension of the fantastic Eleuther AI Harness (which powers the Open LLM Leaderboard) and draws inspiration from the amazing HELM framework.

While evolving Lighteval into its own standalone tool, we are grateful to the Harness and HELM teams for their pioneering work on LLM evaluations.

🌟 Contributions Welcome 💙💚💛💜🧡

Got ideas? Found a bug? Want to add a task or metric? Contributions are warmly welcomed!

📜 Citation

@misc{lighteval,
  author = {Fourrier, Clémentine and Habib, Nathan and Wolf, Thomas and Tunstall, Lewis},
  title = {LightEval: A lightweight framework for LLM evaluation},
  year = {2023},
  version = {0.5.0},
  url = {https://github.com/huggingface/lighteval}
}
