Simple LLM Evaluation

Welcome to the simple LLM evaluation framework, simpleval for short.

simpleval is a Python package that makes it easier to evaluate Large Language Models (LLMs) using the "LLM as a Judge" technique.
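
At its core, "LLM as a Judge" means asking a strong model to grade another model's output against a rubric. The following is a minimal, generic sketch of that idea, not simpleval's own API: the prompt, the `judge()` helper, and the model name are illustrative assumptions, written against the LiteLLM client mentioned below.

```python
# A minimal, generic sketch of the "LLM as a Judge" idea.
# NOTE: this is NOT simpleval's API -- the prompt, judge() helper, and default
# model are illustrative assumptions. Requires `pip install litellm` and a
# provider API key (e.g. OPENAI_API_KEY) in the environment.
from litellm import completion

JUDGE_PROMPT = """You are an impartial judge. Rate the answer below for
factual accuracy on a scale of 1 (wrong) to 5 (fully correct).
Reply with the number only.

Question: {question}
Answer: {answer}"""

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model to score a candidate answer.

    Assumes the judge follows the prompt and replies with a bare number.
    """
    response = completion(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    return int(response.choices[0].message.content.strip())

print(judge("What is the capital of France?", "Paris is the capital of France."))
```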

It supports a variety of LLM providers, including OpenAI, Google (Gemini API, Vertex), AWS Bedrock, Anthropic, Azure, and more (via LiteLLM).
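
Because LiteLLM normalizes the chat-completion interface across providers, switching the judge model is typically just a change of model string. The snippet below follows LiteLLM's provider-prefix convention; the specific model names are assumptions, so check your provider's documentation for the identifiers available to your account.

```python
# Illustrative only: LiteLLM routes calls by the model string's provider
# prefix. Each call assumes the matching credentials are configured in the
# environment (OpenAI key, Google key, AWS credentials, ...).
from litellm import completion

messages = [{"role": "user", "content": "Reply with the single word: ok"}]

for model in (
    "gpt-4o-mini",                                     # OpenAI
    "gemini/gemini-1.5-pro",                           # Google Gemini API
    "bedrock/anthropic.claude-3-haiku-20240307-v1:0",  # AWS Bedrock
):
    reply = completion(model=model, messages=messages)
    print(model, "->", reply.choices[0].message.content)
```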

simpleval also includes several reports to help you analyze, compare, and summarize your evaluation results. See the available reports for more details.

Getting Started

See the 📚 Quickstart Guide 📚

Documentation

See 📚 Project Documentation 📚

Contributing

We appreciate your help in making this project better! ✨

If you would like to contribute to this project, please follow the guidelines outlined in the CONTRIBUTING.md file.

License

simpleval is released under the Apache License. See the LICENSE file for more details.

Contact

If you have any questions or suggestions, feel free to join our GitHub discussions forum 💬

If you want to report a bug or request a feature, please open an issue in the GitHub issues tracker 🐛

