ragas

Supercharge Your LLM Application Evaluations 🚀

Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. Plenty of tools and frameworks help you build these pipelines, but evaluating them and quantifying pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in. Ragas provides tools grounded in the latest research for evaluating LLM-generated text, giving you insight into your RAG pipeline’s behavior. Ragas also integrates with your CI/CD pipeline to provide continuous performance checks.



Documentation | Quick start | Join Discord

Objective metrics, intelligent test generation, and data-driven insights for LLM apps

Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. Say goodbye to time-consuming, subjective assessments and hello to data-driven, efficient evaluation workflows. Don't have a test dataset ready? Ragas can also generate production-aligned test sets for you.

Key Features

  • 🎯 Objective Metrics: Evaluate your LLM applications with precision using both LLM-based and traditional metrics.
  • 🧪 Test Data Generation: Automatically create comprehensive test datasets covering a wide range of scenarios.
  • 🔗 Seamless Integrations: Works flawlessly with popular LLM frameworks like LangChain and major observability tools.
  • 📊 Build feedback loops: Leverage production data to continually improve your LLM applications.

🛡️ Installation

From PyPI:

pip install ragas

Alternatively, from source:

pip install git+https://github.com/explodinggradients/ragas

🔥 Quickstart

Evaluate your RAG with Ragas metrics

The core of it is just a few lines:

from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness
from langchain_openai.chat_models import ChatOpenAI

# Wrap any LangChain chat model for use as the evaluator LLM
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

metrics = [LLMContextRecall(), FactualCorrectness(), Faithfulness()]
results = evaluate(dataset=eval_dataset, metrics=metrics, llm=evaluator_llm)
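
Here, eval_dataset is your evaluation dataset. A minimal sketch of building one, assuming the EvaluationDataset.from_list constructor available in recent Ragas releases (the record below is illustrative):

from ragas import EvaluationDataset

# Each record captures one interaction with your RAG pipeline
eval_dataset = EvaluationDataset.from_list([
    {
        "user_input": "When was the first Moon landing?",
        "retrieved_contexts": ["Apollo 11 landed on the Moon on July 20, 1969 ..."],
        "response": "The first Moon landing was on July 20, 1969.",
        "reference": "July 20, 1969",
    },
])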

Find the complete RAG Evaluation Quickstart here: https://docs.ragas.io/en/latest/getstarted/rag_evaluation/

Preview of the results:

| user_input | retrieved_contexts | response | reference | context_recall | factual_correctness | faithfulness |
|---|---|---|---|---|---|---|
| What are the global implications of the USA Supreme Court ruling on abortion? | - In 2022, the USA Supreme Court ... - The ruling has created a chilling effect ... | The global implications ... Here are some potential implications: | The global implications ... Additionally, the ruling has had an impact beyond national borders ... | 1 | 0.47 | 0.516129 |
| Which companies are the main contributors to GHG emissions ... ? | - Fossil fuel companies ... - Between 2010 and 2020, human mortality ... | According to the Carbon Majors database ... Here are the top contributors: | According to the Carbon Majors database ... Additionally, between 2010 and 2020, human mortality ... | 1 | 0.11 | 0.172414 |
| Which private companies in the Americas are the largest GHG emitters ... ? | The private companies responsible ... The largest emitter amongst state-owned companies ... | According to the Carbon Majors database, the largest private companies ... | The largest private companies in the Americas ... | 1 | 0.26 | 0 |
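
The preview above is a tabular view of the evaluation output. To inspect it yourself, the results object can be converted to a dataframe, assuming the to_pandas helper:

df = results.to_pandas()
print(df.head())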

Generate a test dataset for comprehensive RAG evaluation

What if you don't have real user questions to evaluate your RAG system against?

Ragas can help with synthetic test set generation: seed it with your own data and control the difficulty, variety, and complexity of the generated questions, as sketched below.
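
A minimal sketch, assuming the TestsetGenerator from ragas.testset described in recent Ragas documentation, where docs stands in for your own corpus loaded with a LangChain document loader:

from ragas.testset import TestsetGenerator
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# LLM and embedding model used to synthesize questions from your documents
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)

# docs: your documents, e.g. loaded with a LangChain DirectoryLoader
testset = generator.generate_with_langchain_docs(docs, testset_size=10)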

🫂 Community

If you want to get more involved with Ragas, check out our Discord server. It's a fun community where we geek out about LLMs, retrieval, production issues, and more.

Contributors

+----------------------------------------------------------------------------+
|     +----------------------------------------------------------------+     |
|     | Developers: Those who built with `ragas`.                      |     |
|     | (You have `import ragas` somewhere in your project)            |     |
|     |     +----------------------------------------------------+     |     |
|     |     | Contributors: Those who make `ragas` better.       |     |     |
|     |     | (You open PRs to this repo)                        |     |     |
|     |     +----------------------------------------------------+     |     |
|     +----------------------------------------------------------------+     |
+----------------------------------------------------------------------------+

We welcome contributions from the community! Whether it's bug fixes, feature additions, or documentation improvements, your input is valuable.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

🔍 Open Analytics

At Ragas, we believe in transparency. We collect minimal, anonymized usage data to improve our product and guide our development efforts.

✅ No personal or company-identifying information

✅ Open-source data collection code

✅ Publicly available aggregated data

To opt out, set the RAGAS_DO_NOT_TRACK environment variable to true.
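
For example:

export RAGAS_DO_NOT_TRACK=true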
