
Ragas

Ragas is an open-source toolkit by VibrantLabs for evaluating and optimizing large language model (LLM) applications. It offers objective metrics (LLM-based and traditional), automated test-data generation, integrations with popular LLM frameworks and observability tools, and utilities for building feedback loops to improve production LLMs.

Introduction

Ragas — LLM Application Evaluation Toolkit

Ragas is an open-source project from VibrantLabs designed to make evaluation of LLM-powered applications objective, repeatable, and data-driven. It targets teams building retrieval-augmented generation (RAG) systems, summarization pipelines, chat assistants and other LLM-based workflows that require systematic testing and continuous improvement.

Core capabilities
  • Objective metrics: both LLM-based scoring (e.g., aspect-based critique) and traditional metrics for measuring accuracy, relevance, faithfulness, and other properties of model outputs.
  • Test data generation: automated generation of comprehensive, production-aligned test sets when a ready-made dataset is not available, enabling broader coverage of edge cases and scenarios (see the sketch after this list).
  • Integrations: works with popular LLM frameworks (e.g., LangChain) and observability/telemetry tools so it plugs into existing development and monitoring stacks.
  • Feedback loops: utilities for using production data and evaluation results to retrain or tune components, closing the loop between production issues and developer action.
  • Open analytics: collects minimal, anonymized usage data (with an opt-out) to help guide development while keeping user privacy in mind.
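A minimal sketch of the test-data-generation flow, assuming the TestsetGenerator API described in the project's docs, a LangChain document loader, and OpenAI models as the generator backend; the directory path, model names, and testset_size below are illustrative placeholders, and exact class or parameter names may differ between releases:

    # Sketch: bootstrap a synthetic test set from your own documents.
    # The "docs/" path, model names, and testset_size are illustrative placeholders.
    from langchain_community.document_loaders import DirectoryLoader
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from ragas.embeddings import LangchainEmbeddingsWrapper
    from ragas.llms import LangchainLLMWrapper
    from ragas.testset import TestsetGenerator

    docs = DirectoryLoader("docs/", glob="**/*.md").load()  # your own corpus

    generator = TestsetGenerator(
        llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
        embedding_model=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
    )

    testset = generator.generate_with_langchain_docs(docs, testset_size=10)
    print(testset.to_pandas().head())  # generated question/context/reference rows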
Typical use cases
  • Benchmarking and comparing LLM models for a specific task or workflow.
  • Running automated evaluations of summarization, question answering, retrieval quality, and other LLM outputs (a batch-evaluation sketch follows this list).
  • Generating synthetic or varied test inputs to stress-test LLM behavior before deployment.
  • Building observability-driven improvement loops where evaluation results trigger follow-up actions (e.g., prompt updates, dataset curation).
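As one example of automated evaluation, the sketch below scores a small question-answering sample with a few built-in metrics via ragas.evaluate. The record is made up, and newer releases may prefer an EvaluationDataset and explicit LLM wrappers over the plain datasets.Dataset shown here:

    # Sketch: batch evaluation of RAG outputs with a handful of built-in metrics.
    # The record below is invented; in practice rows come from your pipeline's logs.
    # The default judge LLM typically needs an API key (e.g., OPENAI_API_KEY) configured.
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import answer_relevancy, context_precision, faithfulness

    data = {
        "question": ["When was the first Super Bowl played?"],
        "answer": ["The first Super Bowl was played on January 15, 1967."],
        "contexts": [[
            "The first AFL-NFL World Championship Game was played on January 15, 1967."
        ]],
        "ground_truth": ["January 15, 1967"],
    }

    result = evaluate(
        Dataset.from_dict(data),
        metrics=[faithfulness, answer_relevancy, context_precision],
    )
    print(result)  # aggregate score per metric
    result.to_pandas().to_csv("eval_results.csv", index=False)  # row-level detail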
Getting started (high level)
  1. Install: pip install ragas or pip install git+https://github.com/vibrantlabsai/ragas.
  2. Create or bootstrap a project with ragas quickstart (templates available for RAG evaluations).
  3. Define metrics (for example, AspectCritic) and evaluation workflows; Ragas supports async evaluation with LLM backends.
  4. Integrate outputs into dashboards or CI pipelines to track regressions and improvements over time (one possible CI gate is sketched below).
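For step 4, one lightweight way to wire evaluation into CI is a small gate script that fails the build when aggregate scores drop below agreed thresholds. The CSV path, metric columns, and threshold values here are hypothetical and assume a prior step exported row-level results (e.g., via result.to_pandas().to_csv(...)):

    # ci_eval_gate.py: exit non-zero if mean metric scores fall below thresholds.
    # File name, column names, and threshold values are placeholders for illustration.
    import sys

    import pandas as pd

    THRESHOLDS = {"faithfulness": 0.85, "answer_relevancy": 0.80}

    scores = pd.read_csv("eval_results.csv")  # one row per evaluated sample
    failures = [
        f"{metric}: {scores[metric].mean():.3f} < {minimum}"
        for metric, minimum in THRESHOLDS.items()
        if scores[metric].mean() < minimum
    ]

    if failures:
        print("Evaluation regression detected:")
        print("\n".join(failures))
        sys.exit(1)

    print("All evaluation metrics are above their thresholds.")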
Developer & community details
  • Repository and docs: the project is hosted on GitHub with a dedicated documentation site that includes examples, quickstart templates, and API references.
  • License: Apache-2.0 (open-source).
  • Community: Discord server and newsletter for announcements, office hours, and support.
  • Contributors: maintained by VibrantLabs (org: vibrantlabsai) with community contributions welcome.
Example (simple metric usage)

The project demonstrates how to create LLM-backed metrics (e.g., an AspectCritic) and run evaluations in code. Example usage in the README shows setting up an LLM, defining a metric and scoring a response programmatically.
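The sketch below follows that README pattern: wrap an evaluator LLM, define an AspectCritic with a natural-language definition, and score a single turn. The evaluator model and sample text are placeholders, and the class and method names shown (LangchainLLMWrapper, SingleTurnSample, single_turn_ascore) reflect one version of the library and may differ in yours:

    # Sketch: score one response with an LLM-backed AspectCritic metric.
    import asyncio

    from langchain_openai import ChatOpenAI
    from ragas.dataset_schema import SingleTurnSample
    from ragas.llms import LangchainLLMWrapper
    from ragas.metrics import AspectCritic

    evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))  # placeholder model

    metric = AspectCritic(
        name="summary_accuracy",
        definition="Verify whether the summary accurately reflects the source text.",
        llm=evaluator_llm,
    )

    sample = SingleTurnSample(
        user_input="Summarise: revenue grew 8% in Q3 2024, driven by new subscriptions.",
        response="Revenue rose 8% in Q3 2024, mainly from new subscriptions.",
    )

    score = asyncio.run(metric.single_turn_ascore(sample))  # binary 1 (pass) / 0 (fail)
    print(score)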

Why it matters

Evaluating LLM apps is often subjective and ad-hoc. Ragas offers structured, repeatable evaluation workflows and tooling that help teams measure regression, prioritize fixes, and quantify user-facing improvements. For teams operating production LLM systems, these capabilities reduce risk and improve the ability to ship reliably.

Information

  • Website: github.com
  • Authors: VibrantLabs (vibrantlabsai)
  • Published date: 2023/05/08
