NVIDIA Technical Blog

Evaluating Medical RAG with NVIDIA AI Endpoints and Ragas

    Generate Synthetic Data

    One of the key challenges in RAG evaluation is generating synthetic data, which is needed to evaluate the system robustly. Sample questions are generated from the actual documents in the vector store, and each question is paired with its retrieved context and a generated ground-truth answer.
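    The generation loop described above can be sketched as follows. This is a minimal illustration, not the blog's implementation: `generate_synthetic_dataset`, `SyntheticSample`, `toy_llm`, and `toy_retrieve` are hypothetical names, the toy LLM and word-overlap retriever stand in for a real NVIDIA AI endpoint and vector store, and the `question`/`contexts`/`ground_truth` field names are an assumption modeled on common Ragas test-set layouts.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SyntheticSample:
    # Field names follow the question/contexts/ground_truth layout often used
    # for Ragas test sets; treat them as an assumption for your Ragas version.
    question: str
    contexts: List[str]
    ground_truth: str


def generate_synthetic_dataset(
    documents: List[str],
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> List[SyntheticSample]:
    samples = []
    for doc in documents:
        # 1. Ask the generator LLM for a question grounded in this document.
        question = llm("Write one question answerable from this text:\n" + doc)
        # 2. Retrieve context for that question, as the RAG pipeline would.
        contexts = retrieve(question)
        # 3. Ask the LLM for a reference answer restricted to that context.
        ground_truth = llm(
            "Answer using only this context:\n"
            + "\n".join(contexts)
            + "\nQuestion: " + question
        )
        samples.append(SyntheticSample(question, contexts, ground_truth))
    return samples


# --- Hypothetical toy stand-ins so the sketch runs without an endpoint ---
docs = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Aspirin inhibits platelet aggregation.",
]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_llm(prompt: str) -> str:
    lines = prompt.split("\n")
    if lines[0].startswith("Write one question"):
        subject = lines[1].split()[0]  # first word of the document
        return f"What does {subject} do?"
    return lines[1]  # echo the first context line as the "answer"


def toy_retrieve(query: str, k: int = 1) -> List[str]:
    # Rank documents by word overlap with the query (stand-in for a vector store).
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


dataset = generate_synthetic_dataset(docs, toy_llm, toy_retrieve)
for sample in dataset:
    print(sample.question, "->", sample.contexts)
```

    Passing the LLM and retriever in as callables keeps the loop independent of any particular client, so the toy functions can be swapped for real endpoint and vector-store calls.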

    Evaluate the Input Data

    The synthetic data serves as input to the evaluation, which focuses on generation-specific metrics such as BLEU and perplexity.
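    To make these two metrics concrete, here is a from-scratch sketch of sentence-level BLEU (clipped n-gram precision up to 4-grams with a brevity penalty, no smoothing) and perplexity (the exponential of the mean negative log-likelihood per token). It is illustrative only and is not necessarily how the blog or Ragas computes them; perplexity additionally assumes you can obtain per-token log-probabilities (natural log) from the model.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: Sequence[str], candidate: Sequence[str], max_n: int = 4) -> float:
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


def perplexity(token_log_probs: Sequence[float]) -> float:
    # exp of the mean negative log-likelihood per token.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


reference = "metformin is a first line therapy for type 2 diabetes".split()
short = "metformin is a first line".split()
print(sentence_bleu(reference, reference))  # identical text scores 1.0
print(sentence_bleu(reference, short))      # exact but short: only the brevity penalty applies
print(perplexity([math.log(0.5)] * 4))      # every token at p=0.5 gives perplexity about 2
```

    The short candidate matches the reference exactly on every n-gram, so its score, exp(1 - 10/5), comes entirely from the brevity penalty. This illustrates why BLEU alone says little about whether a medical answer is correct.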

    Apply to Semantic Search

    The system can be adapted to evaluate keyword-based semantic search by defining custom metrics in Ragas that measure retrieval precision for semantic search queries.
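    The scoring logic behind such a custom metric can be sketched as a plain function. The relevance rule here is an assumption chosen for illustration: a retrieved chunk counts as relevant if it contains at least one of the query's keywords (single words, matched case-insensitively). `keyword_retrieval_precision` is a hypothetical name. Wiring it into Ragas would mean wrapping it in one of Ragas's metric base classes, whose exact interface varies between releases, so that part is not shown.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieval_precision(keywords: Iterable[str], retrieved: List[str]) -> float:
    """Fraction of retrieved chunks that mention at least one query keyword."""
    if not retrieved:
        return 0.0
    kw = {k.lower() for k in keywords}
    relevant = sum(1 for chunk in retrieved if kw & tokenize(chunk))
    return relevant / len(retrieved)


retrieved = [
    "Metformin lowers blood glucose.",
    "Aspirin inhibits platelets.",
    "Type 2 diabetes is common.",
    "Statins reduce LDL cholesterol.",
]
print(keyword_retrieval_precision(["metformin", "diabetes"], retrieved))
```

    Two of the four chunks mention a keyword, so this retrieval scores 0.5. A stricter variant could require all keywords, or weight chunks by rank.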

    Refining with Structured Output

    Using the structured output feature of the LangChain NVIDIA AI Endpoints integration, the evaluation process can be refined so that answers are accurate, relevant, and up to date while remaining faithful to the retrieved context.
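    LangChain chat models provide a `with_structured_output` method that binds a schema so the model's reply comes back as structured data. Whether a particular NVIDIA endpoint model supports it should be checked in the integration's documentation. To stay runnable offline, the minimal sketch below uses only the standard library and shows the other half of the idea: a hypothetical `EvaluationVerdict` schema covering accuracy, relevance, and faithfulness, plus a hypothetical `parse_verdict` helper that rejects replies that do not match the schema.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class EvaluationVerdict:
    # Hypothetical schema: one field per quality the evaluation checks.
    accurate: bool
    relevant: bool
    faithful_to_context: bool
    reasoning: str


def parse_verdict(raw: str) -> EvaluationVerdict:
    """Parse a JSON model reply and reject it unless it matches the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(EvaluationVerdict)}
    missing = sorted(expected.keys() - data.keys())
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
    # Extra keys in the reply are ignored.
    return EvaluationVerdict(**{name: data[name] for name in expected})


reply = (
    '{"accurate": true, "relevant": true, "faithful_to_context": false, '
    '"reasoning": "Cites a dosage not present in the retrieved context."}'
)
verdict = parse_verdict(reply)
print(verdict.faithful_to_context)
```

    Forcing the judge's output into typed fields means a malformed or incomplete verdict fails loudly instead of being scored as if it were valid.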