Automatic evaluations in Murnitur AI are initiated directly from the user interface, providing a streamlined process for assessing LLM performance.

Initial Setup

  • Upload a Test Dataset: Begin by uploading an evaluation dataset in CSV format. The file may contain any additional headers, but it must include the columns listed below (retrieval_context is optional):

    • context
    • ground_truth
    • retrieval_context (optional)
  • Download the Template: To ensure your dataset is correctly formatted, you can download the template here.
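
Before uploading, it can help to check the file locally. The following is a minimal sketch in Python, not part of Murnitur AI itself: it only verifies that the required headers from the list above are present. The function name `check_dataset`, the sample rows, and the rule that required cells should be non-empty are assumptions made for illustration; the platform may apply different or additional checks.

```python
import csv
import io

# Column names taken from the list above; retrieval_context is optional.
REQUIRED_COLUMNS = {"context", "ground_truth"}


def check_dataset(csv_text: str) -> list[str]:
    """Return a list of problems found in the CSV text (empty if none).

    Hypothetical helper: the empty-cell rule is an assumption, not a
    documented Murnitur AI requirement.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = set(reader.fieldnames or [])
    problems = [
        f"missing required column: {name}"
        for name in sorted(REQUIRED_COLUMNS - headers)
    ]
    # Row numbers start at 2 because row 1 holds the headers.
    for row_number, row in enumerate(reader, start=2):
        for name in sorted(REQUIRED_COLUMNS & headers):
            if not (row[name] or "").strip():
                problems.append(f"row {row_number}: empty {name}")
    return problems


# Made-up sample data, for illustration only.
sample = (
    "context,ground_truth,retrieval_context\n"
    "The Eiffel Tower is in Paris.,Paris,Paris is the capital of France.\n"
)
print(check_dataset(sample))
```

To check a file on disk, read it first (for example `check_dataset(open("dataset.csv", newline="", encoding="utf-8").read())`, where `dataset.csv` is a placeholder name) and fix any reported problems before uploading.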

Evaluation Run

Go to AI Evaluations in the sidebar and click the New Evaluation Run button, then work through the following steps:

  1. Choose Preset
  2. Configure LLM Model
  3. Select Evaluation Dataset
  4. Choose Evaluation Metrics
  5. Result