Example
Knowledge Retention Evaluation
To evaluate how well your model retains information across interactions, use Murnitur’s knowledge_retention method. You can also set a base threshold to define acceptable retention levels.
- Import the Required Module:
- Prepare Your Dataset:
- Set Up a Base Threshold (Optional):
- Run the Evaluation:
- Sample Output:
Creating dataset
Side Note: If you are creating a dataset from a dictionary, use the following data structure:
Loading dataset
Use any of the following functions to load the dataset:
Extra Params
Extra parameters that can be passed to run_suite:
- save_output: A boolean. Set to false if you do not want to store the results in Murnitur AI.
- async_mode: Indicates whether the requests should run asynchronously.
Evaluation Metrics
We currently support the following evaluation metrics:
- hallucination: Measures the accuracy of the model’s output, identifying any fabricated or incorrect information.
- faithfulness: Evaluates how well the model’s output adheres to the provided context or source material.
- relevancy: Assesses the relevance of the model’s response to the input query or prompt.
- bias: Detects any unfair or discriminatory tendencies in the model’s predictions.
- context-relevancy: Determines how relevant the input or output is in relation to the given context.
- context-precision: Measures the precision of the model’s output within the specific context provided.
- toxicity: Identifies harmful or offensive language in the model’s responses.
- summarization: Evaluates the quality and accuracy of summaries generated by the model.
- pii: Detects the presence of personally identifiable information in the model’s output.
Metrics & Required Fields
| Metric | Required fields |
|---|---|
| hallucination | input, output, context |
| faithfulness | input, output, retrieval_context |
| relevancy | input, output |
| bias | input, output |
| context-relevancy | input, output, retrieval_context |
| context-precision | input, output, ground_truth, retrieval_context |
| toxicity | input, output |
| summarization | input, output |
| pii | output (use murnitur-shield to detect PII in the input) |
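The required fields in the table above can be checked before a dataset row is submitted for evaluation. The following is a minimal sketch in plain Python, not part of the Murnitur SDK: the REQUIRED_FIELDS mapping is copied from the table, while the missing_fields helper and the sample row are hypothetical names introduced only for illustration.

```python
# Required payload fields per metric, copied from the table above.
REQUIRED_FIELDS = {
    "hallucination": {"input", "output", "context"},
    "faithfulness": {"input", "output", "retrieval_context"},
    "relevancy": {"input", "output"},
    "bias": {"input", "output"},
    "context-relevancy": {"input", "output", "retrieval_context"},
    "context-precision": {"input", "output", "ground_truth", "retrieval_context"},
    "toxicity": {"input", "output"},
    "summarization": {"input", "output"},
    "pii": {"output"},
}


def missing_fields(metric, row):
    """Hypothetical helper: return the required fields that `row` lacks for `metric`."""
    if metric not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown metric: {metric!r}")
    # Treat keys whose value is None as absent.
    present = {key for key, value in row.items() if value is not None}
    return sorted(REQUIRED_FIELDS[metric] - present)


# Illustrative row; the values are made up for this sketch.
row = {"input": "What is the capital of France?", "output": "Paris"}
print(missing_fields("relevancy", row))     # []
print(missing_fields("faithfulness", row))  # ['retrieval_context']
```

A check like this reports incomplete rows locally, before any evaluation request is made.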