Example
Knowledge Retention Evaluation
To evaluate how well your model retains information across interactions, use Murnitur’s knowledge_retention
method. You can also set a base threshold to define acceptable retention levels.
- Import the Required Module:
- Prepare Your Dataset:
- Set Up a Base Threshold (Optional):
- Run the Evaluation:
- Sample Output:
Creating dataset
Side Note: If you are creating a dataset from a dictionary, use the following data structure:
Loading dataset
Use any of the following functions to load the dataset:
Extra Params
Extra parameters that can be passed to run_suite:
- save_output: A boolean. Set to false if you do not want to store the results in Murnitur AI.
- async_mode: Indicates whether the requests should run asynchronously.
Evaluation Metrics
We currently support the following evaluation metrics:
- hallucination: Measures the accuracy of the model’s output, identifying any fabricated or incorrect information.
- faithfulness: Evaluates how well the model’s output adheres to the provided context or source material.
- relevancy: Assesses the relevance of the model’s response to the input query or prompt.
- bias: Detects any unfair or discriminatory tendencies in the model’s predictions.
- context-relevancy: Determines how relevant the input or output is in relation to the given context.
- context-precision: Measures the precision of the model’s output within the specific context provided.
- toxicity: Identifies harmful or offensive language in the model’s responses.
- summarization: Evaluates the quality and accuracy of summaries generated by the model.
- pii: Detects the presence of personally identifiable information in the model’s output.
Metrics & Required Fields
Metric | Payload |
---|---|
hallucination | input, output, context |
faithfulness | input, output, retrieval_context |
relevancy | input, output |
bias | input, output |
context-relevancy | input, output, retrieval_context |
context-precision | input, output, ground_truth, retrieval_context |
toxicity | input, output |
summarization | input, output |
pii | output (to detect PII in the input, use murnitur-shield) |
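
Because each metric expects a different payload, it can help to check a record before submitting it for evaluation. The sketch below is not part of the Murnitur SDK: it is a standalone helper that encodes the table above as a plain dictionary. The names REQUIRED_FIELDS, missing_fields, and runnable_metrics are hypothetical, and the sketch assumes each dataset record is a plain dict keyed by the field names in the table.

```python
from typing import Any, Mapping

# Required payload fields per metric, copied from the table above.
# This mapping is an illustration, not something exported by the SDK.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "hallucination": ("input", "output", "context"),
    "faithfulness": ("input", "output", "retrieval_context"),
    "relevancy": ("input", "output"),
    "bias": ("input", "output"),
    "context-relevancy": ("input", "output", "retrieval_context"),
    "context-precision": ("input", "output", "ground_truth", "retrieval_context"),
    "toxicity": ("input", "output"),
    "summarization": ("input", "output"),
    "pii": ("output",),
}


def missing_fields(metric: str, record: Mapping[str, Any]) -> list[str]:
    """Return the required fields that are absent (or None) in the record."""
    if metric not in REQUIRED_FIELDS:
        raise ValueError(f"unknown metric: {metric!r}")
    return [name for name in REQUIRED_FIELDS[metric] if record.get(name) is None]


def runnable_metrics(record: Mapping[str, Any]) -> list[str]:
    """Return the metrics whose required fields are all present in the record."""
    return [m for m in REQUIRED_FIELDS if not missing_fields(m, record)]


# Hypothetical record: it has retrieval_context but no context or ground_truth.
record = {
    "input": "What is the capital of France?",
    "output": "Paris is the capital of France.",
    "retrieval_context": ["Paris is the capital and largest city of France."],
}

print(missing_fields("context-precision", record))
print(runnable_metrics(record))
```

Running such a check first makes a missing ground_truth or context field show up as a clear local error rather than a failed evaluation request.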