evaluator
genlm.eval.core.evaluator
EvaluationResult
Evaluator
Bases: Generic[T], ABC
Base class for evaluators that handle response evaluation.
Source code in genlm/eval/core/evaluator.py
evaluate_sample(instance, response)
abstractmethod
Evaluate a single response for correctness.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instance | T | The dataset instance being evaluated. | required |
response | Any | The model's response, which is given by the response attribute of a | required |
Returns:
Type | Description |
---|---|
EvaluationResult | The evaluation result. |
Source code in genlm/eval/core/evaluator.py
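A minimal sketch of how a concrete evaluator might implement evaluate_sample. It is self-contained and does not import genlm: SketchEvaluator and SketchResult are hypothetical stand-ins for Evaluator and EvaluationResult, and the score and desc fields, the ArithmeticInstance type, and the exact-match rule are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclass
class SketchResult:
    """Hypothetical stand-in for EvaluationResult; field names are assumed."""

    score: float
    desc: str


class SketchEvaluator(Generic[T], ABC):
    """Hypothetical stand-in with the documented bases, Generic[T] and ABC."""

    @abstractmethod
    def evaluate_sample(self, instance: T, response: Any) -> SketchResult:
        """Evaluate a single response for correctness."""


@dataclass
class ArithmeticInstance:
    """Hypothetical dataset instance: a question with a known answer."""

    question: str
    answer: int


class ExactMatchEvaluator(SketchEvaluator[ArithmeticInstance]):
    """Scores 1.0 when the response matches the expected answer exactly."""

    def evaluate_sample(
        self, instance: ArithmeticInstance, response: Any
    ) -> SketchResult:
        # Normalize the response to a stripped string before comparing.
        correct = str(response).strip() == str(instance.answer)
        return SketchResult(
            score=1.0 if correct else 0.0,
            desc="exact match" if correct else "mismatch",
        )


instance = ArithmeticInstance(question="2 + 2", answer=4)
evaluator = ExactMatchEvaluator()
good = evaluator.evaluate_sample(instance, " 4 ")
bad = evaluator.evaluate_sample(instance, "5")
print(good.score, bad.score)
```

Because evaluate_sample is abstract, the base class cannot be instantiated directly; only subclasses that override it can be used.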
evaluate_ensemble(instance, output)
Evaluate the complete ensemble of weighted samples using weighted accuracy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instance | T | The dataset instance being evaluated. | required |
output | ModelOutput | The complete model output including ensemble responses. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | Dictionary containing evaluation metrics. |
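The weighted accuracy computed over an ensemble can be sketched as a weight-averaged per-sample score. This is an illustration, not the library's implementation: WeightedResponse, weighted_accuracy, score_fn, the dictionary keys, and the normalization by total weight are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class WeightedResponse:
    """Hypothetical ensemble member: a response paired with its weight."""

    response: Any
    weight: float


def weighted_accuracy(
    responses: List[WeightedResponse],
    score_fn: Callable[[Any], float],
) -> Dict[str, Any]:
    """Average per-sample scores, weighting each sample by its weight.

    Assumption: weights are non-negative. They are normalized here so the
    result stays in [0, 1] even if they do not sum to 1.
    """
    total = sum(r.weight for r in responses)
    if total <= 0:
        # No usable weight mass: report zero rather than dividing by zero.
        return {"weighted_accuracy": 0.0, "n_responses": len(responses)}
    accuracy = sum(r.weight * score_fn(r.response) for r in responses) / total
    return {"weighted_accuracy": accuracy, "n_responses": len(responses)}


ensemble = [
    WeightedResponse(response="4", weight=0.5),
    WeightedResponse(response="5", weight=0.25),
    WeightedResponse(response="4", weight=0.25),
]
metrics = weighted_accuracy(ensemble, lambda resp: 1.0 if resp == "4" else 0.0)
print(metrics["weighted_accuracy"])  # 0.75
```

Here the two correct samples carry a combined weight of 0.75 out of 1.0, so the ensemble scores 0.75 even though two of the three samples are correct.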