# goal_inference

Module: `genlm.eval.domains.goal_inference`
## GoalInferenceInstance

Bases: `Instance`

Schema for a single Planetarium goal-inference item.

Source code in genlm/eval/domains/goal_inference/goal_inference.py
## GoalInferenceDataset

Bases: `Dataset[GoalInferenceInstance]`

Dataset wrapper yielding `GoalInferenceInstance` items.

Source code in genlm/eval/domains/goal_inference/goal_inference.py
### `__init__(dev_items)`

### `__iter__()`

Yield `GoalInferenceInstance` objects built from stored records.

Source code in genlm/eval/domains/goal_inference/goal_inference.py
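The `__init__(dev_items)` / `__iter__()` contract can be sketched with stand-in names (`FakeInstance`, `problem_id`, and `masked_pddl` are assumptions for illustration, not the library's actual fields):

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeInstance:
    # Stand-in for GoalInferenceInstance; real field names may differ.
    problem_id: str
    masked_pddl: str


class FakeDataset:
    """Mimics the __init__(dev_items) / __iter__() pattern above."""

    def __init__(self, dev_items: List[dict]):
        self.dev_items = dev_items

    def __iter__(self) -> Iterator[FakeInstance]:
        # Build one instance per stored record, as __iter__ does.
        for rec in self.dev_items:
            yield FakeInstance(rec["id"], rec["pddl"])


items = list(FakeDataset([{"id": "p1", "pddl": "(define (problem p1) ...)"}]))
```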
### `from_hf_planetarium(n_examples=100, max_objects=9, shard_filename='data/train-00000-of-00001.parquet', domains=None)`

`classmethod`

Load and filter Planetarium data via HuggingFace.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_examples` | `int` | Number of instances to evaluate. | `100` |
| `max_objects` | `int` | Keep problems with at most this many objects. | `9` |
| `shard_filename` | `str` | Specific shard file to download from Planetarium. | `'data/train-00000-of-00001.parquet'` |
| `domains` | `Optional[List[str]]` | Optional list of domain names to include. | `None` |
Returns:

| Type | Description |
|---|---|
| `GoalInferenceDataset` | `GoalInferenceDataset` with filtered instances. |
Source code in genlm/eval/domains/goal_inference/goal_inference.py
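The filtering this classmethod performs can be sketched in plain Python (the record field names `domain` and `num_objects` are assumptions, not the actual Planetarium schema):

```python
from typing import List, Optional

# Toy records standing in for Planetarium rows.
records = [
    {"domain": "blocksworld", "num_objects": 5},
    {"domain": "gripper", "num_objects": 12},
    {"domain": "gripper", "num_objects": 4},
]


def keep(rec: dict, max_objects: int = 9,
         domains: Optional[List[str]] = None) -> bool:
    # Drop problems with more than max_objects objects; optionally
    # restrict to a whitelist of domain names.
    if rec["num_objects"] > max_objects:
        return False
    return domains is None or rec["domain"] in domains


filtered = [r for r in records if keep(r, max_objects=9, domains=["blocksworld"])]
```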
## GoalInferenceEvaluator

Bases: `Evaluator[GoalInferenceInstance]`

Evaluator using Planetarium equivalence on masked PDDL reconstruction.

Source code in genlm/eval/domains/goal_inference/goal_inference.py
### `evaluate_sample(instance, response)`

Inject prediction into masked PDDL and check equivalence.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `instance` | `GoalInferenceInstance` | The goal-inference item being evaluated. | *required* |
| `response` | `str` | Model output to splice into the goal (no closing paren). | *required* |
Returns:

| Type | Description |
|---|---|
| `EvaluationResult` | `EvaluationResult` with score 1.0 if equivalent, else 0.0. |
Source code in genlm/eval/domains/goal_inference/goal_inference.py
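A minimal sketch of the splice step, assuming a hypothetical `<GOAL>` placeholder (the library's real mask token is not documented here). The balanced-paren check stands in for the real Planetarium equivalence test, which it does not replicate:

```python
# The goal section of the problem is masked; the model's response, which
# omits the closing paren per the docs, is spliced back in with ')' added.
masked_pddl = "(define (problem p) (:goal <GOAL>))"
response = "(and (on a b) (on b c)"  # no closing paren

candidate = masked_pddl.replace("<GOAL>", response + ")")

# Stand-in validity check only; the evaluator uses Planetarium equivalence.
score = 1.0 if candidate.count("(") == candidate.count(")") else 0.0
```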
### `goal_default_prompt_formatter(tokenizer, instance, use_chat_format=False, system_prompt=GOAL_SYSTEM_PROMPT)`

Format the prompt to reproduce the reference assistant-prefix prompting.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tokenizer` | `Tokenizer` | The tokenizer to use. | *required* |
| `instance` | `GoalInferenceInstance` | The instance to format. | *required* |
| `use_chat_format` | `bool` | Whether to use chat format. | `False` |
| `system_prompt` | `str` | The system prompt to use. | `GOAL_SYSTEM_PROMPT` |
Returns:

| Type | Description |
|---|---|
| `list[int]` | The prompt ids. |
Source code in genlm/eval/domains/goal_inference/goal_inference.py
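Assistant-prefix prompting ends the chat with a partially written assistant turn, so the model continues the goal instead of starting a fresh reply. A hedged sketch (the message contents and the `(:goal ` prefix are invented for illustration; `build_messages` is not a library function):

```python
def build_messages(system_prompt: str, problem_text: str,
                   assistant_prefix: str) -> list:
    # Standard chat-message layout; the final assistant turn is left
    # incomplete so generation picks up mid-answer.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem_text},
        {"role": "assistant", "content": assistant_prefix},
    ]


msgs = build_messages(
    "Infer the PDDL goal.",        # stand-in for GOAL_SYSTEM_PROMPT
    "(define (problem p) ...)",
    "(:goal ",                     # the model continues from here
)
```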
## GoalInferenceVALPotential

Bases: `Potential`

Expensive potential that validates partial goal strings with VAL.

It splices the current candidate goal into the problem, reuses a (cached) Fast Downward plan for (domain, problem), and returns 0.0 if VAL validates the plan under the candidate goal (-inf otherwise).
Source code in genlm/eval/domains/goal_inference/goal_potential.py
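The "(cached) Fast Downward plan" reuse matters because planning is far more expensive than validation: the plan is computed once per (domain, problem) pair and reused across every candidate goal. A sketch of that caching, with `fake_plan` standing in for a real planner call:

```python
from functools import lru_cache

CALLS = 0  # counts how many times the "planner" actually runs


@lru_cache(maxsize=None)
def plan_for(domain: str, problem: str) -> tuple:
    # Stand-in for invoking Fast Downward; the returned plan is a
    # placeholder, not real planner output.
    global CALLS
    CALLS += 1
    return ("move a b", "move b c")


p1 = plan_for("blocksworld", "p01")
p2 = plan_for("blocksworld", "p01")  # cache hit: no second planner call
```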
### `prefix(context)`

`async`

Score a partial prefix during generation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `bytes` | Byte prefix generated so far. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | Potential value for the prefix (0.0 / -inf). |
Source code in genlm/eval/domains/goal_inference/goal_potential.py
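The 0.0 / -inf return values are log-space booleans: 0.0 (= log 1) keeps the hypothesis, -inf (= log 0) prunes it. A sketch of the prefix-scoring shape, with a balanced-paren check standing in for the real VAL call:

```python
import math


def prefix_score(context: bytes) -> float:
    # Stand-in validity test: a prefix is acceptable as long as no ')'
    # closes a paren that was never opened. The real potential runs VAL.
    depth = 0
    for ch in context.decode("utf-8"):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth < 0:
            return -math.inf  # log 0: prune this hypothesis
    return 0.0  # log 1: keep it
```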
### `complete(context)`

`async`

Score a completed sequence at EOS.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `bytes` | Final byte sequence (without our trailing ')' heuristic). | *required* |
Returns:

| Type | Description |
|---|---|
| `float` | Potential value for the complete string (0.0 / -inf). |