molecular_synthesis
genlm.eval.domains.molecular_synthesis
MolecularSynthesisInstance
MolecularSynthesisDataset
Bases: Dataset[MolecularSynthesisInstance]
Dataset for molecular synthesis evaluation.
Source code in genlm/eval/domains/molecular_synthesis.py
__init__(prompt_molecules)
Initialize the dataset with a list of molecules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompt_molecules
|
List of lists of molecules which will be used to generate prompts. |
required |
Source code in genlm/eval/domains/molecular_synthesis.py
from_smiles(smiles_path, n_molecules=20, n_instances=100, seed=1234)
classmethod
Load molecules from a SMILES file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
smiles_path
|
str
|
Path to the .smi file containing SMILES strings. |
required |
n_molecules
|
int
|
Number of molecules to sample. |
20
|
n_instances
|
int
|
Number of instances to sample. |
100
|
seed
|
int
|
Seed for the random number generator. |
1234
|
Returns:
Name | Type | Description |
---|---|---|
MolecularSynthesisDataset |
Dataset initialized with molecules from the SMILES. |
Source code in genlm/eval/domains/molecular_synthesis.py
__iter__()
Iterate over molecules.
Returns:
Type | Description |
---|---|
Iterator[MolecularSynthesisInstance]: Iterator over molecular synthesis instances. |
Source code in genlm/eval/domains/molecular_synthesis.py
schema
property
Get the schema class for this dataset.
Returns:
Type | Description |
---|---|
type[MolecularSynthesisInstance]: The Pydantic model class for molecular synthesis instances. |
MolecularSynthesisEvaluator
Bases: Evaluator[MolecularSynthesisInstance]
Evaluator for molecular synthesis.
Source code in genlm/eval/domains/molecular_synthesis.py
evaluate_sample(instance, response)
Evaluate if a response matches the regex pattern.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instance
|
PatternMatchingInstance
|
The pattern matching instance being evaluated. |
required |
response
|
str
|
The model's response text. |
required |
Returns:
Type | Description |
---|---|
bool
|
Whether the response matches the pattern. |
Source code in genlm/eval/domains/molecular_synthesis.py
default_prompt_formatter(tokenizer, instance, use_chat_format=False, system_prompt=SYSTEM_PROMPT)
Default prompt formatter for molecular synthesis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tokenizer
|
Tokenizer
|
The tokenizer to use. |
required |
instance
|
MolecularSynthesisInstance
|
The instance to format. |
required |
use_chat_format
|
bool
|
Whether to use chat format. |
False
|
system_prompt
|
str
|
The system prompt to use. |
SYSTEM_PROMPT
|
Returns:
Type | Description |
---|---|
list[int]
|
The prompt ids. |