cfglm
Fast computation of the posterior distribution over the next word in a WCFG language model.
BoolCFGLM
Bases: LM
Language model interface for Boolean-weighted CFGs.
Uses Earley's algorithm or CKY for inference. The grammar is converted to use Boolean weights if needed, where positive weights become True and zero/negative weights become False.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | CFG | The context-free grammar to use | required |
| alg | str | Parsing algorithm to use - either 'earley' or 'cky' | 'earley' |
Raises:
| Type | Description |
|---|---|
| ValueError | If alg is not 'earley' or 'cky' |
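The Boolean conversion rule described above (positive weights become True, zero or negative weights become False) can be illustrated with a minimal sketch. The rule-table layout and the `to_boolean` helper are hypothetical and are not part of the library, which applies the conversion to a CFG object internally.

```python
# Toy rule table: (head, body) -> real-valued weight.
# This layout is an assumption for illustration, not the library's CFG class.
weighted_rules = {
    ("S", ("a", "S", "b")): 0.7,
    ("S", ()): 0.3,
    ("S", ("c",)): 0.0,
    ("S", ("d",)): -1.5,
}


def to_boolean(rules):
    """Map each weight to True if it is strictly positive, else False."""
    return {rule: weight > 0 for rule, weight in rules.items()}


boolean_rules = to_boolean(weighted_rules)
```

Under this mapping the rules with weights 0.0 and -1.5 are treated as absent, so only the first two rules can take part in a derivation.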
Source code in genlm/grammar/cfglm.py
__call__(context)
Check if a context is possible under this grammar.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | sequence | The context to check | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the context has non-zero weight |
__init__(cfg, alg='earley')
Initialize a BoolCFGLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | CFG | The context-free grammar to use as the language model | required |
| alg | str | Parsing algorithm to use - either 'earley' or 'cky' | 'earley' |
Raises:
| Type | Description |
|---|---|
| ValueError | If alg is not 'earley' or 'cky' |
clear_cache()
from_string(x, semiring=Boolean, **kwargs)
classmethod
Create a BoolCFGLM from a string representation of a grammar.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | str | The grammar string | required |
| semiring | | The semiring for weights (default: Boolean) | Boolean |
| **kwargs | | Additional arguments passed to __init__ | {} |
Returns:
| Type | Description |
|---|---|
| BoolCFGLM | A new language model |
p_next(context)
Compute next token probabilities given a context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | sequence | The conditioning context | required |
Returns:
| Type | Description |
|---|---|
| chart | The next token weights |
Raises:
| Type | Description |
|---|---|
| AssertionError | If context contains out-of-vocabulary tokens |
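To make the idea of Boolean next-token weights concrete, here is a toy sketch that replaces the grammar with a small finite set of strings. It does not use Earley or CKY and is not how the library computes its result. `LANGUAGE` and `next_token_weights` are hypothetical names, and a plain dict stands in for the returned chart.

```python
EOS = "▪"  # end-of-sequence marker (the default symbol named for add_EOS)

# Hypothetical finite language standing in for the strings of a grammar.
LANGUAGE = [("a", "b"), ("a", "a", "b", "b")]


def next_token_weights(language, context):
    """Return {token: True} for each token that can extend `context`."""
    context = tuple(context)
    n = len(context)
    weights = {}
    for sentence in language:
        full = tuple(sentence) + (EOS,)
        # Keep the token at position n if `context` is a proper prefix.
        if len(full) > n and full[:n] == context:
            weights[full[n]] = True
    return weights


after_empty = next_token_weights(LANGUAGE, ())
after_a = next_token_weights(LANGUAGE, ("a",))
after_ab = next_token_weights(LANGUAGE, ("a", "b"))
```

In this toy, the empty context allows only `a`, the context `a` allows either `a` or `b`, and the context `a b` allows only the end-of-sequence symbol.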
add_EOS(cfg, eos=None)
Add an end-of-sequence symbol to a CFG's language.
Transforms the grammar to append the EOS symbol to every string it generates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | CFG | The input grammar | required |
| eos | optional | The end-of-sequence symbol to add. Defaults to ▪. | None |
Returns:
| Type | Description |
|---|---|
| CFG | A new grammar that generates strings ending in EOS |
Raises:
| Type | Description |
|---|---|
| AssertionError | If EOS is already in the grammar's vocabulary |
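One common way to append an EOS symbol to every generated string is to add a fresh start symbol with a single rule that expands to the old start symbol followed by EOS. The sketch below shows that construction on a toy rule list. The `(head, body)` rule format, the `add_eos` helper, and the `new_start` name are assumptions for illustration, and the library's own transformation may differ in its details.

```python
EOS = "▪"


def add_eos(rules, start, terminals, eos=EOS, new_start="S'"):
    """Return (rules, start) for a grammar whose strings all end in `eos`."""
    # Mirror the documented precondition: EOS must not already be a terminal.
    assert eos not in terminals, "EOS is already in the vocabulary"
    # The fresh start symbol must not clash with an existing head.
    assert all(head != new_start for head, _ in rules)
    # Single new rule: new_start -> start eos
    return [(new_start, (start, eos))] + list(rules), new_start


# Toy grammar for a^n b^n: S -> a S b | (empty)
rules = [("S", ("a", "S", "b")), ("S", ())]
new_rules, new_start = add_eos(rules, "S", {"a", "b"})
```

Because the original rules are left unchanged, every derivation of the old grammar maps to exactly one derivation of the new one, with EOS as the final symbol.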
locally_normalize(self, **kwargs)
Locally normalize the grammar's rule weights.
Returns a transformed grammar where:
1. The weights of all rules with the same head symbol sum to one.
2. Each derivation's weight is proportional to its weight in the original grammar (the two differ only by a multiplicative normalization constant).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | | Additional arguments passed to self.agenda() | {} |
Returns:
| Type | Description |
|---|---|
| CFG | A new grammar with locally normalized weights |
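The two properties above follow from a standard reweighting: multiply each rule weight by the total weights Z of the nonterminals in its body, then divide by Z of its head. The sketch below applies this to a toy grammar. The `(weight, head, body)` rule format and both helper functions are hypothetical, and Z is computed by naive fixed-point iteration rather than through the library's `agenda()` method. It assumes every nonterminal has a finite, positive total weight.

```python
from math import prod

# Hypothetical rule format: (weight, head, body). Symbols not listed in
# NONTERMINALS are treated as terminals.
RULES = [
    (1.0, "S", ("A", "A")),
    (2.0, "A", ("a",)),
    (6.0, "A", ("b",)),
]
NONTERMINALS = {"S", "A"}


def inside_weights(rules, nonterminals, iterations=100):
    """Approximate the total weight Z[X] of each nonterminal by iteration."""
    z = {x: 0.0 for x in nonterminals}
    for _ in range(iterations):
        new = {x: 0.0 for x in nonterminals}
        for w, head, body in rules:
            new[head] += w * prod(z[y] for y in body if y in nonterminals)
        z = new
    return z


def locally_normalize(rules, nonterminals):
    """Rescale each rule: w' = w * prod(Z[child]) / Z[head]."""
    z = inside_weights(rules, nonterminals)
    return [
        (w * prod(z[y] for y in body if y in nonterminals) / z[head], head, body)
        for w, head, body in rules
    ]


normalized = locally_normalize(RULES, NONTERMINALS)
```

Here Z is 8 for `A` and 64 for `S`, so the two `A` rules receive weights 0.25 and 0.75. The derivation of `a b` has weight 12 in the original grammar and 12/64 after normalization, so the multiplicative constant is 1/Z of the start symbol.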