util
LazyWeights
A class to represent weights in a lazy manner, allowing for efficient operations on potentially large weight arrays without immediate materialization.
Attributes:
Name | Type | Description |
---|---|---|
weights | ndarray | The weights associated with the tokens. |
encode | dict | A mapping from tokens to their corresponding indices in the weights array. |
decode | list | A list of tokens corresponding to the weights. |
is_log | bool | A flag indicating whether the weights are in log space. |
Source code in genlm/control/util.py
__init__(weights, encode, decode, log=True)
Initialize the LazyWeights instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
weights | ndarray | The weights associated with the tokens. | required |
encode | dict | A mapping from tokens to their corresponding indices in the weights array. | required |
decode | list | A list of tokens corresponding to the weights. | required |
log | bool | Indicates if the weights are in log space. Defaults to True. | True |
Raises:
Type | Description |
---|---|
AssertionError | If the length of weights does not match the length of decode or encode. |
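As a rough illustration of how the three constructor arguments relate, the sketch below builds a toy vocabulary by hand. It does not import the library: plain Python lists stand in for the ndarray, and the tokens and weights are made up for the example.

```python
import math

# Hypothetical toy vocabulary; a real one would come from a tokenizer.
decode = ["a", "b", "c"]                                   # index -> token
encode = {token: i for i, token in enumerate(decode)}      # token -> index
weights = [math.log(0.5), math.log(0.3), math.log(0.2)]    # log-space weights

# The documented constructor asserts that these lengths agree.
assert len(weights) == len(decode) == len(encode)

# Round trip: encoding a token and decoding the index returns the token.
assert all(decode[encode[token]] == token for token in decode)
```

Keeping encode and decode as inverses of each other is what lets a weight be addressed either by position or by token.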
__getitem__(token)
Retrieve the weight for a given token.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token | Any | The token for which to retrieve the weight. | required |
Returns:
Type | Description |
---|---|
float | The weight of the token. If the token is not found, the result is -inf for log-space weights and 0 otherwise. |
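The fallback behaviour can be sketched as a dictionary lookup with a space-dependent default. The function name `lookup` and its signature are hypothetical and chosen only for this illustration; the assumption is that -inf is returned in log space and 0 in regular space.

```python
import math

def lookup(weights, encode, token, is_log=True):
    """Return the weight for `token`, or the zero element if it is unknown."""
    # -inf is the log of zero, so both defaults mean "no mass".
    zero = -math.inf if is_log else 0.0
    index = encode.get(token)
    return zero if index is None else weights[index]

encode = {"a": 0, "b": 1}
known = lookup([-0.5, -1.5], encode, "b")        # weight stored at index 1
missing = lookup([-0.5, -1.5], encode, "zzz")    # token absent from encode
```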
keys()
values()
items()
normalize()
Normalize the weights.
Normalization is performed using log-space arithmetic when weights are logarithmic, or standard arithmetic otherwise.
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance with normalized weights. |
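The two normalization modes can be sketched as follows. This is a minimal stand-alone version that assumes log-space normalization subtracts the log-sum-exp of the weights; the helper names are hypothetical and plain lists stand in for the ndarray.

```python
import math

def logsumexp(xs):
    """Compute log(sum(exp(x))) stably by factoring out the maximum."""
    m = max(xs)
    if m == -math.inf:          # every weight is log(0)
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def normalize(weights, is_log=True):
    """Return weights rescaled so that they sum to one in regular space."""
    if is_log:
        z = logsumexp(weights)
        return [w - z for w in weights]      # division becomes subtraction
    total = sum(weights)
    return [w / total for w in weights]

log_weights = normalize([math.log(2.0), math.log(6.0)])
plain_weights = normalize([2.0, 6.0], is_log=False)
```

In both cases the result describes the same distribution; only the representation differs.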
exp()
Exponentiate the weights. This operation can only be performed when weights are in log space.
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance with exponentiated weights. |
Raises:
Type | Description |
---|---|
AssertionError | If the weights are not in log space. |
log()
Take the logarithm of the weights. This operation can only be performed when weights are in regular space.
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance with logarithmic weights. |
Raises:
Type | Description |
---|---|
AssertionError | If the weights are already in log space. |
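The relationship between exp() and log() can be illustrated with two element-wise conversions. The function names below are hypothetical, and mapping a zero weight to -inf is an assumption made so that the two conversions are inverses of each other.

```python
import math

def to_regular(log_weights):
    """Element-wise exp: log space -> regular space."""
    return [math.exp(w) for w in log_weights]

def to_log(weights):
    """Element-wise log: regular space -> log space (0 maps to -inf)."""
    # math.log(0) raises ValueError, so zero is handled explicitly.
    return [math.log(w) if w > 0 else -math.inf for w in weights]

regular = to_regular([-math.inf, 0.0])   # a zero weight and a unit weight
round_trip = to_log(regular)
```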
sum()
Sum the weights.
Summation is performed using log-space arithmetic when weights are logarithmic, or standard arithmetic otherwise.
Returns:
Type | Description |
---|---|
float | The sum of the weights, expressed in log space if the weights are logarithmic and in regular space otherwise. |
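One reason to sum in log space rather than exponentiating first is numerical underflow. The sketch below is an illustration only, with a hypothetical helper name, and shows the difference for very small weights.

```python
import math

def log_sum(log_weights):
    """Return log(sum(exp(w))) without leaving log space."""
    m = max(log_weights)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(w - m) for w in log_weights))

tiny = [-1000.0, -1000.0]

# exp(-1000) is below the smallest positive float, so the naive total is 0.0
# and taking its log would fail.
naive_total = sum(math.exp(w) for w in tiny)

# The log-space sum stays finite: -1000 + log(2).
stable_total = log_sum(tiny)
```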
spawn(new_weights, log=None)
Create a new LazyWeights instance over the same vocabulary with new weights.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_weights | ndarray | The new weights for the LazyWeights instance. | required |
log | bool | Indicates if the new weights are in log space. Defaults to None. | None |
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance. |
materialize(top=None)
Materialize the weights into a chart.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
top | int | The number of top weights to materialize. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Chart | A chart representation of the weights. |
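Materialization can be pictured as pairing each token with its weight, optionally keeping only the largest entries. In this sketch a plain dict stands in for the Chart type, the function name is hypothetical, and it is assumed that `top` selects the highest-weighted tokens and that None means all tokens.

```python
import heapq

def materialize_sketch(weights, decode, top=None):
    """Pair tokens with weights; keep only the `top` largest if requested."""
    pairs = zip(decode, weights)
    if top is None:
        return dict(pairs)
    # nlargest avoids sorting the whole vocabulary when top is small.
    return dict(heapq.nlargest(top, pairs, key=lambda pair: pair[1]))

full = materialize_sketch([0.1, 0.7, 0.2], ["a", "b", "c"])
best_two = materialize_sketch([0.1, 0.7, 0.2], ["a", "b", "c"], top=2)
```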
assert_equal(other, **kwargs)
Assert that two LazyWeights instances are equal.
This method asserts that the two LazyWeights instances have the same vocabulary (in identical order) and that their weights are numerically close.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | LazyWeights | The other LazyWeights instance to compare. | required |
**kwargs | dict | Additional arguments for np.testing.assert_allclose (e.g., rtol, atol). | {} |
assert_equal_unordered(other, **kwargs)
Assert that two LazyWeights instances are equal, ignoring vocabulary order.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | LazyWeights | The other LazyWeights instance to compare. | required |
**kwargs | dict | Additional arguments for np.isclose (e.g., rtol, atol). | {} |
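An order-insensitive comparison can be sketched by matching weights per token instead of per index. The function below is a hypothetical stand-in that uses `math.isclose` from the standard library in place of np.isclose, and takes the decode lists and weights directly rather than LazyWeights instances.

```python
import math

def check_equal_unordered(decode_a, weights_a, decode_b, weights_b,
                          rel_tol=1e-9, abs_tol=0.0):
    """Assert both sides hold the same tokens with numerically close weights."""
    a = dict(zip(decode_a, weights_a))
    b = dict(zip(decode_b, weights_b))
    assert a.keys() == b.keys(), "vocabularies differ"
    for token, weight in a.items():
        assert math.isclose(weight, b[token], rel_tol=rel_tol, abs_tol=abs_tol), token

# Same tokens and weights, listed in a different order: passes.
check_equal_unordered(["a", "b"], [0.1, 0.9], ["b", "a"], [0.9, 0.1])
```

An ordered comparison would additionally require the two decode lists to be identical element by element.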
load_trie(V, backend=None, **kwargs)
Load a TokenCharacterTrie.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
V | list | The vocabulary. | required |
backend | str | The backend to use for trie construction. Defaults to None. | None |
**kwargs | dict | Additional arguments for the trie construction. | {} |
Returns:
Type | Description |
---|---|
TokenCharacterTrie | A trie instance. |
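For intuition about what a trie over token characters can compute, the sketch below accumulates token weights at every character prefix. It is not the library's TokenCharacterTrie: the function name is hypothetical, a flat dictionary keyed by prefix stands in for trie nodes, and the assumption is that a prefix's weight sum is the total weight of all tokens that start with it.

```python
from collections import defaultdict

def prefix_weight_sums(vocab, weights):
    """Map each character prefix to the total weight of tokens beneath it."""
    sums = defaultdict(float)
    for token, weight in zip(vocab, weights):
        # Every prefix of the token, from "" (the root) up to the token itself.
        for end in range(len(token) + 1):
            sums[token[:end]] += weight
    return dict(sums)

sums = prefix_weight_sums(["a", "ab", "b"], [0.5, 0.25, 0.25])
```

Here the root prefix "" collects the weight of the whole vocabulary, and "a" collects the weight of both "a" and "ab".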
load_async_trie(V, backend=None, **kwargs)
Load an AsyncTokenCharacterTrie. This is a TokenCharacterTrie that automatically batches weight_sum and weight_max requests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
V | list | The vocabulary. | required |
backend | str | The backend to use for trie construction. Defaults to None. | None |
**kwargs | dict | Additional arguments for the trie construction. | {} |
Returns:
Type | Description |
---|---|
AsyncTokenCharacterTrie | An async trie instance. |
fast_sample_logprobs(logprobs, size=1)
Sample indices from an array of log probabilities using the Gumbel-max trick.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logprobs | ndarray | Array of log probabilities. | required |
size | int | Number of samples to draw. | 1 |
Returns:
Type | Description |
---|---|
ndarray | Array of sampled indices. |
Note
This is much faster than np.random.choice for large arrays since it avoids normalizing probabilities and uses vectorized operations.
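The Gumbel-max trick itself can be sketched in a few lines: add independent Gumbel(0, 1) noise to each log probability and take the index of the maximum. The version below is a dependency-free illustration with a hypothetical function name; it loops in pure Python, so it shows the logic rather than the vectorized speed described above.

```python
import math
import random

def gumbel_noise(rng):
    """Draw one Gumbel(0, 1) variate as -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:              # random() may return exactly 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def gumbel_max_sample(logprobs, size=1, rng=random):
    """Sample `size` indices with probability proportional to exp(logprobs)."""
    samples = []
    for _ in range(size):
        scores = [lp + gumbel_noise(rng) for lp in logprobs]
        samples.append(max(range(len(scores)), key=scores.__getitem__))
    return samples

# The inputs need not be normalized: adding a constant to every entry
# shifts all scores equally and leaves the argmax unchanged.
draws = gumbel_max_sample([math.log(7.0), math.log(2.0), math.log(1.0)], size=5)
```

To sample a token rather than an index, the drawn index can be passed through a decode list such as the one held by LazyWeights.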
fast_sample_lazyweights(lazyweights)
Sample a token from a LazyWeights instance using the Gumbel-max trick.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lazyweights | LazyWeights | A LazyWeights instance. | required |
Returns:
Type | Description |
---|---|
Any | The sampled token. |