util
LazyWeights
A class to represent weights in a lazy manner, allowing for efficient operations on potentially large weight arrays without immediate materialization.
Attributes:
Name | Type | Description |
---|---|---|
weights | ndarray | The weights associated with the tokens. |
encode | dict | A mapping from tokens to their corresponding indices in the weights array. |
decode | list | A list of tokens corresponding to the weights. |
is_log | bool | A flag indicating whether the weights are in log space. |
Source code in genlm/control/util.py
__init__(weights, encode, decode, log=True)
Initialize the LazyWeights instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
weights | ndarray | The weights associated with the tokens. | required |
encode | dict | A mapping from tokens to their corresponding indices in the weights array. | required |
decode | list | A list of tokens corresponding to the weights. | required |
log | bool | Indicates if the weights are in log space. Defaults to True. | True |
Raises:
Type | Description |
---|---|
AssertionError | If the lengths of weights, encode, and decode are not all equal. |
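To make the length checks concrete, here is a minimal sketch, assuming `encode` is the inverse of `decode` (each token maps to its position in `decode`). The helper `check_lazyweights_args` is hypothetical, not part of genlm.control, and plain lists stand in for the ndarray.

```python
def check_lazyweights_args(weights, encode, decode):
    """Hypothetical helper mirroring the documented constructor checks."""
    # Every token in `decode` needs exactly one weight.
    assert len(weights) == len(decode), "weights and decode differ in length"
    # `encode` should hold one index per token in `decode`.
    assert len(encode) == len(decode), "encode and decode differ in length"


decode = ["a", "b", "c"]
# Build the token -> index mapping from the index -> token list.
encode = {tok: i for i, tok in enumerate(decode)}
check_lazyweights_args([0.1, 0.2, 0.7], encode, decode)  # passes silently

try:
    check_lazyweights_args([0.1, 0.2], encode, decode)  # one weight missing
except AssertionError as err:
    print("rejected:", err)
```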
__getitem__(token)
Retrieve the weight for a given token.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token | Any | The token for which to retrieve the weight. | required |
Returns:
Type | Description |
---|---|
float | The weight of the token. If the token is not found, -inf when the weights are in log space and 0 otherwise. |
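The fallback value is the weight of "no mass" in each space: log(0) = -inf for log weights and 0 for regular weights. A minimal sketch of the lookup, using a hypothetical `lookup` function over plain lists rather than the library class:

```python
import math


def lookup(weights, encode, token, is_log=True):
    """Hypothetical stand-in for LazyWeights.__getitem__."""
    if token not in encode:
        # Unknown tokens carry zero mass: log(0) = -inf in log space, 0 otherwise.
        return -math.inf if is_log else 0.0
    return weights[encode[token]]


encode = {"a": 0, "b": 1}
log_weights = [math.log(0.25), math.log(0.75)]
print(lookup(log_weights, encode, "b"))                  # log(0.75)
print(lookup(log_weights, encode, "z"))                  # -inf
print(lookup([0.25, 0.75], encode, "z", is_log=False))   # 0.0
```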
keys()
values()
items()
normalize()
Normalize the weights.
Normalization is performed using log-space arithmetic when weights are logarithmic, or standard arithmetic otherwise.
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance with normalized weights. |
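In log space, normalizing means subtracting the log of the total mass (computed with log-sum-exp) from every entry; in regular space it is ordinary division by the sum. A minimal sketch over plain lists, with hypothetical `logsumexp` and `normalize` helpers rather than the library's code:

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:  # every weight is zero mass
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def normalize(weights, is_log=True):
    """Hypothetical sketch of normalization in either space."""
    if is_log:
        # Subtracting log Z is the log-space version of dividing by Z.
        log_z = logsumexp(weights)
        return [w - log_z for w in weights]
    total = sum(weights)
    return [w / total for w in weights]


print(normalize([1.0, 3.0], is_log=False))  # [0.25, 0.75]
log_p = normalize([math.log(2.0), math.log(6.0)])
print([math.exp(x) for x in log_p])  # approximately [0.25, 0.75]
```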
exp()
Exponentiate the weights. This operation can only be performed when weights are in log space.
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance with exponentiated weights. |
Raises:
Type | Description |
---|---|
AssertionError | If the weights are not in log space. |
log()
Take the logarithm of the weights. This operation can only be performed when weights are in regular space.
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance with logarithmic weights. |
Raises:
Type | Description |
---|---|
AssertionError | If the weights are already in log space. |
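`exp()` and `log()` are inverse conversions guarded by the `is_log` flag, so each can only be applied from the matching space. A minimal sketch with a hypothetical `SpaceFlaggedWeights` class (not the library class); note that `math.log(0)` raises in Python, so the sketch maps zero mass to -inf explicitly:

```python
import math


class SpaceFlaggedWeights:
    """Hypothetical sketch: a weight list tagged with whether it is in log space."""

    def __init__(self, weights, is_log):
        self.weights = list(weights)
        self.is_log = is_log

    def exp(self):
        # Only meaningful when the weights are log weights.
        assert self.is_log, "weights are not in log space"
        return SpaceFlaggedWeights([math.exp(w) for w in self.weights], is_log=False)

    def log(self):
        # Refuse to take the logarithm twice.
        assert not self.is_log, "weights are already in log space"
        # math.log(0) raises ValueError, so map zero mass to -inf explicitly.
        return SpaceFlaggedWeights(
            [math.log(w) if w > 0 else -math.inf for w in self.weights], is_log=True
        )


w = SpaceFlaggedWeights([0.5, 0.0], is_log=False)
lw = w.log()
print(lw.is_log, lw.weights)  # True [log(0.5), -inf]
back = lw.exp()
print(back.is_log, back.weights)
```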
sum()
Sum the weights.
Summation is performed using log-space arithmetic when weights are logarithmic, or standard arithmetic otherwise.
Returns:
Type | Description |
---|---|
float | The sum of the weights, either in log space or regular space. |
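Log-space summation computes log(sum(exp(w))) without ever forming exp(w) for large w: factoring out the maximum keeps every exponent at or below zero. A minimal sketch with a hypothetical `log_space_sum` helper:

```python
import math


def log_space_sum(log_weights):
    """Return log(sum(exp(w))) without overflowing on large log weights."""
    m = max(log_weights)
    if m == -math.inf:
        return -math.inf  # a sum of zero masses is zero, i.e. log 0
    # Factor out exp(m) so every remaining exponent is <= 0.
    return m + math.log(sum(math.exp(w - m) for w in log_weights))


big = [1000.0, 1000.0]
# math.exp(1000.0) alone would raise OverflowError; the shifted form does not.
print(log_space_sum(big))  # 1000 + log(2)
```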
spawn(new_weights, log=None)
Create a new LazyWeights instance over the same vocabulary with new weights.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_weights | ndarray | The new weights for the LazyWeights instance. | required |
log | bool | Indicates if the new weights are in log space. Defaults to None. | None |
Returns:
Type | Description |
---|---|
LazyWeights | A new LazyWeights instance. |
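The point of `spawn` is that a new weight vector can reuse the existing `encode`/`decode` objects instead of rebuilding the vocabulary. A minimal sketch with a hypothetical `VocabWeights` class; the sketch assumes, without confirmation from this page, that `log=None` means "keep the current space flag":

```python
class VocabWeights:
    """Hypothetical sketch of a weight vector bound to a shared vocabulary."""

    def __init__(self, weights, encode, decode, log=True):
        assert len(weights) == len(decode)
        self.weights, self.encode, self.decode, self.is_log = weights, encode, decode, log

    def spawn(self, new_weights, log=None):
        # Assumption for this sketch: log=None keeps the current space flag.
        if log is None:
            log = self.is_log
        # The vocabulary objects are shared with the new instance, not copied.
        return VocabWeights(new_weights, self.encode, self.decode, log=log)


decode = ["x", "y"]
base = VocabWeights([0.0, -1.0], {t: i for i, t in enumerate(decode)}, decode)
child = base.spawn([-2.0, -3.0])
print(child.decode is base.decode, child.is_log)  # True True
```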
materialize(top=None)
Materialize the weights into a chart.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
top | int | The number of top weights to materialize. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Chart | A chart representation of the weights. |
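Materializing turns the lazy (array, vocabulary) pair into an explicit token-to-weight mapping. The sketch below is hypothetical: a plain `dict` stands in for the library's `Chart` type, and it assumes "top weights" means the largest ones. Selecting by largest value works in either space because log is monotone.

```python
import heapq


def materialize(weights, decode, top=None):
    """Hypothetical sketch: build a token -> weight dict (a dict stands in for Chart)."""
    indices = range(len(weights))
    if top is not None:
        # Keep only the `top` indices with the largest weights.
        indices = heapq.nlargest(top, indices, key=lambda i: weights[i])
    return {decode[i]: weights[i] for i in indices}


decode = ["a", "b", "c", "d"]
weights = [-2.0, -0.5, -3.0, -1.0]
print(materialize(weights, decode))         # all four tokens
print(materialize(weights, decode, top=2))  # {'b': -0.5, 'd': -1.0}
```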
assert_equal(other, **kwargs)
Assert that two LazyWeights instances are equal.
This method asserts that the two LazyWeights instances have the same vocabulary (in identical order) and that their weights are numerically close.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | LazyWeights | The other LazyWeights instance to compare. | required |
**kwargs | dict | Additional arguments for np.testing.assert_allclose (e.g., rtol, atol). | {} |
assert_equal_unordered(other, **kwargs)
Assert that two LazyWeights instances are equal, ignoring vocabulary order.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | LazyWeights | The other LazyWeights instance to compare. | required |
**kwargs | dict | Additional arguments for np.isclose (e.g., rtol, atol). | {} |
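The difference between the two comparisons: the ordered check requires identical `decode` lists and compares weights position by position, while the unordered check matches weights by token. A minimal sketch with a hypothetical `weights_equal` function that returns a bool instead of asserting; `math.isclose` is a stdlib stand-in for the NumPy tolerance checks, and its default tolerances differ from NumPy's.

```python
import math


def weights_equal(decode_a, w_a, decode_b, w_b, ordered=True, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical sketch contrasting ordered and unordered comparison."""
    if ordered:
        # Same tokens in the same positions, then compare position by position.
        if decode_a != decode_b:
            return False
        pairs = zip(w_a, w_b)
    else:
        # Same token set; look each token's weight up in the other vocabulary.
        if len(decode_a) != len(decode_b) or set(decode_a) != set(decode_b):
            return False
        lookup_b = dict(zip(decode_b, w_b))
        pairs = ((w, lookup_b[tok]) for tok, w in zip(decode_a, w_a))
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in pairs)


a = (["x", "y"], [0.1, 0.9])
b = (["y", "x"], [0.9, 0.1])
print(weights_equal(*a, *b))                 # False: vocabulary order differs
print(weights_equal(*a, *b, ordered=False))  # True
```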
load_trie(V, backend=None, **kwargs)
Load a TokenCharacterTrie.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
V | list | The vocabulary. | required |
backend | str | The backend to use for trie construction. Defaults to None. | None |
**kwargs | dict | Additional arguments for the trie construction. | {} |
Returns:
Type | Description |
---|---|
TokenCharacterTrie | A trie instance. |
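This page does not describe how TokenCharacterTrie works internally. As a rough illustration of the idea behind character-level aggregation (the weight_sum and weight_max requests mentioned for the async variant), the hypothetical sketch below builds a nested-dict character trie over a vocabulary and aggregates token weights under a prefix. It assumes non-negative, regular-space weights, and none of its function names are the library's API.

```python
def build_trie(vocab):
    """Hypothetical sketch: character trie; each node lists the token ids beneath it."""
    root = {"children": {}, "tokens": []}
    for token_id, token in enumerate(vocab):
        node = root
        node["tokens"].append(token_id)
        for ch in token:
            node = node["children"].setdefault(ch, {"children": {}, "tokens": []})
            node["tokens"].append(token_id)
    return root


def find(root, prefix):
    """Walk the trie along `prefix`; return None if no token starts with it."""
    node = root
    for ch in prefix:
        node = node["children"].get(ch)
        if node is None:
            return None
    return node


def prefix_weight_sum(root, weights, prefix):
    """Total weight of tokens starting with `prefix` (0.0 if none)."""
    node = find(root, prefix)
    return 0.0 if node is None else sum(weights[i] for i in node["tokens"])


def prefix_weight_max(root, weights, prefix):
    """Largest weight among tokens starting with `prefix` (0.0 if none)."""
    node = find(root, prefix)
    return 0.0 if node is None else max(weights[i] for i in node["tokens"])


vocab = ["a", "ab", "abc", "b"]
weights = [0.1, 0.2, 0.3, 0.4]
trie = build_trie(vocab)
print(prefix_weight_sum(trie, weights, "ab"))  # 0.2 + 0.3
print(prefix_weight_max(trie, weights, "a"))   # 0.3
```

Storing token ids at every node keeps lookups simple at the cost of memory; an alternative is to aggregate weights bottom-up once per weight vector.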
load_async_trie(V, backend=None, **kwargs)
Load an AsyncTokenCharacterTrie. This is a TokenCharacterTrie that automatically batches weight_sum and weight_max requests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
V | list | The vocabulary. | required |
backend | str | The backend to use for trie construction. Defaults to None. | None |
**kwargs | dict | Additional arguments for the trie construction. | {} |
Returns:
Type | Description |
---|---|
AsyncTokenCharacterTrie | An async trie instance. |
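The batching mechanism of AsyncTokenCharacterTrie is not shown here. As a generic illustration of what "automatically batches requests" can mean in asyncio, the hypothetical `RequestBatcher` below queues requests submitted concurrently and answers them with a single batch call on the next event-loop iteration. It is not the library's implementation.

```python
import asyncio


class RequestBatcher:
    """Hypothetical sketch: collect concurrent requests and answer them in one batch call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of inputs, returns a list of outputs
        self.pending = []         # (input, future) pairs waiting for the next flush
        self.batch_sizes = []     # how many requests each flush handled

    async def submit(self, item):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((item, fut))
        if len(self.pending) == 1:
            # First request of a new batch: flush once other ready tasks have enqueued.
            loop.call_soon(self._flush)
        return await fut

    def _flush(self):
        batch, self.pending = self.pending, []
        self.batch_sizes.append(len(batch))
        try:
            results = self.batch_fn([item for item, _ in batch])
        except Exception as exc:
            # Propagate a batch failure to every waiting caller.
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for (_, fut), result in zip(batch, results):
            if not fut.done():  # skip requests that were cancelled meanwhile
                fut.set_result(result)


async def main():
    batcher = RequestBatcher(lambda xs: [x * x for x in xs])
    results = await asyncio.gather(*(batcher.submit(x) for x in [1, 2, 3]))
    return results, batcher.batch_sizes


results, sizes = asyncio.run(main())
print(results, sizes)  # [1, 4, 9] [3]
```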
fast_sample_logprobs(logprobs, size=1)
Sample indices from an array of log probabilities using the Gumbel-max trick.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logprobs | ndarray | Array of log probabilities | required |
size | int | Number of samples to draw | 1 |
Returns:
Type | Description |
---|---|
ndarray | Array of sampled indices |
Note
This is much faster than np.random.choice for large arrays since it avoids normalizing probabilities and uses vectorized operations.
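The Gumbel-max trick adds independent standard Gumbel noise, -log(-log(U)) with U uniform on (0, 1), to each log weight and takes the argmax; the winning index is distributed proportionally to exp(logprobs). Because adding a constant to every entry does not change the argmax, no normalization is needed. The stdlib sketch below (hypothetical `gumbel_max_sample`, returning a list rather than an ndarray) shows only the sampling rule; it loops in Python and does not reproduce the vectorized speedup described in the note.

```python
import math
import random


def gumbel_noise(rng):
    """One draw from the standard Gumbel distribution, -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logprobs, size=1, rng=random):
    """Draw `size` indices with probability proportional to exp(logprobs[i])."""
    samples = []
    for _ in range(size):
        # Perturb every log weight with fresh Gumbel noise and keep the argmax.
        scores = [lp + gumbel_noise(rng) for lp in logprobs]
        samples.append(max(range(len(scores)), key=scores.__getitem__))
    return samples


rng = random.Random(0)
logprobs = [math.log(0.1), math.log(0.2), math.log(0.7)]
draws = gumbel_max_sample(logprobs, size=20000, rng=rng)
freqs = [draws.count(i) / len(draws) for i in range(3)]
print(freqs)  # close to [0.1, 0.2, 0.7]
```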
fast_sample_lazyweights(lazyweights)
Sample a token from a LazyWeights instance using the Gumbel-max trick.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lazyweights | LazyWeights | A LazyWeights instance | required |
Returns:
Type | Description |
---|---|
Any | Sampled token |
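Sampling a token combines the Gumbel-max index draw with the `decode` list that maps indices back to tokens. A minimal sketch with a hypothetical `sample_token` function, assuming the weights are already in log space (regular-space weights would need a log first):

```python
import math
import random


def sample_token(log_weights, decode, rng=random):
    """Hypothetical sketch: Gumbel-max over log weights, then map the index to its token."""

    def score(lp):
        u = rng.random()
        while u == 0.0:  # keep u strictly inside (0, 1)
            u = rng.random()
        return lp - math.log(-math.log(u))  # lp + standard Gumbel noise

    best = max(range(len(log_weights)), key=lambda i: score(log_weights[i]))
    return decode[best]


rng = random.Random(1)
print(sample_token([0.0, -math.inf], ["yes", "no"], rng))  # yes
```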