GenLM Grammar Documentation
GenLM Grammar is a Python library for working with weighted context-free grammars (WCFGs) and finite-state automata (FSAs). It provides implementations of several parsing algorithms, along with language models built on these grammars.
Core Components
Grammar Types
- CFG: Context-free grammar implementation with support for:
  - Grammar normalization and transformation
  - Conversion to a character-level grammar
Language Models
- LM: Base language model class
- BoolCFGLM: Boolean-weighted CFG language model using Earley or CKY parsing
- CKYLM: CKY-based parsing for weighted CFGs
- EarleyLM: Earley-based parsing implementation for weighted CFGs
Parsing Algorithms
- Earley Parser: Earley parsing algorithm with rescaling for numerical stability
- IncrementalCKY: Incremental version of CKY with chart caching
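As a rough illustration of what a weighted CKY chart computes, here is a minimal, self-contained sketch. It does not use the library: the function `cky_weight` and the rule dictionaries `unary` and `binary` are hypothetical names chosen for this example, the grammar is assumed to be in Chomsky normal form, and weights are plain floats. It is neither incremental nor cached, so it should not be read as how IncrementalCKY is implemented.

```python
from collections import defaultdict


def cky_weight(tokens, unary, binary, start="S"):
    """Total weight of all parses of `tokens` under a weighted CFG in CNF.

    unary:  {(A, token): weight} for rules A -> token
    binary: {(A, B, C): weight}  for rules A -> B C
    """
    n = len(tokens)
    # chart[i, k, A] = total weight of all derivations of tokens[i:k] from A
    chart = defaultdict(float)

    # Width-1 spans come from the terminal rules.
    for i, tok in enumerate(tokens):
        for (head, terminal), w in unary.items():
            if terminal == tok:
                chart[i, i + 1, head] += w

    # Wider spans combine two adjacent, already-filled sub-spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (head, left_nt, right_nt), w in binary.items():
                    left = chart.get((i, j, left_nt), 0.0)
                    right = chart.get((j, k, right_nt), 0.0)
                    if left and right:
                        chart[i, k, head] += w * left * right

    return chart.get((0, n, start), 0.0)


# Toy grammar (illustrative only): S -> S S [0.5] | a [0.5]
unary = {("S", "a"): 0.5}
binary = {("S", "S", "S"): 0.5}

weight_a = cky_weight(["a"], unary, binary)
weight_aaa = cky_weight(["a", "a", "a"], unary, binary)  # sums over two parse trees
```

An incremental variant would keep the chart between calls and fill in only the new column when a token is appended, which is the idea that chart caching refers to.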
Finite State Machines
- FST: Weighted finite-state transducer implementation
- WFSA: Weighted finite-state automaton base class
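To make the notion of a weighted automaton concrete, the sketch below scores a string with the forward algorithm. It is a library-independent illustration: `wfsa_string_weight` and the dictionary layout for `start`, `stop`, and `arcs` are assumptions made for this example, weights are plain floats, and epsilon transitions are not handled.

```python
def wfsa_string_weight(string, start, stop, arcs):
    """Total weight of all accepting paths that read `string`.

    start: {state: initial weight}
    stop:  {state: final weight}
    arcs:  {(state, symbol): [(next_state, weight), ...]}
    """
    # forward[q] = total weight of all paths that read the prefix so far and end in q
    forward = dict(start)
    for symbol in string:
        next_forward = {}
        for state, w in forward.items():
            for dest, arc_w in arcs.get((state, symbol), []):
                next_forward[dest] = next_forward.get(dest, 0.0) + w * arc_w
        forward = next_forward
    # Multiply in the final weights of the states we ended in.
    return sum(w * stop.get(state, 0.0) for state, w in forward.items())


# Toy nondeterministic automaton (illustrative only).
start = {0: 1.0}
stop = {1: 0.5}
arcs = {
    (0, "a"): [(0, 0.5), (1, 0.25)],
    (1, "b"): [(1, 0.5)],
}

weight_ab = wfsa_string_weight("ab", start, stop, arcs)
```

Replacing the float addition and multiplication with another semiring's operations gives the same computation over different weight types.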
Mathematical Components
- Semiring: Abstract semiring implementations including:
  - Boolean
  - Float
  - Log
  - Expectation
- Chart: Weighted chart data structure with semiring operations
- WeightedGraph: Graph implementation for solving algebraic path problems
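The sketch below shows why a semiring abstraction is useful: the same path-summing routine answers different questions depending on which semiring it is given. Everything here is hypothetical and independent of the library: the `Semiring` tuple, the constants `BOOLEAN`, `FLOAT`, and `LOG`, and the function `path_sum` are names invented for this example. To stay short, it assumes an acyclic graph whose nodes are supplied in topological order; a general algebraic path solver also has to handle cycles.

```python
import math
from typing import Any, Callable, NamedTuple


class Semiring(NamedTuple):
    zero: Any                          # identity for plus
    one: Any                           # identity for times
    plus: Callable[[Any, Any], Any]    # combines alternative paths
    times: Callable[[Any, Any], Any]   # extends a path by one edge


def logaddexp(x, y):
    """log(exp(x) + exp(y)), computed stably; -inf acts as log(0)."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


BOOLEAN = Semiring(False, True, lambda x, y: x or y, lambda x, y: x and y)
FLOAT = Semiring(0.0, 1.0, lambda x, y: x + y, lambda x, y: x * y)
LOG = Semiring(-math.inf, 0.0, logaddexp, lambda x, y: x + y)


def path_sum(edges, order, source, target, sr):
    """Semiring sum over all source -> target paths in a DAG.

    edges: {(u, v): weight}; order: all nodes in topological order.
    """
    total = {node: sr.zero for node in order}
    total[source] = sr.one
    for u in order:
        for (a, b), w in edges.items():
            if a == u:
                total[b] = sr.plus(total[b], sr.times(total[u], w))
    return total[target]


order = ["s", "a", "b", "t"]
edges = {("s", "a"): 0.5, ("s", "b"): 0.5, ("a", "t"): 0.5, ("b", "t"): 0.25}

float_total = path_sum(edges, order, "s", "t", FLOAT)   # total path weight
reachable = path_sum({e: True for e in edges}, order, "s", "t", BOOLEAN)
log_total = path_sum({e: math.log(w) for e, w in edges.items()}, order, "s", "t", LOG)
```

The Boolean run answers whether any path exists, the Float run sums path weights, and the Log run computes the same sum in log space, which is the usual way to avoid underflow on long paths.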
Utilities
- LarkStuff: Interface for converting Lark grammars to genlm-cfg format
- format_table: Utility functions for formatting and displaying tables
Key Features
- Support for various weighted grammar formalisms
- Multiple parsing algorithm implementations
- Efficient chart caching and incremental parsing
- Composition operations between FSTs and CFGs
- Semiring abstractions for different weight types
- Visualization capabilities for debugging and analysis
Common Operations
Creating a Grammar
from genlm.grammar.cfg import CFG
from genlm.grammar.semiring import Float
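# Create a grammar from its string representation.
# `grammar_string` is a placeholder for the grammar text; it is not defined in this snippet.
cfg = CFG.from_string(grammar_string, semiring=Float)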