class PrattParser
A Pratt parser. Similar to a recursive descent parser but instead of coding a function for each production, the syntax is coded in a set of token objects that are yielded by the lexer. New operators and statements can be slipped into the language with the proper precedence by adding new token objects to the lexer without altering the code for existing tokens. Pretty cool.
lexer is must have an #each
method that returns token objects. A token has three methods:
lbp
-
Returns the operator precedence. Higher numbers bind more tightly.
nud(parser)
-
Called when the token is the first token in an expression, including a recursive call to
expresssion
(i.e., subexpression). For example,nud
would be called for a unary operator, a literal, or for the “if” in the construct “if <cond> then <expr>”. It is the token’s responsibility to callparser.expression
,parser.expect
, and/orparser.if?
to handle the remainder of the (sub)expression, if any. led(parser, left)
-
Called when the token is preceeded by a subexpression, passed in as
left
. The token may be postfix or infix. It is the token’s responsibility to callparser.expression
,parser.expect
, and/orparser.if?
to handle the remainder of the expression, if any, and combine it withleft
.
Only lbp
is mandatory. nud
and led
will be called only when necessary, if ever. For example, nud
will never be called for a strictly infix token. If the token appears at the start of a (sub)expression then an exception that isn’t at al appropriate to the abstraction will be thrown.
nud
and led
can call parser.expression(rbp)
to recursively parse the right subexpression. rbp
should be the token’s lbp
for left-associativity, lbp-1
for right.
PrattParser.new(lexer).eval
will return the result of the parse.
Syntax errors aren’t handled at the moment and will cause ridiculous exceptions to be raised such as NoMethodError
.
Further reading:
Public Class Methods
Creates a new PrattParser
. lexer
is an Enumerable
, or something with an #each
method.
# File lib/pratt_parser.rb, line 51 def initialize(lexer) @lexer = Enumerator.new do |y| lexer.each do |token| y << token end y << EndToken.new end @token = nil end
Public Instance Methods
Runs the tokens through the parse engine and returns the result, or throws some exception on parse error.
# File lib/pratt_parser.rb, line 65 def eval @token = @lexer.next expression(0) end
Checks whether the lookahead token is of the expected_token_class
and raises an exception if it isn’t. Alternatively a block may be given; the block is passed the lookahead token and should raise an exception if it’s not an expected token. In either case if no exception is raised then the lookahead token is consumed.
expect
can be used to match the right parenthesis in a parenthesized expression, the colon in a “cond ? then : else” expression, etc.
# File lib/pratt_parser.rb, line 97 def expect(expected_token_class = nil, &block) block ||= lambda do |token| if token.class != expected_token_class raise "Expected #{expected_token_class}, got #{token.class}" end end block.call(@token) @token = @lexer.next end
For use by token #led
methods to parse subexpressions with binding power less than rbp
. Whatever that means. Don’t worry, it just does the Right Thing.
# File lib/pratt_parser.rb, line 74 def expression(rbp) t = @token @token = @lexer.next left = t.nud(self) while rbp < @token.lbp t = @token @token = @lexer.next left = t.led(self, left) end left end
Checks whether the lookahead token is of the token_class
. If it is, consumes the lookahead token and returns truthy, else just returns falsy. Alternatively a block can be given which is passed the token and should return truthy or falssy.
if?
can be used for optional tokens such as the else
clause in “if cond then val1 [else val2] end”.
# File lib/pratt_parser.rb, line 116 def if?(token_class, &block) block ||= lambda do |token| token.class == token_class end if block.call(@token) @token = @lexer.next end end