module ScopedSearch::QueryLanguage::Tokenizer
The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, which are easier to parse than the raw query string.
Constants
- KEYWORDS: hash of all keywords that the language supports, keyed by their lowercase spelling.
- OPERATORS: hash of every operator that the language supports, keyed by the operator's text.
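The tokenizer methods look both constants up with `has_key?`, so they are hashes that map source text to the token handed to the parser. A minimal sketch of that shape, assuming an abbreviated, hypothetical subset of entries and Symbol values (the real tables in lib/scoped_search/query_language/tokenizer.rb contain more):

```ruby
# Hypothetical, abbreviated stand-ins for KEYWORDS and OPERATORS; the
# entries and Symbol values here are illustrative, not the library's lists.
KEYWORDS = {
  'and' => :and,
  'or'  => :or,
  'not' => :not
}.freeze

OPERATORS = {
  '='  => :eq,
  '!=' => :ne,
  '<'  => :lt,
  '<=' => :lte,
  '>'  => :gt,
  '>=' => :gte,
  '~'  => :like
}.freeze

# Keys are the literal source text; values are the tokens yielded to the parser.
puts KEYWORDS['and'].inspect          # prints :and
puts OPERATORS['<='].inspect          # prints :lte
# Hash lookups are case-sensitive, which is why tokenize_keyword downcases first.
puts KEYWORDS.key?('AND')             # prints false
puts KEYWORDS.key?('AND'.downcase)    # prints true
```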
Public Instance Methods
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 19
def current_char
  @current_char
end
```
Returns the current character of the string, that is, the character most recently read by next_char.
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 37
def each_token(&block)
  while next_char
    case current_char
    when /^\s?$/; # ignore
    when '('; yield(:lparen)
    when ')'; yield(:rparen)
    when ','; yield(:comma)
    when /\&|\||=|<|>|\^|!|~|-/; tokenize_operator(&block)
    when '"'; tokenize_quoted_keyword(&block)
    else; tokenize_keyword(&block)
    end
  end
end
```
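Tokenizes the string by iterating over its characters and yielding each token to the given block. Whitespace is skipped; the characters (, ) and , yield :lparen, :rparen and :comma; the operator characters &, |, =, <, >, ^, !, ~ and - are handed to tokenize_operator; a double quote is handed to tokenize_quoted_keyword; any other character starts a keyword handled by tokenize_keyword.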
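The dispatch in each_token depends only on the character under the position pointer. The sketch below copies its `case` expression into a standalone, hypothetical `classify` helper that returns a label instead of yielding or delegating, so the routing of any single character can be inspected directly (the labels are illustrative):

```ruby
# Hypothetical helper: same branches, in the same order, as each_token.
def classify(char)
  case char
  when /^\s?$/                then :skip      # whitespace, or "" at end of string
  when '('                    then :lparen
  when ')'                    then :rparen
  when ','                    then :comma
  when /\&|\||=|<|>|\^|!|~|-/ then :operator  # tokenize_operator in the library
  when '"'                    then :quoted    # tokenize_quoted_keyword
  else                             :keyword   # tokenize_keyword
  end
end

classify(' ')   # => :skip
classify('(')   # => :lparen
classify('-')   # => :operator
classify('"')   # => :quoted
classify('n')   # => :keyword
```

Only the first character of a token goes through this dispatch; the helper methods then consume the rest of the token themselves.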
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 31
def next_char
  @current_char_pos += 1
  @current_char = @str[@current_char_pos, 1]
end
```
Moves the position pointer one step forward and returns the character at the new position, which also becomes the current character.
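Because next_char reads with `String#[](start, length)`, the `while next_char` loop in each_token does not stop right after the last character: reading at an index equal to the string's length returns an empty string (which each_token's `/^\s?$/` branch ignores), and only the read after that returns `nil`. A small standalone demonstration of the same two steps, using local variables in place of the instance variables:

```ruby
str = "ab"
current_char_pos = -1  # same starting value that tokenize sets

# Repeat next_char's two steps: advance the pointer, then read one character.
reads = Array.new(4) do
  current_char_pos += 1
  str[current_char_pos, 1]
end

p reads  # prints ["a", "b", "", nil]
```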
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 25
def peek_char(amount = 1)
  @str[@current_char_pos + amount, 1]
end
```
Returns the character amount positions after the current one (by default, the next character) without moving the position pointer.
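A sketch of the cursor formed by current_char, next_char and peek_char, wrapped in a hypothetical `Cursor` class that holds the same `@str` and `@current_char_pos` instance variables. It shows that peeking reads ahead without changing the current character, and that a peek past the end yields `""` and then `nil`:

```ruby
# Hypothetical wrapper around the three cursor methods documented here.
class Cursor
  attr_reader :current_char

  def initialize(str)
    @str = str
    @current_char_pos = -1  # before the first character, as tokenize sets it
  end

  # Advance one position and remember the character read.
  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  # Look ahead without moving the pointer.
  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end
end

cursor = Cursor.new(">=1")
cursor.next_char     # => ">"
cursor.peek_char     # => "="
cursor.peek_char(2)  # => "1"
cursor.peek_char(3)  # => "" (index equals the string length)
cursor.peek_char(4)  # => nil
cursor.current_char  # => ">" (peeking did not move the pointer)
```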
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 13
def tokenize
  @current_char_pos = -1
  to_a
end
```
Resets the position pointer to just before the first character, then tokenizes the whole string and returns the result as an array of tokens.
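tokenize relies on `to_a`, which `Enumerable` provides on top of an `each` method. The self-contained sketch below assumes a hypothetical host class, `MiniQuery`, that includes `Enumerable`, stores the query in `@str`, and forwards `each` to each_token; the real compiler's wiring may differ. The tokenizer methods are copied from the sources on this page (with a mutable `+""` in the quoted-keyword method), and `KEYWORDS`/`OPERATORS` are abbreviated, hypothetical tables:

```ruby
module MiniTokenizer
  # Hypothetical, abbreviated tables; the library defines more entries.
  KEYWORDS  = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze
  OPERATORS = { '=' => :eq, '!=' => :ne, '!' => :not, '<' => :lt, '<=' => :lte,
                '>' => :gt, '>=' => :gte, '~' => :like, '&' => :and, '|' => :or }.freeze

  def tokenize
    @current_char_pos = -1  # restart before the first character
    to_a                    # Enumerable#to_a drives each -> each_token
  end

  def current_char
    @current_char
  end

  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end

  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  def each_token(&block)
    while next_char
      case current_char
      when /^\s?$/; # ignore
      when '('; yield(:lparen)
      when ')'; yield(:rparen)
      when ','; yield(:comma)
      when /\&|\||=|<|>|\^|!|~|-/; tokenize_operator(&block)
      when '"'; tokenize_quoted_keyword(&block)
      else; tokenize_keyword(&block)
      end
    end
  end

  def tokenize_operator(&block)
    operator = current_char
    operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
    yield(OPERATORS[operator])
  end

  def tokenize_keyword(&block)
    keyword = current_char
    keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
    KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
  end

  def tokenize_quoted_keyword(&block)
    keyword = +""  # mutable empty string
    until next_char.nil? || current_char == '"'
      keyword << (current_char == "\\" ? next_char : current_char)
    end
    yield(keyword)
  end
end

# Hypothetical host class: the real compiler's wiring may differ.
class MiniQuery
  include Enumerable
  include MiniTokenizer

  def initialize(str)
    @str = str
  end

  # Enumerable#to_a (called from tokenize) iterates through this method.
  def each(&block)
    each_token(&block)
  end
end

p MiniQuery.new('name = "John Doe" and age >= 18').tokenize
# prints ["name", :eq, "John Doe", :and, "age", :gte, "18"]
```

Because tokenize resets the pointer before calling `to_a`, the same object can be tokenized more than once.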
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 63
def tokenize_keyword(&block)
  keyword = current_char
  keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
  KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
end
```
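Tokenizes a bare keyword. Characters are consumed until the next one is whitespace, a parenthesis, a comma, or one of =, ~, <, >, & and |. If the lowercased keyword is a reserved language keyword (a key of the KEYWORDS hash), the corresponding Symbol from KEYWORDS is yielded; otherwise the keyword string itself is yielded.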
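The stop condition in tokenize_keyword is the negated character class `/[^=~<>\s\&\|\)\(,]/` applied to the lookahead character. A standalone sketch with a hypothetical `read_keyword` helper (and an abbreviated, hypothetical `KEYWORDS` table) that works on plain string indices instead of the tokenizer's instance variables. Note that `-` and `!` are not in the stop set, so they only start an operator at the beginning of a token; inside a keyword they are kept:

```ruby
# Same negated class as tokenize_keyword: anything except =, ~, <, >,
# whitespace, &, |, ), ( and , may continue a keyword.
KEYWORD_CHAR = /[^=~<>\s\&\|\)\(,]/

# Hypothetical, abbreviated keyword table (lowercase keys).
KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze

# Hypothetical helper: reads a bare keyword that starts at index +start+ and
# returns [token, index_of_last_consumed_character].
def read_keyword(str, start)
  pos = start
  keyword = str[pos, 1]
  # "" (or nil) past the end of the string does not match, which stops the loop.
  while KEYWORD_CHAR =~ str[pos + 1, 1]
    pos += 1
    keyword << str[pos, 1]
  end
  # Reserved words are matched case-insensitively and replaced by their Symbol.
  token = KEYWORDS.key?(keyword.downcase) ? KEYWORDS[keyword.downcase] : keyword
  [token, pos]
end

read_keyword('AND x', 0)      # => [:and, 2]
read_keyword('name=John', 0)  # => ["name", 3]
read_keyword('a-b c', 0)      # => ["a-b", 2]
```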
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 55
def tokenize_operator(&block)
  operator = current_char
  operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
  yield(OPERATORS[operator])
end
```
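Tokenizes an operator that occurs in the OPERATORS hash. If the current character followed by the next one forms a known operator (such as >=), both characters are consumed; otherwise only the current character is used. The .to_s calls on peek_char and next_char guard against a Ruby bug in which nil values are returned from strings that have a forced encoding; see github.com/wvanbergen/scoped_search/issues/33 for details.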
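tokenize_operator looks at most one character ahead and consumes it only when the pair is itself a key of OPERATORS, so this scheme recognizes operators of at most two characters. A standalone sketch with a hypothetical `read_operator` helper and an abbreviated, hypothetical `OPERATORS` table:

```ruby
# Hypothetical, abbreviated operator table.
OPERATORS = { '=' => :eq, '!' => :not, '!=' => :ne, '~' => :like, '!~' => :unlike,
              '<' => :lt, '<=' => :lte, '>' => :gt, '>=' => :gte }.freeze

# Hypothetical helper: reads an operator starting at index +start+ and returns
# [token, index_of_last_consumed_character].
def read_operator(str, start)
  pos = start
  operator = str[pos, 1]
  # One-character lookahead; .to_s mirrors the library's guard against a nil read.
  if OPERATORS.key?(operator + str[pos + 1, 1].to_s)
    pos += 1
    operator << str[pos, 1]
  end
  [OPERATORS[operator], pos]
end

read_operator('!=5', 0)  # => [:ne, 1]
read_operator('!~x', 0)  # => [:unlike, 1]
read_operator('!x', 0)   # => [:not, 0]
read_operator('>3', 0)   # => [:gt, 0]
```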
```ruby
# File lib/scoped_search/query_language/tokenizer.rb, line 71
def tokenize_quoted_keyword(&block)
  keyword = ""
  until next_char.nil? || current_char == '"'
    keyword << (current_char == "\\" ? next_char : current_char)
  end
  yield(keyword)
end
```
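Tokenizes a keyword enclosed in double quotes, reading up to the closing quote or the end of the string. A backslash makes the following character literal, which allows a double quote to be included in the keyword by writing it as \".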
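A standalone sketch of the quoted-keyword loop, using a hypothetical `read_quoted` helper on plain string indices. As in tokenize_quoted_keyword, a backslash escapes whatever character follows it (not only a double quote), and a missing closing quote makes the rest of the string part of the keyword:

```ruby
# Hypothetical helper mirroring tokenize_quoted_keyword on plain indices.
# +start+ is the index of the opening quote; returns [keyword, final_index].
def read_quoted(str, start)
  pos = start
  keyword = +""  # mutable empty string
  loop do
    pos += 1
    char = str[pos, 1]
    # Stop at the closing quote, or once reading has run past the end (nil).
    break if char.nil? || char == '"'
    if char == "\\"
      # A backslash makes the next character literal (quote, backslash, anything).
      pos += 1
      char = str[pos, 1]
    end
    keyword << char
  end
  [keyword, pos]
end

read_quoted('"John Doe" rest', 0)  # => ["John Doe", 9]
read_quoted('"say \"hi\""', 0)     # => ["say \"hi\"", 11]
read_quoted('"open', 0).first      # => "open"
```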