Class PatternTokenizer


  • public class PatternTokenizer
    extends java.lang.Object
    A simple parsing class for patterns and rules. Handles '...' quotations, \uxxxx and \Uxxxxxxxx escapes, and simple syntax. The sequence '' (two apostrophes) is treated as a single apostrophe, inside or outside a quoted section.
    • Any ignorable characters are ignored in parsing.
    • Any syntax characters are broken into separate tokens
    • Quote characters can be specified: '...', "...", and \x
    • Other characters are treated as literals
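The rules above can be illustrated with a deliberately small tokenizer. This is a hypothetical sketch, not ICU's implementation: the class name `MiniTokenizer`, the token representation (a list of strings), and the use of plain `String` sets in place of `UnicodeSet` are all assumptions. It handles only '...' and \x quoting, treats '' as one apostrophe, and does not decode \uxxxx escapes or "..." quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the tokenizing rules; not ICU's PatternTokenizer. */
class MiniTokenizer {
    private final String ignorable; // stand-in for a UnicodeSet of ignorable characters
    private final String syntax;    // stand-in for a UnicodeSet of syntax characters

    MiniTokenizer(String ignorable, String syntax) {
        this.ignorable = ignorable;
        this.syntax = syntax;
    }

    /** Returns literal runs and single-character syntax tokens, in order. */
    List<String> tokenize(String p) {
        List<String> tokens = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < p.length(); i++) {
            char c = p.charAt(i);
            if (c == '\'') {
                if (i + 1 < p.length() && p.charAt(i + 1) == '\'') {
                    literal.append('\''); // '' is one apostrophe, inside or outside a quote
                    i++;
                } else {
                    inQuote = !inQuote;   // opening or closing quote
                }
            } else if (inQuote) {
                literal.append(c);        // everything inside '...' is literal
            } else if (c == '\\') {
                if (++i >= p.length()) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                literal.append(p.charAt(i)); // a backslash quotes the next character
            } else if (ignorable.indexOf(c) >= 0) {
                // ignorable: skipped; in this sketch it does not end the current literal
            } else if (syntax.indexOf(c) >= 0) {
                flush(literal, tokens);
                tokens.add(String.valueOf(c)); // each syntax character is its own token
            } else {
                literal.append(c);        // any other character is a literal
            }
        }
        if (inQuote) {
            throw new IllegalArgumentException("unterminated quote");
        }
        flush(literal, tokens);
        return tokens;
    }

    /** Emits the pending literal run, if any, and clears the buffer. */
    private static void flush(StringBuilder literal, List<String> tokens) {
        if (literal.length() > 0) {
            tokens.add(literal.toString());
            literal.setLength(0);
        }
    }
}
```

For example, with a space as the only ignorable character and "=;" as the syntax characters, `tokenize("a = 'x;y' ;")` returns `[a, =, x;y, ;]`: the spaces are dropped, `=` and `;` become separate tokens, and the quoted `;` stays inside a literal.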
    • Constructor Detail

      • PatternTokenizer

        public PatternTokenizer()
    • Method Detail

      • getIgnorableCharacters

        public UnicodeSet getIgnorableCharacters()
      • setIgnorableCharacters

        public PatternTokenizer setIgnorableCharacters(UnicodeSet ignorableCharacters)
        Sets the characters to be ignored in parsing, e.g. new UnicodeSet("[:pattern_whitespace:]").
        Parameters:
        ignorableCharacters - Characters to be ignored.
        Returns:
        The PatternTokenizer, with the given characters set as ignorable characters.
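The example set above, `[:pattern_whitespace:]`, is the Unicode Pattern_White_Space property, which contains exactly eleven code points: U+0009 to U+000D, U+0020, U+0085, U+200E, U+200F, U+2028, and U+2029. The sketch below spells that set out without ICU. The class and method names are hypothetical, and `stripIgnorable` simply drops every such character, whereas a tokenizer would still keep whitespace that appears inside quotes.

```java
/** Hypothetical helper mirroring the Unicode Pattern_White_Space property. */
final class PatternWhiteSpace {
    private PatternWhiteSpace() {}

    /** True exactly for the 11 code points with Pattern_White_Space=Yes. */
    static boolean isPatternWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D) // TAB, LF, VT, FF, CR
                || cp == 0x0020               // SPACE
                || cp == 0x0085               // NEXT LINE
                || cp == 0x200E || cp == 0x200F // LRM, RLM
                || cp == 0x2028 || cp == 0x2029; // LINE / PARAGRAPH SEPARATOR
    }

    /** Removes every Pattern_White_Space code point; unlike a tokenizer, it ignores quoting. */
    static String stripIgnorable(CharSequence s) {
        StringBuilder out = new StringBuilder();
        s.codePoints()
         .filter(cp -> !isPatternWhiteSpace(cp))
         .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Note that U+00A0 (NO-BREAK SPACE) is not in this set, so a pattern containing it would not have it ignored.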
      • getSyntaxCharacters

        public UnicodeSet getSyntaxCharacters()
      • getExtraQuotingCharacters

        public UnicodeSet getExtraQuotingCharacters()
      • setSyntaxCharacters

        public PatternTokenizer setSyntaxCharacters(UnicodeSet syntaxCharacters)
        Sets the characters to be interpreted as syntax characters in parsing, e.g. new UnicodeSet("[:pattern_syntax:]").
        Parameters:
        syntaxCharacters - Characters to be set as syntax characters.
        Returns:
        The PatternTokenizer, with the given characters set as syntax characters.
      • setExtraQuotingCharacters

        public PatternTokenizer setExtraQuotingCharacters(UnicodeSet syntaxCharacters)
        Sets the extra characters to be quoted in literals.
        Parameters:
        syntaxCharacters - Characters to be set as extra quoting characters.
        Returns:
        The PatternTokenizer, with the given characters set as extra quoting characters.
      • getEscapeCharacters

        public UnicodeSet getEscapeCharacters()
      • setEscapeCharacters

        public PatternTokenizer setEscapeCharacters(UnicodeSet escapeCharacters)
        Sets the characters to be escaped in literals by quoteLiteral and normalize, e.g. new UnicodeSet("[^\\u0020-\\u007E]").
        Parameters:
        escapeCharacters - Characters to be escaped.
        Returns:
        The PatternTokenizer, with the given characters set as characters to escape.
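With the example set `[^\\u0020-\\u007E]`, every character outside printable ASCII is written as an escape. The sketch below shows one plausible way to produce such escapes, using the \uxxxx form for BMP code points and \Uxxxxxxxx for supplementary ones, as named in the class description. The class name and the use of uppercase hex digits are assumptions; this is not ICU's private `appendEscaped` method, and the escape set is hard-coded rather than configurable.

```java
import java.util.Locale;

/** Hypothetical escaper for code points outside U+0020..U+007E. */
final class EscapeSketch {
    private EscapeSketch() {}

    static String escapeOutsidePrintableAscii(CharSequence s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp <= 0x7E) {
                out.appendCodePoint(cp); // printable ASCII is copied unchanged
            } else if (cp <= 0xFFFF) {
                // BMP: backslash, lowercase u, 4 hex digits
                out.append(String.format(Locale.ROOT, "\\u%04X", cp));
            } else {
                // supplementary: backslash, uppercase U, 8 hex digits
                out.append(String.format(Locale.ROOT, "\\U%08X", cp));
            }
        });
        return out.toString();
    }
}
```

For instance, U+00E9 becomes the six characters `\u00E9` and U+1F600 becomes `\U0001F600`, while `a` is copied unchanged.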
      • isUsingQuote

        public boolean isUsingQuote()
      • isUsingSlash

        public boolean isUsingSlash()
      • getLimit

        public int getLimit()
      • getStart

        public int getStart()
      • setPattern

        public PatternTokenizer setPattern(java.lang.CharSequence pattern)
      • quoteLiteral

        public java.lang.String quoteLiteral(java.lang.CharSequence string)
      • quoteLiteral

        public java.lang.String quoteLiteral(java.lang.String string)
        Quotes a literal string using the current settings, so that syntax characters, quote characters, and ignorable characters are put into quotes.
        Parameters:
        string - The literal string to quote.
        Returns:
        The string with its syntax, quote, and ignorable characters quoted.
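To show why quoting is needed, here is a hypothetical, simplified quoteLiteral: each run of characters that would otherwise be read as syntax, ignorable, extra-quoting, or backslash is wrapped in '...', and each apostrophe is doubled to ''. The class `QuoteSketch` and its string-based character sets are assumptions; ICU's version works with `UnicodeSet`s, also applies the escape set, and may place quotes differently.

```java
/** Hypothetical, simplified quoting of literals; not ICU's quoteLiteral. */
final class QuoteSketch {
    // Characters that must appear inside '...' to be read as literals.
    private final String mustQuote;

    QuoteSketch(String syntax, String ignorable, String extraQuoting) {
        // The backslash is included because it would otherwise quote the next character.
        this.mustQuote = syntax + ignorable + extraQuoting + "\\";
    }

    String quoteLiteral(String s) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                out.append("''");     // '' means one apostrophe, inside or outside quotes
            } else if (mustQuote.indexOf(c) >= 0) {
                if (!inQuote) {
                    out.append('\''); // open a quoted run
                    inQuote = true;
                }
                out.append(c);
            } else {
                if (inQuote) {
                    out.append('\''); // close the quoted run before a plain character
                    inQuote = false;
                }
                out.append(c);
            }
        }
        if (inQuote) {
            out.append('\'');         // close a run left open at the end
        }
        return out.toString();
    }
}
```

With "=;" as syntax characters and a space as the only ignorable character, `quoteLiteral("a = b")` yields `a' = 'b` and `quoteLiteral("it's;")` yields `it''s';'`.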
      • appendEscaped

        private void appendEscaped(java.lang.StringBuffer result,
                                   int cp)
      • normalize

        public java.lang.String normalize()
      • next

        public int next(java.lang.StringBuffer buffer)