Class CharsetRecog_mbcs.CharsetRecog_gb_18030

  • Enclosing class:
    CharsetRecog_mbcs

    static class CharsetRecog_mbcs.CharsetRecog_gb_18030
    extends CharsetRecog_mbcs
    GB-18030 recognizer. Uses simplified Chinese statistics.
    • Field Detail

      • commonChars

        static int[] commonChars
    • Constructor Detail

      • CharsetRecog_gb_18030

        CharsetRecog_gb_18030()
    • Method Detail

      • nextChar

        boolean nextChar(CharsetRecog_mbcs.iteratedChar it,
                         CharsetDetector det)
        Description copied from class: CharsetRecog_mbcs
        Gets the next character (however many bytes it occupies) from the input data. Subclasses for specific charset encodings must implement this method to read characters according to the rules of their encoding scheme. It is not a method of class iteratedChar only because that would require many extra derived classes, which would be awkward.
        Specified by:
        nextChar in class CharsetRecog_mbcs
        Parameters:
        it - The iteratedChar "struct" into which the returned char is placed.
        det - The charset detector, which is needed to get at the input byte data being iterated over.
        Returns:
        True if a character was returned, false at end of input.
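
The contract above can be illustrated with a minimal, standalone sketch of stepping through GB 18030 input one character at a time. It is not the library's implementation: the class Gb18030Sketch, its IteratedChar stand-in, and the field names are assumptions made for illustration, and the input is a plain byte array rather than a CharsetDetector. The byte ranges are those of the GB 18030 encoding: a first byte up to 0x80 stands alone, a two-byte character is a lead byte 0x81-0xFE followed by 0x40-0x7E or 0x80-0xFE, and a four-byte character is 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39.

```java
final class Gb18030Sketch {
    /** Hypothetical stand-in for the iteratedChar "struct"; field names are assumptions. */
    static final class IteratedChar {
        int charValue;   // raw bytes of the current character, packed big-endian
        int nextIndex;   // offset of the next unread byte
        boolean error;   // true if the last sequence read was malformed
    }

    /** Reads one unsigned byte and advances, or returns -1 at end of input. */
    private static int nextByte(IteratedChar it, byte[] input) {
        return it.nextIndex < input.length ? input[it.nextIndex++] & 0xFF : -1;
    }

    /** Returns true if a character (possibly malformed) was read, false at end of input. */
    static boolean nextChar(IteratedChar it, byte[] input) {
        it.error = false;
        int b1 = nextByte(it, input);
        if (b1 < 0) {
            return false;                       // end of input
        }
        it.charValue = b1;
        if (b1 <= 0x80) {
            return true;                        // single-byte character
        }
        if (b1 == 0xFF) {
            it.error = true;                    // 0xFF is never a valid lead byte
            return true;
        }
        int b2 = nextByte(it, input);
        if (b2 < 0) {
            it.error = true;                    // truncated multi-byte sequence
            return true;
        }
        it.charValue = (b1 << 8) | b2;
        if ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0x80 && b2 <= 0xFE)) {
            return true;                        // two-byte character
        }
        if (b2 >= 0x30 && b2 <= 0x39) {
            int b3 = nextByte(it, input);
            int b4 = (b3 >= 0x81 && b3 <= 0xFE) ? nextByte(it, input) : -1;
            if (b4 >= 0x30 && b4 <= 0x39) {
                // Four bytes fill the whole int, so the value may be negative.
                it.charValue = (it.charValue << 16) | (b3 << 8) | b4;
                return true;                    // four-byte character
            }
        }
        it.error = true;                        // bytes fit no valid pattern
        return true;
    }
}
```

A caller would loop with `while (Gb18030Sketch.nextChar(it, input))` and inspect `it.error` and `it.charValue` on each pass, which mirrors the "true if a character was returned, false at end of input" contract.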
      • match

        CharsetMatch match(CharsetDetector det)
        Description copied from class: CharsetRecognizer
        Tests how well this charset matches the input text data, which is obtained via the CharsetDetector object.
        Specified by:
        match in class CharsetRecognizer
        Parameters:
        det - The CharsetDetector, which contains the input text to be checked for being in this charset.
        Returns:
        A CharsetMatch object containing details of match with this charset, or null if there was no match.
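
Because match may return null, a caller that consults several recognizers has to skip the null results before ranking the rest. The sketch below illustrates that contract only. The types Match and Recognizer and the method detectAll are hypothetical stand-ins defined here, not the library's classes, and the confidence numbers used with them are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MatchContractSketch {
    /** Hypothetical stand-in for CharsetMatch: a charset name plus a confidence score. */
    static final class Match {
        final String name;
        final int confidence;

        Match(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    /** Hypothetical stand-in for a recognizer; match returns null when there is no match. */
    interface Recognizer {
        Match match(byte[] input);
    }

    /** Asks every recognizer, drops null results, and orders the rest by descending confidence. */
    static List<Match> detectAll(List<Recognizer> recognizers, byte[] input) {
        List<Match> matches = new ArrayList<>();
        for (Recognizer recognizer : recognizers) {
            Match m = recognizer.match(input);
            if (m != null) {            // null means "no match", not an error
                matches.add(m);
            }
        }
        matches.sort(Comparator.comparingInt((Match m) -> m.confidence).reversed());
        return matches;
    }
}
```

Treating null as "no match" rather than as a failure keeps each recognizer simple: it reports a result only when it has some evidence, and the caller decides how to rank whatever comes back.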
      • getLanguage

        public java.lang.String getLanguage()
        Description copied from class: CharsetRecognizer
        Get the ISO language code for this charset.
        Overrides:
        getLanguage in class CharsetRecognizer
        Returns:
        the language code, or null if the language cannot be determined.