Class CharsetRecognizer

  • Direct Known Subclasses:
    CharsetRecog_2022, CharsetRecog_mbcs, CharsetRecog_sbcs, CharsetRecog_Unicode, CharsetRecog_UTF8

    abstract class CharsetRecognizer
    extends java.lang.Object
    Abstract class for recognizing a single charset. Part of the implementation of ICU's CharsetDetector. Each charset that can be recognized has an instance of some subclass of this class. All interaction between the overall CharsetDetector and the logic specific to an individual charset happens through the interface provided here. Instances of CharsetRecognizer DO NOT have or maintain state pertaining to a specific match or detect operation. They WILL be shared by multiple instances of CharsetDetector. They encapsulate constant, charset-specific information.
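The division of labor described above (per-detection state in the detector, shared stateless recognizers) can be sketched in a few lines of Java. Because CharsetRecognizer and CharsetMatch's constructor are package-private in ICU, this sketch uses hypothetical stand-ins (Detector, Recognizer, Match, AsciiRecognizer, Latin1Recognizer, detectAll) rather than ICU's API, and its confidence values are arbitrary. It only illustrates the pattern: every input gets a fresh detector object, while one immutable list of recognizers is consulted for all inputs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ICU's CharsetDetector, CharsetRecognizer and
// CharsetMatch; names, fields and confidence values are illustrative only.
public class DetectorSketch {

    /** Per-detection state lives here, never in a recognizer. */
    static final class Detector {
        final byte[] input;
        Detector(byte[] input) { this.input = input.clone(); }
    }

    /** Result of one recognizer's test: charset name plus a 0-100 confidence. */
    record Match(String name, int confidence) {}

    /** Stateless recognizer: one instance per charset, shared by all detections. */
    static abstract class Recognizer {
        abstract String getName();
        abstract Match match(Detector det);  // null means "no match"
    }

    static final class AsciiRecognizer extends Recognizer {
        @Override String getName() { return "US-ASCII"; }
        @Override Match match(Detector det) {
            for (byte b : det.input) {
                if (b < 0) return null;  // a byte >= 0x80 cannot be ASCII
            }
            return new Match(getName(), 100);
        }
    }

    static final class Latin1Recognizer extends Recognizer {
        @Override String getName() { return "ISO-8859-1"; }
        @Override Match match(Detector det) {
            // Every byte sequence is valid Latin-1, so always report a weak match.
            return new Match(getName(), 10);
        }
    }

    // Shared, immutable list of recognizers that hold no mutable state.
    static final List<Recognizer> RECOGNIZERS =
            List.of(new AsciiRecognizer(), new Latin1Recognizer());

    /** Runs every recognizer against the input; highest confidence first. */
    static List<Match> detectAll(byte[] input) {
        Detector det = new Detector(input);  // fresh state for this call only
        List<Match> matches = new ArrayList<>();
        for (Recognizer r : RECOGNIZERS) {
            Match m = r.match(det);
            if (m != null) matches.add(m);
        }
        matches.sort((a, b) -> Integer.compare(b.confidence(), a.confidence()));
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(detectAll("hello".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(detectAll(new byte[] {(byte) 0xE9}));
    }
}
```

Since the recognizers carry no fields, the static RECOGNIZERS list can serve any number of concurrent detectAll calls without synchronization.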
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getLanguage()
      Get the ISO language code for this charset.
      (package private) abstract java.lang.String getName()
      Get the IANA name of this charset.
      (package private) abstract CharsetMatch match(CharsetDetector det)
      Test whether the input text held by the CharsetDetector object matches this charset.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CharsetRecognizer

        CharsetRecognizer()
    • Method Detail

      • getName

        abstract java.lang.String getName()
        Get the IANA name of this charset.
        Returns:
        the charset name.
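The IANA name returned by getName() is the kind of name the JDK's java.nio.charset.Charset registry accepts, so a caller can turn a detected name into a decoder. A minimal sketch, assuming the name comes from a recognizer; NameToCharset and decode are hypothetical helpers, and the example uses only UTF-8 and ISO-8859-1, which every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes bytes using a charset identified by an IANA
// name, such as the value a recognizer's getName() would return.
public class NameToCharset {

    static String decode(byte[] input, String ianaName) {
        // isSupported() checks whether this JVM provides the named charset.
        if (!Charset.isSupported(ianaName)) {
            throw new IllegalArgumentException("JVM does not support " + ianaName);
        }
        return new String(input, Charset.forName(ianaName));
    }

    public static void main(String[] args) {
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(bytes, "UTF-8"));
        // Decoding UTF-8 bytes under the wrong name yields mojibake.
        System.out.println(decode(bytes, "ISO-8859-1"));
    }
}
```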
      • getLanguage

        public java.lang.String getLanguage()
        Get the ISO language code for this charset.
        Returns:
        the language code, or null if the language cannot be determined.
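Because getLanguage() is the one concrete method, a natural pattern is a default in the base class that subclasses override when a charset strongly implies a language. The sketch assumes a null default (consistent with the Returns clause above) and uses hypothetical classes; the "ja" value is an ISO 639-1 code chosen purely for illustration.

```java
// Hypothetical sketch of a concrete getLanguage() default plus an override
// in a language-specific recognizer. Class names are illustrative, not ICU's.
public class LanguageSketch {

    static abstract class Recognizer {
        abstract String getName();

        /** Default: this charset does not imply a single language. */
        public String getLanguage() {
            return null;
        }
    }

    /** UTF-8 is used for every language, so it keeps the null default. */
    static final class Utf8Recognizer extends Recognizer {
        @Override String getName() { return "UTF-8"; }
    }

    /** A Shift_JIS recognizer can reasonably report Japanese. */
    static final class ShiftJisRecognizer extends Recognizer {
        @Override String getName() { return "Shift_JIS"; }
        @Override public String getLanguage() { return "ja"; }
    }

    public static void main(String[] args) {
        for (Recognizer r : new Recognizer[] {new Utf8Recognizer(), new ShiftJisRecognizer()}) {
            System.out.println(r.getName() + " -> " + r.getLanguage());
        }
    }
}
```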
      • match

        abstract CharsetMatch match(CharsetDetector det)
        Test whether the input text held by the CharsetDetector object matches this charset.
        Parameters:
        det - the CharsetDetector containing the input text to be tested against this charset.
        Returns:
        A CharsetMatch object containing details of the match with this charset, or null if there was no match.
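A match implementation returns null when the input cannot be in its charset and otherwise reports how plausible the charset is. The following is a simplified, hypothetical UTF-8 check, not ICU's CharsetRecog_UTF8 algorithm: it validates lead and continuation bytes (without rejecting every overlong or surrogate form), and the confidence values 15, 80 and 100 are arbitrary choices on the 0-100 scale used by CharsetMatch.getConfidence(). Utf8MatchSketch and its Match record are illustrative stand-ins.

```java
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical UTF-8 recognizer: NOT ICU's CharsetRecog_UTF8 logic.
public class Utf8MatchSketch {

    record Match(String name, int confidence) {}

    /** Returns null when the bytes cannot be UTF-8, otherwise a 0-100 confidence. */
    static Match match(byte[] input) {
        int multiByteChars = 0;
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            int trail;                           // continuation bytes expected
            if (b < 0x80) trail = 0;
            else if (b >= 0xC2 && b <= 0xDF) trail = 1;
            else if (b >= 0xE0 && b <= 0xEF) trail = 2;
            else if (b >= 0xF0 && b <= 0xF4) trail = 3;
            else return null;                    // invalid lead byte
            if (i + trail >= input.length) return null;  // truncated sequence
            for (int k = 1; k <= trail; k++) {
                if ((input[i + k] & 0xC0) != 0x80) return null;  // not 10xxxxxx
            }
            if (trail > 0) multiByteChars++;
            i += trail + 1;
        }
        // ASCII-only input is valid UTF-8 but gives little evidence for it.
        if (multiByteChars == 0) return new Match("UTF-8", 15);
        return new Match("UTF-8", multiByteChars >= 3 ? 100 : 80);
    }

    public static void main(String[] args) {
        System.out.println(match("caf\u00e9".getBytes(StandardCharsets.UTF_8)));
        System.out.println(match("caf\u00e9".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Encoded as ISO-8859-1, "café" ends in a lone 0xE9 lead byte with no continuation bytes, so match returns null for it.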