Class UnicodeDecompressor

  • All Implemented Interfaces:
    SCSU

    public final class UnicodeDecompressor
    extends java.lang.Object
    implements SCSU
    A decompression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

    USAGE

    The static methods on UnicodeDecompressor may be used in a straightforward manner to decompress simple strings:

      byte [] compressed = ... ; // get compressed bytes from somewhere
      String result = UnicodeDecompressor.decompress(compressed);
     

    The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeDecompressor offers more powerful APIs allowing iterative decompression:

      // Decompress an array "bytes" of length "len" using a buffer of 512 chars
      // to the Writer "out"
    
      UnicodeDecompressor myDecompressor         = new UnicodeDecompressor();
      final int           BUFSIZE                = 512;
      char []             charBuffer             = new char [ BUFSIZE ];
      int                 charsWritten           = 0;
      int []              bytesRead              = new int [1];
      int                 totalBytesDecompressed = 0;
      int                 totalCharsWritten      = 0;
    
      do {
        // do the decompression
        charsWritten = myDecompressor.decompress(bytes, totalBytesDecompressed, 
                                                 len, bytesRead,
                                                 charBuffer, 0, BUFSIZE);
    
        // do something with the current set of chars
        out.write(charBuffer, 0, charsWritten);
    
        // update the no. of bytes decompressed
        totalBytesDecompressed += bytesRead[0];
    
        // update the no. of chars written
        totalCharsWritten += charsWritten;
    
      } while(totalBytesDecompressed < len);
    
      myDecompressor.reset(); // reuse decompressor
     

    Decompression is performed according to the standard set forth in Unicode Technical Report #6.

    See Also:
    UnicodeCompressor
    • Field Detail

      • fCurrentWindow

        private int fCurrentWindow
        Alias to current dynamic window
      • fOffsets

        private int[] fOffsets
        Dynamic compression window offsets
      • fMode

        private int fMode
        Current compression mode
      • fBuffer

        private byte[] fBuffer
        Internal buffer for saving state
      • fBufferLength

        private int fBufferLength
        Number of bytes in our internal buffer
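The window fields above can be pictured with a small sketch. This is not the ICU implementation: the class name ScsuWindowSketch and the method decodeSingleByteMode are hypothetical, and only a subset of SCSU single-byte mode is handled (pass-through bytes, the SC0..SC7 window-select tags, and bytes 0x80..0xFF mapped through the current dynamic window). The initial window offsets are the defaults given in the SCSU specification.

```java
/** Toy decoder for a subset of SCSU single-byte mode; NOT the ICU implementation. */
public class ScsuWindowSketch {

    // Initial dynamic window offsets from the SCSU specification.
    static final int[] DEFAULT_OFFSETS = {
        0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
    };

    // Tags SC0..SC7 (0x10..0x17) make dynamic window 0..7 the current one.
    static final int SC0 = 0x10;

    static String decodeSingleByteMode(byte[] in) {
        int[] offsets = DEFAULT_OFFSETS.clone(); // plays the role of fOffsets
        int currentWindow = 0;                   // plays the role of fCurrentWindow
        StringBuilder out = new StringBuilder();

        for (byte raw : in) {
            int b = raw & 0xFF;
            if (b >= 0x80) {
                // Upper-half byte: index into the current dynamic window.
                out.append((char) (offsets[currentWindow] + (b - 0x80)));
            } else if (b >= 0x20 || b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D) {
                // ASCII and the pass-through control characters map to themselves.
                out.append((char) b);
            } else if (b >= SC0 && b <= SC0 + 7) {
                // Window-select tag: switch the current dynamic window.
                currentWindow = b - SC0;
            } else {
                // Quoting, window definition and Unicode-mode tags are out of scope here.
                throw new IllegalArgumentException(
                    "tag not handled by this sketch: 0x" + Integer.toHexString(b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // SC2 selects the Cyrillic window (offset 0x0400); the rest are window bytes.
        byte[] compressed = {
            0x12, (byte) 0x9C, (byte) 0xBE, (byte) 0xC1, (byte) 0xBA, (byte) 0xB2, (byte) 0xB0
        };
        for (char c : decodeSingleByteMode(compressed).toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

A real decompressor also has to handle window redefinition, quoted characters, Unicode mode, and input that ends partway through a multi-byte sequence, which is presumably what the internal state buffer is for.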
    • Constructor Detail

      • UnicodeDecompressor

        public UnicodeDecompressor()
        Create a UnicodeDecompressor. Sets all windows to their default values.
        See Also:
        reset()
    • Method Detail

      • decompress

        public static java.lang.String decompress​(byte[] buffer)
        Decompress a byte array into a String.
        Parameters:
        buffer - The byte array to decompress.
        Returns:
        A String containing the decompressed characters.
        See Also:
        decompress(byte [], int, int)
      • decompress

        public static char[] decompress​(byte[] buffer,
                                        int start,
                                        int limit)
        Decompress a byte array into a Unicode character array.
        Parameters:
        buffer - The byte array to decompress.
        start - The start of the byte run to decompress.
        limit - The limit of the byte run to decompress.
        Returns:
        A character array containing the decompressed characters.
        See Also:
        decompress(byte [])
      • decompress

        public int decompress​(byte[] byteBuffer,
                              int byteBufferStart,
                              int byteBufferLimit,
                              int[] bytesRead,
                              char[] charBuffer,
                              int charBufferStart,
                              int charBufferLimit)
        Decompress a byte array into a Unicode character array. This function will either completely fill the output buffer, or consume the entire input.
        Parameters:
        byteBuffer - The byte buffer to decompress.
        byteBufferStart - The start of the byte run to decompress.
        byteBufferLimit - The limit of the byte run to decompress.
        bytesRead - A one-element array. If not null, on return its first element holds the number of bytes read from byteBuffer.
        charBuffer - A buffer to receive the decompressed data. This buffer must be at least two characters in size.
        charBufferStart - The starting offset to which to write decompressed data.
        charBufferLimit - The limiting offset for writing decompressed data.
        Returns:
        The number of Unicode characters written to charBuffer.
      • reset

        public void reset()
        Reset the decompressor to its initial state.
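The start/limit and bytesRead contract of the seven-argument decompress method can be illustrated without the library. The sketch below is only a stand-in: toyDecompress is a hypothetical method that copies ASCII bytes to chars and performs no SCSU decoding, but it takes the same parameters, stops when either the input is consumed or the output range is full, and reports progress the same way. The class name ChunkedContractSketch and the helper decodeInChunks are likewise assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Stand-in with the same parameter shape as the instance decompress method; NOT ICU code. */
public class ChunkedContractSketch {

    // Hypothetical toy "decoder": copies ASCII bytes to chars one-for-one.
    // It stops when the input run is consumed or the output range is full.
    static int toyDecompress(byte[] byteBuffer, int byteBufferStart, int byteBufferLimit,
                             int[] bytesRead,
                             char[] charBuffer, int charBufferStart, int charBufferLimit) {
        int bytePos = byteBufferStart;
        int charPos = charBufferStart;
        while (bytePos < byteBufferLimit && charPos < charBufferLimit) {
            charBuffer[charPos++] = (char) (byteBuffer[bytePos++] & 0x7F);
        }
        if (bytesRead != null) {
            bytesRead[0] = bytePos - byteBufferStart; // bytes consumed by this call
        }
        return charPos - charBufferStart;             // chars produced by this call
    }

    // Drives the toy decoder with a small output buffer, advancing the input
    // start offset by bytesRead[0] after each call. chunkSize must be positive;
    // the real method requires room for at least two chars.
    static String decodeInChunks(byte[] bytes, int chunkSize) {
        char[] charBuffer = new char[chunkSize];
        int[] bytesRead = new int[1];
        int consumed = 0;
        StringBuilder out = new StringBuilder();
        while (consumed < bytes.length) {
            int written = toyDecompress(bytes, consumed, bytes.length, bytesRead,
                                        charBuffer, 0, chunkSize);
            out.append(charBuffer, 0, written);
            consumed += bytesRead[0];
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] input = "hello, world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodeInChunks(input, 5)); // prints "hello, world"
    }
}
```

Passing the running total as the start offset and the full length as the limit is what lets each call resume where the previous one stopped.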