Class UnicodeBidiAlgorithm
java.lang.Object
    org.apache.fop.complexscripts.bidi.UnicodeBidiAlgorithm
All Implemented Interfaces: BidiConstants
public final class UnicodeBidiAlgorithm extends java.lang.Object implements BidiConstants
The UnicodeBidiAlgorithm class implements functionality prescribed by the Unicode Bidirectional Algorithm, Unicode Standard Annex #9.
This work was originally authored by Glenn Adams (gadams@apache.org).
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
private static org.apache.commons.logging.Log log
    logging instance
-
Constructor Summary
Constructors (Modifier, Constructor)
private UnicodeBidiAlgorithm()
-
Method Summary
All Methods (all static and concrete)
private static int convertToScalar(int chHi, int chLo)
    Convert UTF-16 surrogate pair to Unicode scalar value.
private static boolean convertToScalar(java.lang.CharSequence cs, int[] chars)
    Convert character sequence (a UTF-16 encoded string) to an array of Unicode scalar values expressed as integers.
private static int[] copySequence(int[] ta)
private static int directionOfLevel(int level)
private static void dump(java.lang.String header, int[] chars, int[] classes, int defaultLevel, int[] levels)
private static int findNextNonRetainedFormattingLevel(int[] wca, int[] ea, int start, int lPrev)
private static int[] getClasses(int[] chars)
private static java.lang.String getClassName(int bc)
private static int getLevelRunLength(int[] ea, int start)
private static int getRetainedFormattingRunLength(int[] wca, int start)
private static boolean isNeutral(int bc)
private static boolean isRetainedFormatting(int bc)
private static boolean isRetainedFormatting(int[] ca, int s, int e)
private static boolean isStrong(int bc)
private static int levelOfEmbedding(int embedding)
private static int[] levelsFromEmbeddings(int[] ea, int[] la)
private static int max(int x, int y)
private static java.lang.String padLeft(int n, int width)
private static java.lang.String padLeft(java.lang.String s, int width)
private static java.lang.String padRight(java.lang.String s, int width)
private static void resolveAdjacentBoundaryNeutrals(int[] wca, int start, int end, int index, int bcNew)
private static void resolveExplicit(int[] wca, int defaultLevel, int[] ea)
private static void resolveImplicit(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
static int[] resolveLevels(int[] chars, int[] classes, int defaultLevel, int[] levels, boolean useRuleL1)
    Resolve the directionality levels of each character in a character sequence.
static int[] resolveLevels(int[] chars, int defaultLevel, int[] levels)
    Resolve the directionality levels of each character in a character sequence.
static int[] resolveLevels(java.lang.CharSequence cs, Direction defaultLevel)
    Resolve the directionality levels of each character in a character sequence.
private static void resolveNeutrals(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
private static int resolveRun(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int levelPrev)
private static void resolveRuns(int[] wca, int defaultLevel, int[] ea, int[] la)
private static void resolveSeparators(int[] ica, int[] wca, int dl, int[] la)
    Resolve separators and boundary neutral levels to account for UAX#9 3.4 L1 while taking into account retention of formatting codes (5.2).
private static void resolveWeak(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
private static boolean startsWithRetainedFormattingRun(int[] wca, int[] ea, int start)
private static boolean triggersBidi(int ch)
    Determine whether character CH triggers bidirectional processing.
-
-
-
Method Detail
-
resolveLevels
public static int[] resolveLevels(java.lang.CharSequence cs, Direction defaultLevel)
Resolve the directionality levels of each character in a character sequence. If some character is encoded in the character sequence as a Unicode surrogate pair, then the directionality level of each of the two members of the pair will be identical.
Parameters:
cs - input character sequence representing a UTF-16 encoded string
defaultLevel - the default paragraph level, which must be zero (LR) or one (RL)
Returns:
null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
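A caller typically consumes the returned array as runs of equal level. The following is a minimal sketch of that step, not FOP code: the class and method names are hypothetical, the level array in the usage note is hand-written rather than produced by this class, and the only fact relied on is the UAX #9 convention that even levels are left-to-right and odd levels are right-to-left.

```java
// Hypothetical helper, not part of FOP: groups a per-code-unit level array
// into runs of equal level and labels each run by the parity of its level.
final class LevelRunsSketch {

    static String describeRuns(int[] levels) {
        if (levels == null) {
            // resolveLevels is documented to return null when no bidi processing is required
            return "";
        }
        StringBuilder sb = new StringBuilder();
        int start = 0;
        while (start < levels.length) {
            int end = start + 1;
            // extend the run while the level stays the same
            while (end < levels.length && levels[end] == levels[start]) {
                end++;
            }
            // UAX #9: even levels are left-to-right, odd levels are right-to-left
            String dir = (levels[start] & 1) == 0 ? "LTR" : "RTL";
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('[').append(start).append(',').append(end).append(")=").append(dir);
            start = end;
        }
        return sb.toString();
    }
}
```

For a hand-written array such as `{0, 0, 0, 0, 1, 1, 1}`, `describeRuns` yields `[0,4)=LTR [4,7)=RTL`. Because both halves of a surrogate pair carry the same level, a run boundary never falls inside a pair.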
-
resolveLevels
public static int[] resolveLevels(int[] chars, int defaultLevel, int[] levels)
Resolve the directionality levels of each character in a character sequence.
Parameters:
chars - array of input characters represented as Unicode scalar values
defaultLevel - the default paragraph level, which must be zero (LR) or one (RL)
levels - array to receive levels, one for each character in chars array
Returns:
null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
-
resolveLevels
public static int[] resolveLevels(int[] chars, int[] classes, int defaultLevel, int[] levels, boolean useRuleL1)
Resolve the directionality levels of each character in a character sequence.
Parameters:
chars - array of input characters represented as Unicode scalar values
classes - array containing one bidi class per character in chars array
defaultLevel - the default paragraph level, which must be zero (LR) or one (RL)
levels - array to receive levels, one for each character in chars array
useRuleL1 - true if rule L1 should be used
Returns:
null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
-
copySequence
private static int[] copySequence(int[] ta)
-
resolveExplicit
private static void resolveExplicit(int[] wca, int defaultLevel, int[] ea)
-
directionOfLevel
private static int directionOfLevel(int level)
-
levelOfEmbedding
private static int levelOfEmbedding(int embedding)
-
levelsFromEmbeddings
private static int[] levelsFromEmbeddings(int[] ea, int[] la)
-
resolveRuns
private static void resolveRuns(int[] wca, int defaultLevel, int[] ea, int[] la)
-
findNextNonRetainedFormattingLevel
private static int findNextNonRetainedFormattingLevel(int[] wca, int[] ea, int start, int lPrev)
-
getLevelRunLength
private static int getLevelRunLength(int[] ea, int start)
-
startsWithRetainedFormattingRun
private static boolean startsWithRetainedFormattingRun(int[] wca, int[] ea, int start)
-
getRetainedFormattingRunLength
private static int getRetainedFormattingRunLength(int[] wca, int start)
-
resolveRun
private static int resolveRun(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int levelPrev)
-
resolveWeak
private static void resolveWeak(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
-
resolveNeutrals
private static void resolveNeutrals(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
-
resolveAdjacentBoundaryNeutrals
private static void resolveAdjacentBoundaryNeutrals(int[] wca, int start, int end, int index, int bcNew)
-
resolveImplicit
private static void resolveImplicit(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
-
resolveSeparators
private static void resolveSeparators(int[] ica, int[] wca, int dl, int[] la)
Resolve separators and boundary neutral levels to account for UAX#9 3.4 L1 while taking into account retention of formatting codes (5.2).
Parameters:
ica - original input class array (sequence)
wca - working copy of original input class array (sequence), as modified by prior steps
dl - default paragraph level
la - array of output levels to be adjusted, as produced by bidi algorithm
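For orientation, the core of rule L1 can be sketched as follows. This is a simplified illustration and not the FOP implementation: it treats the whole array as a single line, uses hypothetical class codes defined only for the sketch, and ignores both the retained formatting codes (UAX #9 5.2) that the real method accounts for and the isolate formatting characters covered by newer versions of the rule.

```java
// Simplified sketch of UAX #9 rule L1; class codes and names are hypothetical.
final class RuleL1Sketch {
    // Hypothetical bidi class codes used only by this sketch
    static final int L = 0;   // strong left-to-right
    static final int R = 1;   // strong right-to-left
    static final int WS = 2;  // whitespace
    static final int S = 3;   // segment separator
    static final int B = 4;   // paragraph separator

    /** Resets levels in place, using the ORIGINAL classes as L1 requires. */
    static void applyL1(int[] classes, int paragraphLevel, int[] levels) {
        int n = classes.length;
        for (int i = 0; i < n; i++) {
            if (classes[i] == S || classes[i] == B) {
                // separators take the paragraph level ...
                levels[i] = paragraphLevel;
                // ... and so does any whitespace run immediately before them
                for (int k = i - 1; k >= 0 && classes[k] == WS; k--) {
                    levels[k] = paragraphLevel;
                }
            }
        }
        // trailing whitespace at the end of the line also takes the paragraph level
        for (int k = n - 1; k >= 0 && classes[k] == WS; k--) {
            levels[k] = paragraphLevel;
        }
    }
}
```

The sketch reads the original classes rather than the resolved ones, which mirrors why the documented method takes both `ica` and `wca`.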
-
isStrong
private static boolean isStrong(int bc)
-
isNeutral
private static boolean isNeutral(int bc)
-
isRetainedFormatting
private static boolean isRetainedFormatting(int bc)
-
isRetainedFormatting
private static boolean isRetainedFormatting(int[] ca, int s, int e)
-
max
private static int max(int x, int y)
-
getClasses
private static int[] getClasses(int[] chars)
-
convertToScalar
private static boolean convertToScalar(java.lang.CharSequence cs, int[] chars) throws java.lang.IllegalArgumentException
Convert character sequence (a UTF-16 encoded string) to an array of Unicode scalar values expressed as integers. If a valid UTF-16 surrogate pair is encountered, it is converted to two integers, the first being the equivalent Unicode scalar value, and the second being negative one (-1). This special mechanism is used to track the use of surrogate pairs while working with Unicode scalar values, and permits maintaining indices that apply both to the input UTF-16 and output scalar value sequences.
Parameters:
cs - a UTF-16 encoded character sequence
chars - an integer array to accept the converted scalar values, where the length of the array must be the same as the length of the input character sequence
Returns:
a boolean indicating that content is present that triggers bidirectional processing
Throws:
java.lang.IllegalArgumentException - if the input sequence is not a valid UTF-16 string, e.g., if it contains an isolated UTF-16 surrogate
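The index-preserving conversion described above can be sketched with standard `java.lang.Character` helpers. This is an illustration under assumptions, not the FOP source: the class and method names are hypothetical, and the sketch omits the boolean result that reports whether any character triggers bidirectional processing.

```java
// Hypothetical sketch of the -1 placeholder scheme; not the FOP implementation.
final class ScalarConversionSketch {

    static void toScalars(CharSequence cs, int[] chars) {
        if (chars.length != cs.length()) {
            throw new IllegalArgumentException("output array must match input length");
        }
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < cs.length() && Character.isLowSurrogate(cs.charAt(i + 1))) {
                    // scalar value goes in the first slot, -1 marks the second
                    chars[i] = Character.toCodePoint(c, cs.charAt(i + 1));
                    chars[i + 1] = -1;
                    i++; // skip the low surrogate just consumed
                } else {
                    throw new IllegalArgumentException("isolated high surrogate at index " + i);
                }
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("isolated low surrogate at index " + i);
            } else {
                chars[i] = c;
            }
        }
    }
}
```

Since the output array has the same length as the input, index `i` refers to the same position in both, which is what lets a level computed for a scalar value be copied onto both halves of its surrogate pair.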
-
convertToScalar
private static int convertToScalar(int chHi, int chLo)
Convert UTF-16 surrogate pair to Unicode scalar value.
Parameters:
chHi - high (most significant or first) surrogate
chLo - low (least significant or second) surrogate
Returns:
a Unicode scalar value
Throws:
java.lang.IllegalArgumentException - if one of the input surrogates is not valid
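The arithmetic behind this conversion is fixed by the UTF-16 encoding form. A minimal sketch follows, with a hypothetical class name and with error messages that are assumptions of the sketch:

```java
// Hypothetical sketch of the standard UTF-16 surrogate pair decoding formula.
final class SurrogatePairSketch {

    static int toScalar(int chHi, int chLo) {
        if (chHi < 0xD800 || chHi > 0xDBFF) {
            throw new IllegalArgumentException("bad high surrogate: 0x" + Integer.toHexString(chHi));
        }
        if (chLo < 0xDC00 || chLo > 0xDFFF) {
            throw new IllegalArgumentException("bad low surrogate: 0x" + Integer.toHexString(chLo));
        }
        // each surrogate carries 10 payload bits; the result is offset by 0x10000
        return 0x10000 + ((chHi - 0xD800) << 10) + (chLo - 0xDC00);
    }
}
```

For example, the pair `0xD83D`, `0xDE00` decodes to `0x1F600`, which matches `Character.toCodePoint` for the same pair.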
-
triggersBidi
private static boolean triggersBidi(int ch)
Determine whether character CH triggers bidirectional processing. Bidirectional processing is triggered if CH is a strong right-to-left character, an Arabic letter or number, or a right-to-left embedding or override character.
Parameters:
ch - a Unicode scalar value
Returns:
true if character triggers bidirectional processing
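The stated condition maps naturally onto bidi classes. The sketch below approximates it with the JDK's `Character.getDirectionality`; this is an assumption made for illustration, since FOP consults its own bidi class data, and the class name is hypothetical.

```java
// Hypothetical approximation of the trigger test using JDK bidi classes.
final class TriggersBidiSketch {

    static boolean triggersBidi(int ch) {
        switch (Character.getDirectionality(ch)) {
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:           // strong R, e.g. Hebrew letters
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:    // Arabic letters (AL)
            case Character.DIRECTIONALITY_ARABIC_NUMBER:           // Arabic numbers (AN)
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING: // RLE, U+202B
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE:  // RLO, U+202E
                return true;
            default:
                return false;
        }
    }
}
```

Under this approximation, text made up only of Latin letters and European digits never triggers processing, which is consistent with `resolveLevels` returning null for such input.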
-
dump
private static void dump(java.lang.String header, int[] chars, int[] classes, int defaultLevel, int[] levels)
-
getClassName
private static java.lang.String getClassName(int bc)
-
padLeft
private static java.lang.String padLeft(int n, int width)
-
padLeft
private static java.lang.String padLeft(java.lang.String s, int width)
-
padRight
private static java.lang.String padRight(java.lang.String s, int width)
-
-