Package com.ibm.icu.impl
Class UnicodeSetStringSpan
- java.lang.Object
  - com.ibm.icu.impl.UnicodeSetStringSpan

public class UnicodeSetStringSpan extends java.lang.Object

Nested Class Summary
- private static class UnicodeSetStringSpan.OffsetList: Helper class for UnicodeSetStringSpan.

Field Summary
- private boolean all: Set up for all variants of span()?
- static int ALL
- (package private) static short ALL_CP_CONTAINED: Special spanLength short values.
- static int BACK
- static int BACK_UTF16_CONTAINED
- static int BACK_UTF16_NOT_CONTAINED
- static int CONTAINED
- static int FWD
- static int FWD_UTF16_CONTAINED
- static int FWD_UTF16_NOT_CONTAINED
- (package private) static short LONG_SPAN: The spanLength is >=0xfe.
- private int maxLength16: Maximum length of the relevant strings.
- static int NOT_CONTAINED
- private UnicodeSetStringSpan.OffsetList offsets: Span helper.
- private boolean someRelevant: Are there strings that are not fully contained in the code point set?
- private short[] spanLengths: The lengths of span(), spanBack(), etc.
- private UnicodeSet spanNotSet: Set for span(not contained).
- private UnicodeSet spanSet: Set for span().
- private java.util.ArrayList<java.lang.String> strings: The strings of the parent set.
- static int WITH_COUNT

Constructor Summary
- UnicodeSetStringSpan(UnicodeSetStringSpan otherStringSpan, java.util.ArrayList<java.lang.String> newParentSetStrings): Constructs a copy of an existing UnicodeSetStringSpan.
- UnicodeSetStringSpan(UnicodeSet set, java.util.ArrayList<java.lang.String> setStrings, int which): Constructs an instance for all variants of span(), or for only one variant.

Method Summary
- private void addToSpanNotSet(int c): Adds a starting or ending string character to the spanNotSet so that a character span ends before any string.
- boolean contains(int c): For fast UnicodeSet::contains(c).
- (package private) static short makeSpanLengthByte(int spanLength)
- private static boolean matches16(java.lang.CharSequence s, int start, java.lang.String t, int length)
- (package private) static boolean matches16CPB(java.lang.CharSequence s, int start, int limit, java.lang.String t, int tlength): Compares 16-bit Unicode strings (which may be malformed UTF-16) at code point boundaries.
- boolean needsStringSpanUTF16(): Do the strings need to be checked in span() etc.?
- int span(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition): Spans a string.
- int spanAndCount(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount): Spans a string and counts the smallest number of set elements on any path across the span.
- int spanBack(java.lang.CharSequence s, int length, UnicodeSet.SpanCondition spanCondition): Spans a string backwards.
- private int spanContainedAndCount(java.lang.CharSequence s, int start, OutputInt outCount)
- private int spanNot(java.lang.CharSequence s, int start, OutputInt outCount): Algorithm for spanNot() == span(SpanCondition.NOT_CONTAINED). It returns the first code point boundary at which either a set code point or a set string matches.
- private int spanNotBack(java.lang.CharSequence s, int length)
- (package private) static int spanOne(UnicodeSet set, java.lang.CharSequence s, int start, int length): Does the set contain the next code point? If so, return its length; otherwise return its negative length.
- (package private) static int spanOneBack(UnicodeSet set, java.lang.CharSequence s, int length)
- private int spanWithStrings(java.lang.CharSequence s, int start, int spanLimit, UnicodeSet.SpanCondition spanCondition): Synchronized method for complicated spans using the offsets.

Field Detail

WITH_COUNT
public static final int WITH_COUNT
See Also: Constant Field Values

FWD
public static final int FWD
See Also: Constant Field Values

BACK
public static final int BACK
See Also: Constant Field Values

CONTAINED
public static final int CONTAINED
See Also: Constant Field Values

NOT_CONTAINED
public static final int NOT_CONTAINED
See Also: Constant Field Values

ALL
public static final int ALL
See Also: Constant Field Values

FWD_UTF16_CONTAINED
public static final int FWD_UTF16_CONTAINED
See Also: Constant Field Values

FWD_UTF16_NOT_CONTAINED
public static final int FWD_UTF16_NOT_CONTAINED
See Also: Constant Field Values

BACK_UTF16_CONTAINED
public static final int BACK_UTF16_CONTAINED
See Also: Constant Field Values

BACK_UTF16_NOT_CONTAINED
public static final int BACK_UTF16_NOT_CONTAINED
See Also: Constant Field Values

ALL_CP_CONTAINED
static final short ALL_CP_CONTAINED
Special spanLength short values (short is used because Java has no unsigned byte type). All code points in the string are contained in the parent set.
See Also: Constant Field Values

LONG_SPAN
static final short LONG_SPAN
The spanLength is >=0xfe.
See Also: Constant Field Values

spanSet
private UnicodeSet spanSet
Set for span(). Same as the parent set, but without strings.

spanNotSet
private UnicodeSet spanNotSet
Set for span(not contained). Same as spanSet, plus the characters that start or end strings.

strings
private java.util.ArrayList<java.lang.String> strings
The strings of the parent set.

spanLengths
private short[] spanLengths
The lengths of span(), spanBack(), etc. for each string.

maxLength16
private final int maxLength16
Maximum length of the relevant strings.

someRelevant
private boolean someRelevant
Are there strings that are not fully contained in the code point set?

all
private boolean all
Set up for all variants of span()?

offsets
private UnicodeSetStringSpan.OffsetList offsets
Span helper.

Constructor Detail

UnicodeSetStringSpan
public UnicodeSetStringSpan(UnicodeSet set, java.util.ArrayList<java.lang.String> setStrings, int which)
Constructs an instance for all variants of span(), or for only one variant. Initializes as little as possible, for single use.

UnicodeSetStringSpan
public UnicodeSetStringSpan(UnicodeSetStringSpan otherStringSpan, java.util.ArrayList<java.lang.String> newParentSetStrings)
Constructs a copy of an existing UnicodeSetStringSpan. Assumes which==ALL for a frozen set.

Method Detail

needsStringSpanUTF16
public boolean needsStringSpanUTF16()
Do the strings need to be checked in span() etc.?
Returns: true if strings need to be checked (call span() here), false if not (use a BMPSet for best performance).
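The someRelevant flag follows from a simple observation. For the non-counting spans, a set string can change the result only if at least one of its code points is missing from the code point set. The Java sketch below models that check. It uses a java.util.Set<Integer> of code points and a List<String> of set strings. It is an illustrative simplification, not ICU's implementation, and the names RelevanceSketch, isRelevant and needsStringSpan are hypothetical. The spanAndCount() notes state that counting cannot ignore fully contained strings, so this shortcut does not apply to counting spans.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the "someRelevant" idea: a set string matters to a
 * non-counting span only if some of its code points are not single code
 * points of the set. Not ICU's implementation.
 */
final class RelevanceSketch {

    /** Returns true if at least one code point of str is missing from codePoints. */
    static boolean isRelevant(Set<Integer> codePoints, String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (!codePoints.contains(cp)) {
                return true; // the string can extend or stop a span differently
            }
            i += Character.charCount(cp); // step over a whole surrogate pair if present
        }
        return false; // fully covered by the single code points
    }

    /** Analogue of needsStringSpanUTF16(): does any string need to be checked? */
    static boolean needsStringSpan(Set<Integer> codePoints, List<String> strings) {
        for (String s : strings) {
            if (isRelevant(codePoints, s)) {
                return true;
            }
        }
        return false;
    }
}
```

With the code points a and b, the strings "ab" and "ba" are irrelevant, but adding "abc" makes string checking necessary because c is not in the code point set.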

contains
public boolean contains(int c)
For fast UnicodeSet::contains(c).

addToSpanNotSet
private void addToSpanNotSet(int c)
Adds a starting or ending string character to the spanNotSet so that a character span ends before any string.
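The effect of addToSpanNotSet() can be modeled in plain Java. The spanNotSet field description says the set for span(not contained) holds every single code point of the set plus the characters that start or end strings. This sketch adds the first code point of each string for forward spans and the last one for backward spans. Plain collections stand in for UnicodeSet, and SpanNotSetSketch and buildSpanNotSet are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of building a spanNotSet: all single code points of the
 * set, plus the first (forward) and/or last (backward) code point of every
 * set string, so that a NOT_CONTAINED span stops wherever a string could match.
 */
final class SpanNotSetSketch {

    static Set<Integer> buildSpanNotSet(Set<Integer> codePoints, List<String> strings,
                                        boolean forward, boolean backward) {
        Set<Integer> spanNotSet = new HashSet<>(codePoints);
        for (String s : strings) {
            if (s.isEmpty()) {
                continue; // an empty string has no first or last code point
            }
            if (forward) {
                spanNotSet.add(s.codePointAt(0)); // string-initial code point
            }
            if (backward) {
                spanNotSet.add(s.codePointBefore(s.length())); // string-final code point
            }
        }
        return spanNotSet;
    }
}
```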

span
public int span(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition)
Spans a string.
Parameters:
s - The string to be spanned
start - The index at which the span begins
spanCondition - The span condition
Returns: the limit (exclusive end) of the span

spanWithStrings
private int spanWithStrings(java.lang.CharSequence s, int start, int spanLimit, UnicodeSet.SpanCondition spanCondition)
Synchronized method for complicated spans using the offsets. Simple cases are handled without it, which avoids the synchronization.
Parameters:
spanLimit - equal to spanSet.span(s, start, CONTAINED)

spanAndCount
public int spanAndCount(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
Spans a string and counts the smallest number of set elements on any path across the span.
For proper counting, we cannot ignore strings that are fully contained in code point spans.
If the set does not have any fully contained strings, then we could optimize this like span(), but such sets are likely rare, and this is still linear.
Parameters:
s - The string to be spanned
start - The index at which the span begins
spanCondition - The span condition
outCount - Receives the count
Returns: the limit (exclusive end) of the span
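A small dynamic program shows the counting semantics of spanAndCount() for SpanCondition.CONTAINED. Each position reachable by concatenating set elements gets the fewest elements needed to reach it, and the span ends at the furthest reachable position. This is a hypothetical sketch in which java.util collections stand in for UnicodeSet. It is not the OffsetList-based implementation, it covers only the CONTAINED condition, and it omits the surrogate-boundary check that matches16CPB() performs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of span(CONTAINED) with counting: returns {limit, count},
 * where limit is the furthest index reachable from start by concatenating set
 * elements and count is the fewest elements on any path to it.
 */
final class SpanAndCountSketch {

    static int[] spanAndCount(String s, int start, Set<Integer> codePoints, List<String> strings) {
        int n = s.length();
        int[] best = new int[n + 1]; // best[i] = fewest elements reaching i, or -1 if unreachable
        Arrays.fill(best, -1);
        best[start] = 0;
        int limit = start;
        for (int pos = start; pos <= n; ++pos) {
            if (best[pos] < 0) {
                continue; // pos is not reachable from start
            }
            limit = pos; // furthest reachable position so far
            if (pos == n) {
                break;
            }
            int next = best[pos] + 1;
            // Try one code point of the set.
            int cp = s.codePointAt(pos);
            if (codePoints.contains(cp)) {
                relax(best, pos + Character.charCount(cp), next);
            }
            // Try every non-empty set string that matches at pos.
            for (String t : strings) {
                if (!t.isEmpty() && s.startsWith(t, pos)) {
                    relax(best, pos + t.length(), next);
                }
            }
        }
        return new int[] {limit, best[limit]};
    }

    /** Records count at index i if it is better than what is known. */
    private static void relax(int[] best, int i, int count) {
        if (best[i] < 0 || count < best[i]) {
            best[i] = count;
        }
    }
}
```

Take "ababc" with the code points a and b plus the string "abab". The span ends at index 4 with a count of 1, because the single string is a shorter path than four code points.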

spanContainedAndCount
private int spanContainedAndCount(java.lang.CharSequence s, int start, OutputInt outCount)

spanBack
public int spanBack(java.lang.CharSequence s, int length, UnicodeSet.SpanCondition spanCondition)
Spans a string backwards.
Parameters:
s - The string to be spanned
spanCondition - The span condition
Returns: the string index at which the span starts (inclusive).

spanNot
private int spanNot(java.lang.CharSequence s, int start, OutputInt outCount)
Algorithm for spanNot() == span(SpanCondition.NOT_CONTAINED).
Theoretical algorithm:
- Iterate through the string, and at each code point boundary:
  + If the code point there is in the set, then return with the current position.
  + If a set string matches at the current position, then return with the current position.
Optimized implementation:
(Same assumption as for span() above.)
Create and cache a spanNotSet which contains all of the single code points of the original set but none of its strings. For each set string, add its initial code point to the spanNotSet. (Also add its final code point for spanNotBack().)
- Loop:
  + Do spanLength=spanNotSet.span(SpanCondition.NOT_CONTAINED).
  + If the current code point is in the original set, then return the current position.
  + If any set string matches at the current position, then return the current position.
  + If there is no match at the current position, neither for the code point there nor for any set string, then skip this code point and continue the loop. This happens for set-string-initial code points that were added to spanNotSet when there is not actually a match for such a set string.
Parameters:
s - The string to be spanned
start - The index at which the span begins
outCount - If not null: Receives the number of code points across the span.
Returns: the limit (exclusive end) of the span
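The optimized spanNot() loop above can be sketched in Java, with plain collections standing in for UnicodeSet. The class and method names are hypothetical. Unlike the real method, this version rebuilds the spanNotSet on every call instead of caching it, shows only forward spanning, and omits the outCount bookkeeping.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of span(NOT_CONTAINED) for a set with strings: returns
 * the first index at or after start where a set code point or a set string
 * matches, or s.length() if there is none.
 */
final class SpanNotSketch {

    static int spanNot(String s, int start, Set<Integer> codePoints, List<String> strings) {
        // spanNotSet: all single code points plus the initial code point of each string.
        Set<Integer> spanNotSet = new HashSet<>(codePoints);
        for (String t : strings) {
            if (!t.isEmpty()) {
                spanNotSet.add(t.codePointAt(0));
            }
        }
        int pos = start;
        int length = s.length();
        while (pos < length) {
            // Stand-in for spanNotSet.span(NOT_CONTAINED): skip code points not in spanNotSet.
            int cp = s.codePointAt(pos);
            if (!spanNotSet.contains(cp)) {
                pos += Character.charCount(cp);
                continue;
            }
            // The current code point is in the original set: the span ends here.
            if (codePoints.contains(cp)) {
                return pos;
            }
            // Some set string matches at the current position: the span ends here.
            for (String t : strings) {
                if (!t.isEmpty() && s.startsWith(t, pos)) {
                    return pos;
                }
            }
            // False alarm: cp only begins some set string, so skip it and continue.
            pos += Character.charCount(cp);
        }
        return length;
    }
}
```

For "aacabx" with the code point x and the string "ab", the result is 3. The two leading 'a' positions are false alarms, 'c' is skipped on the fast path, and "ab" matches at index 3.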

spanNotBack
private int spanNotBack(java.lang.CharSequence s, int length)

makeSpanLengthByte
static short makeSpanLengthByte(int spanLength)
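makeSpanLengthByte() presumably clamps a span length into the byte range kept in spanLengths. This sketch assumes LONG_SPAN is 0xfe and ALL_CP_CONTAINED is 0xff. The LONG_SPAN value follows from its ">=0xfe" description. The ALL_CP_CONTAINED value and the clamping rule itself are assumptions, and the class name is hypothetical. A short is needed because Java's byte is signed and cannot hold 0xfe or 0xff as positive values.

```java
/**
 * Hypothetical sketch of storing a span length as an unsigned "byte" in a short.
 * Assumed constants: ALL_CP_CONTAINED = 0xff, LONG_SPAN = 0xfe.
 */
final class SpanLengthByteSketch {
    static final short ALL_CP_CONTAINED = 0xff;           // reserved marker, never a real length
    static final short LONG_SPAN = ALL_CP_CONTAINED - 1;   // 0xfe: "the span length is >= 0xfe"

    static short makeSpanLengthByte(int spanLength) {
        // Lengths below LONG_SPAN are stored as-is; anything longer collapses to LONG_SPAN.
        return spanLength < LONG_SPAN ? (short) spanLength : LONG_SPAN;
    }
}
```

Because real lengths saturate at LONG_SPAN, they can never collide with the ALL_CP_CONTAINED marker.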

matches16
private static boolean matches16(java.lang.CharSequence s, int start, java.lang.String t, int length)

matches16CPB
static boolean matches16CPB(java.lang.CharSequence s, int start, int limit, java.lang.String t, int tlength)
Compares 16-bit Unicode strings (which may be malformed UTF-16) at code point boundaries. That is, neither edge of a match may fall in the middle of a surrogate pair.
Parameters:
s - The string to match in.
start - The start index in s.
limit - The limit of the subsequence of s being spanned.
t - The substring to be matched in s.
tlength - The length of t.
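The boundary rule for matches16CPB() can be written out explicitly. This sketch is based only on the description above, and the class name is hypothetical. It assumes tlength > 0 and start + tlength <= limit <= s.length(). It uses the java.lang.Character surrogate tests, and it models matches16() as a plain char-by-char comparison.

```java
/**
 * Sketch of a code-point-boundary match: t must equal s[start, start + tlength),
 * and neither edge of the match may split a surrogate pair inside s[0, limit).
 * Assumes tlength > 0 and start + tlength <= limit <= s.length().
 */
final class MatchCpbSketch {

    static boolean matches16(CharSequence s, int start, String t, int length) {
        for (int i = 0; i < length; ++i) {
            if (s.charAt(start + i) != t.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    static boolean matches16CPB(CharSequence s, int start, int limit, String t, int tlength) {
        if (!matches16(s, start, t, tlength)) {
            return false;
        }
        // The leading edge must not fall between a lead and a trail surrogate.
        boolean splitsAtStart = start > 0
                && Character.isHighSurrogate(s.charAt(start - 1))
                && Character.isLowSurrogate(s.charAt(start));
        int end = start + tlength;
        // The trailing edge must not fall between a lead and a trail surrogate,
        // unless the match ends exactly at limit (the rest of s is not being spanned).
        boolean splitsAtEnd = end < limit
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end));
        return !splitsAtStart && !splitsAtEnd;
    }
}
```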

spanOne
static int spanOne(UnicodeSet set, java.lang.CharSequence s, int start, int length)
Does the set contain the next code point? If so, return its length; otherwise return its negative length.
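The spanOne() contract works as follows. Read one code point, where a surrogate pair counts as two chars, then return its length, negated if the set does not contain it. In this sketch the set is a java.util.function.IntPredicate. The length parameter is assumed to be the number of chars remaining from start, and an unpaired surrogate is assumed to be tested as a single code point. These assumptions and the class name are hypothetical.

```java
import java.util.function.IntPredicate;

/**
 * Sketch of spanOne(): returns +n if the code point starting at start
 * (n = 1 or 2 chars) is in the set, or -n if it is not.
 */
final class SpanOneSketch {

    static int spanOne(IntPredicate set, CharSequence s, int start, int length) {
        char c = s.charAt(start);
        // A lead surrogate followed by a trail surrogate forms one supplementary code point.
        if (Character.isHighSurrogate(c) && length >= 2) {
            char c2 = s.charAt(start + 1);
            if (Character.isLowSurrogate(c2)) {
                int cp = Character.toCodePoint(c, c2);
                return set.test(cp) ? 2 : -2;
            }
        }
        // A BMP code point, or an unpaired surrogate, is tested as a single char.
        return set.test(c) ? 1 : -1;
    }
}
```

Either way the magnitude is the code point's length, so a caller can always advance by Math.abs(result) and use the sign to decide whether the span continues.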

spanOneBack
static int spanOneBack(UnicodeSet set, java.lang.CharSequence s, int length)
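spanOneBack() is not described here. The sketch below assumes it mirrors spanOne() by examining the code point that ends at index length (exclusive). The names are hypothetical. The helper spanBackCodePoints() shows how repeated calls produce a backward span over code points only. Its result is the inclusive start index, matching what spanBack() documents.

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical mirror of spanOne(): returns +n if the code point ending at
 * length (n = 1 or 2 chars) is in the set, or -n if it is not.
 */
final class SpanOneBackSketch {

    static int spanOneBack(IntPredicate set, CharSequence s, int length) {
        char c = s.charAt(length - 1);
        // A trail surrogate preceded by a lead surrogate forms one supplementary code point.
        if (Character.isLowSurrogate(c) && length >= 2) {
            char c1 = s.charAt(length - 2);
            if (Character.isHighSurrogate(c1)) {
                int cp = Character.toCodePoint(c1, c);
                return set.test(cp) ? 2 : -2;
            }
        }
        return set.test(c) ? 1 : -1;
    }

    /** Code-point-only backward span: returns the inclusive start of the span ending at length. */
    static int spanBackCodePoints(IntPredicate set, CharSequence s, int length) {
        while (length > 0) {
            int n = spanOneBack(set, s, length);
            if (n < 0) {
                break; // the code point before length is not in the set
            }
            length -= n;
        }
        return length;
    }
}
```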