stri_locate_ith {tinycodet} | R Documentation |
i^{th} Pattern Occurrence or Text Boundary

The stri_locate_ith() function locates the i^{th} occurrence of a pattern in each string of some character vector.

The stri_locate_ith_boundaries() function locates the i^{th} text boundary (like character, word, line, or sentence boundaries).

stri_locate_ith(str, i, ..., regex, fixed, coll, charclass)
stri_locate_ith_boundaries(str, i, ..., type = "character")
str | a string or character vector. |
i | a number, or a numeric vector of the same length as
If |
... | more arguments to be supplied to stri_locate or stri_locate_all_boundaries. |
regex, fixed, coll, charclass | a character vector of search patterns, as in stri_locate. |
type | single string; either the break iterator type, one of |
Special note regarding charclass
The stri_locate_ith() function is based on stri_locate_all. This generally gives results consistent with stri_locate_first or stri_locate_last; the exception is when a charclass pattern is used. Where stri_locate_first and stri_locate_last give the location of the first or last single character matching the charclass (respectively), stri_locate_all gives the start and end of each run of consecutive matching characters. In this respect, stri_locate_ith() is more in line with stri_locate_all, as it gives the i^{th} run of consecutive matching characters.
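
The difference described in this note can be seen with stringi alone. The following is a minimal sketch, assuming the stringi package is installed; the input string and the variable names (x, first, runs, second_run) are chosen here for illustration only.

```r
library(stringi)

x <- "ab12cd"

# First match: a single character, even though "ab" is a run of letters.
first <- stri_locate_first_charclass(x, "\\p{L}")
print(first)

# All matches: consecutive matching characters are merged into runs
# (merge = TRUE is the default of stri_locate_all_charclass).
runs <- stri_locate_all_charclass(x, "\\p{L}")[[1]]
print(runs)

# The second run of letters, extracted by its start and end position.
second_run <- stri_sub(x, from = runs[2, "start"], to = runs[2, "end"])
print(second_run)
```

Here the first-match variant reports position 1 only, while the all-matches variant reports the runs 1 to 2 and 5 to 6, so the second run is "cd". A function that selects the i^{th} row of the all-matches result therefore returns a whole run rather than a single character.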
The stri_locate_ith() function returns an integer matrix with two columns, giving the start and end positions of the i^{th} matches. Both values are NA if no matches are found, and also if str is NA.
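
The shape of the return value can be illustrated by selecting one row per string from the output of stringi::stri_locate_all_regex(). This is only a conceptual sketch, not tinycodet's implementation: locate_ith_sketch() is a hypothetical helper, and clamping an out-of-range i to the nearest available match is an assumption of this sketch.

```r
library(stringi)

# Hypothetical helper (not part of tinycodet): pick the i-th match per string.
locate_ith_sketch <- function(str, i, regex) {
  i <- rep_len(i, length(str))
  stopifnot(all(i != 0))  # i = 0 has no meaning in this sketch
  all_matches <- stri_locate_all_regex(str, regex)
  out <- matrix(NA_integer_, nrow = length(str), ncol = 2,
                dimnames = list(NULL, c("start", "end")))
  for (k in seq_along(str)) {
    m <- all_matches[[k]]
    n <- nrow(m)
    # stringi reports "no match" and NA input as a single row of NAs.
    if (is.na(m[1, 1])) next
    # Negative i counts from the end; out-of-range i is clamped (assumption).
    idx <- if (i[k] > 0) min(i[k], n) else max(n + i[k] + 1, 1)
    out[k, ] <- m[idx, ]
  }
  out
}

x <- c("abcdefghijklm", "nopqrstuvwxyz", "xyz", NA)
loc <- locate_ith_sketch(x, c(2, -2, 1, 1), regex = "[aeiou]")
print(loc)
```

The result has one row per input string: the second vowel of the first string is at position 5, the second-last vowel of the second string is at position 2, and the rows for "xyz" (no match) and NA both contain two NAs.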
#############################################################################
# practical example with regex & fixed ====
library(tinycodet) # provides stri_locate_ith() and stri_locate_ith_boundaries()
# input character vector:
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
# report ith (second and second-last) vowel locations:
p <- rep("A|E|I|O|U", 2) # vowels
loc <- stri_locate_ith(x, c(2, -2), regex=p, case_insensitive=TRUE)
print(loc)
# extract ith vowels:
extr <- stringi::stri_sub(x, from=loc)
print(extr)
# replace ith vowels with numbers:
repl <- stringi::stri_replace_all(
  extr, fixed = c("a", "e", "i", "o", "u"), replacement = 1:5, vectorize_all = FALSE
)
x <- stringi::stri_sub_replace(x, loc, replacement=repl)
print(x)
#############################################################################
# practical example with boundaries ====
# input character vector:
x <- c("good morning and good night",
       "hello ladies and gentlemen")
print(x)
# report ith word locations:
loc <- stri_locate_ith_boundaries(x, c(-3, 3), type = "word")
print(loc)
# extract ith words:
extr <- stringi::stri_sub(x, from=loc)
print(extr)
# transform (swap upper and lower case) and replace words:
tf <- chartr(extr, old = "a-zA-Z", new = "A-Za-z")
x <- stringi::stri_sub_replace(x, loc, replacement=tf)
print(x)
#############################################################################
# find pattern ====
extr <- stringi::stri_sub(x, from=loc)
repl <- chartr(extr, old = "a-zA-Z", new = "A-Za-z")
stringi::stri_sub_replace(x, loc, replacement=repl)
# simple pattern ====
x <- rep(paste0(1:10, collapse=""), 10)
print(x)
out <- stri_locate_ith(x, 1:10, regex = as.character(1:10))
cbind(1:10, out)
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
p <- rep("a|e|i|o|u",2)
out <- stri_locate_ith(x, c(-1, 1), regex=p)
print(out)
substr(x, out[,1], out[,2])
#############################################################################
# ignore case pattern ====
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
p <- rep("A|E|I|O|U", 2)
out <- stri_locate_ith(x, c(1, -1), regex=p, case_insensitive=TRUE)
substr(x, out[,1], out[,2])
#############################################################################
# multi-character pattern ====
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
# multi-character pattern:
p <- rep("AB", 2)
out <- stri_locate_ith(x, c(1, -1), regex=p, case_insensitive=TRUE)
print(out)
substr(x, out[,1], out[,2])
#############################################################################
# Replacement transformation using stringi ====
x <- c("hello world", "goodbye world")
loc <- stri_locate_ith(x, c(1, -1), regex="a|e|i|o|u")
extr <- stringi::stri_sub(x, from=loc)
repl <- chartr(extr, old = "a-zA-Z", new = "A-Za-z")
stringi::stri_sub_replace(x, loc, replacement=repl)
#############################################################################
# Boundaries ====
test <- c(
  paste0("The\u00a0above-mentioned features are very useful. ",
         "Spam, spam, eggs, bacon, and spam. 123 456 789"),
  "good morning, good evening, and good night"
)
loc <- stri_locate_ith_boundaries(test, i = c(1, -1), type = "word")
stringi::stri_sub(test, from=loc)