strcut_loc {tinycodet} | R Documentation |
The strcut_loc() function cuts every string in a character vector around a location range loc, such that every string is cut into the following parts: the sub-string before loc; the sub-string at loc itself; and the sub-string after loc.
The location range loc would usually be a matrix with 2 columns, giving the start and end points of some pattern match.
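As an illustration only, a location matrix of this kind can be built in base R from regexpr(), whose result holds the start of the first match and a "match.length" attribute. The examples on this page use stri_locate_ith() instead. The column names start/end and the choice to mark non-matching strings with NA are assumptions made for this sketch.

```r
# Base-R sketch (not tinycodet code): build a 2-column location matrix
# giving the start and end of the first ", " in every string.
x <- c("spam, eggs", "bacon")
m <- regexpr(", ", x, fixed = TRUE)           # -1 where there is no match
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L   # inclusive end position
loc <- cbind(start = start, end = end)
loc[start == -1L, ] <- NA                     # assumption: no match -> NA range
loc
```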
The strcut_brk() function (a wrapper around stri_split_boundaries) cuts every string into individual text breaks, such as character, word, line, or sentence boundaries.
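For intuition only, base R's strsplit() with an empty pattern is a rough analogue of cutting into characters. It is a stand-in, not how strcut_brk() works: strsplit() splits by code point, whereas the ICU character boundaries used by stri_split_boundaries keep, for example, a letter and a following combining accent together. It also returns a list, not a matrix.

```r
# Rough base-R analogue of character-level cutting (illustration only):
# every string is split into single characters (code points).
x <- c("abc", "de")
pieces <- strsplit(x, "")
pieces
```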
strcut_loc(str, loc)
strcut_brk(str, type = "character", ...)
str | a string or character vector. |
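loc | either a numeric vector of length 2, giving the start and end position of a single location range, or a numeric matrix with 2 columns, giving the start and end positions of the location range for every string (such as the output of stri_locate_ith). |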
type | single string; the break iterator type, for example "character" (the default), "line", "word", or "sentence". |
... | additional settings for stri_opts_brkiter. |
The main difference between the strcut_-functions and stri_split / strsplit is that the latter generally remove the delimiter patterns from a string when cutting, while the strcut_-functions do not attempt to remove any part of the string by default; they only cut the strings into separate pieces.
Moreover, the strcut_-functions always return a matrix, not a list.
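A small base-R illustration of this difference: strsplit() drops the delimiter, while cutting around the location of the match keeps all three pieces, so pasting them back together restores the original string. The manual substr() cutting below is a hypothetical stand-in for strcut_loc(), not its implementation.

```r
x <- "spam, eggs"

# strsplit() removes the delimiter ", "
split_parts <- strsplit(x, ", ", fixed = TRUE)[[1]]

# cutting around the match keeps it as the middle piece
m <- regexpr(", ", x, fixed = TRUE)
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L
parts <- c(
  substr(x, 1L, start - 1L),        # before the match
  substr(x, start, end),            # the match itself
  substr(x, end + 1L, nchar(x))     # after the match
)

split_parts
parts
paste0(parts, collapse = "") == x   # nothing was removed
```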
For the strcut_loc() function:
a character matrix with length(str) rows and 3 columns, where for every row i the following holds:
the first column contains the sub-string before loc[i,], or NA if loc[i,] contains NA;
the second column contains the sub-string at loc[i,], or the uncut string if loc[i,] contains NA;
the third and last column contains the sub-string after loc[i,], or NA if loc[i,] contains NA.
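The return-value rules above can be mimicked in base R to make the NA handling concrete. The helper cut_loc_sketch() is hypothetical and is not tinycodet's implementation; treating a length-2 vector as the same range for every string is an assumption of this sketch.

```r
# Hypothetical helper, NOT tinycodet's implementation: a base-R sketch of
# the documented strcut_loc() return value.
cut_loc_sketch <- function(str, loc) {
  # assumption: a length-2 vector is used as the same range for every string
  if (is.null(dim(loc))) {
    loc <- matrix(loc, nrow = length(str), ncol = 2L, byrow = TRUE)
  }
  start <- loc[, 1L]
  end <- loc[, 2L]
  out <- cbind(
    substr(str, 1L, start - 1L),        # before loc[i, ]
    substr(str, start, end),            # at loc[i, ]
    substr(str, end + 1L, nchar(str))   # after loc[i, ]
  )
  has_na <- is.na(start) | is.na(end)
  out[has_na, 1L] <- NA_character_      # no "before" part
  out[has_na, 2L] <- str[has_na]        # the uncut string
  out[has_na, 3L] <- NA_character_      # no "after" part
  out
}

cut_loc_sketch(c("12345", "abcde"), c(2, 3))
cut_loc_sketch("abc", c(NA, 2))
```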
For the strcut_brk() function:
a character matrix with length(str) rows and a number of columns equal to the maximum number of pieces any string in str was cut into.
Empty places are filled with NA.
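The padding rule can be pictured with a short base-R sketch that turns a list of pieces (the shape stri_split_boundaries returns by default) into a matrix with one row per string, filling short rows with NA. The example pieces are made up for illustration; this is not tinycodet's code.

```r
# Hypothetical pieces of two strings, cut into 3 and 2 pieces respectively.
pieces <- list(c("a", "b", "c"), c("d", "e"))
n <- max(lengths(pieces))   # number of columns = most pieces in any string
out <- do.call(rbind, lapply(pieces, function(p) {
  c(p, rep(NA_character_, n - length(p)))   # pad short rows with NA
}))
out
```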
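library(tinycodet)

# locate pattern matches with stri_locate_ith(), then cut around them:
x <- rep(paste0(1:10, collapse = ""), 10)
print(x)
loc <- stri_locate_ith(x, 1:10, fixed = as.character(1:10))
strcut_loc(x, loc)

# a single location range given as a vector of length 2:
strcut_loc(x, c(5, 5))

# ranges containing NA leave the string uncut in the middle column:
strcut_loc(x, c(NA, NA))
strcut_loc(x, c(5, NA))
strcut_loc(x, c(NA, 5))

# cut a string at different kinds of text boundaries:
test <- "The\u00a0above-mentioned features are very useful. " %s+%
  "Spam, spam, eggs, bacon, and spam. 123 456 789"
strcut_brk(test, "line")
strcut_brk(test, "word")
strcut_brk(test, "sentence")
strcut_brk(test) # default: type = "character"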