strcut_loc {tinycodet}    R Documentation

Cut Strings

Description

The strcut_loc() function cuts every string in a character vector around a location range loc, such that every string is cut into the following parts:

  • the sub-string before loc;

  • the sub-string at loc itself;

  • the sub-string after loc.

The location range loc would usually be a matrix with 2 columns, giving the start and end positions of some pattern match.
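The three-way cut can be pictured with base R's substr(). The following is a minimal sketch, not the package's implementation: the helper cut_around() is hypothetical, and it assumes a single string and a location range without NA bounds.

```r
# Hypothetical helper (not part of tinycodet): cut one string around
# the location range loc = c(start, end), keeping all three parts.
cut_around <- function(s, loc) {
  n <- nchar(s)
  c(
    before = substr(s, 1, loc[1] - 1),   # everything before the range
    middle = substr(s, loc[1], loc[2]),  # the range itself
    after  = substr(s, loc[2] + 1, n)    # everything after the range
  )
}

cut_around("12345678910", c(5, 5))
# returns c(before = "1234", middle = "5", after = "678910")
```

Because no characters are dropped, pasting the three parts back together always gives the original string.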

The strcut_brk() function (a wrapper around stri_split_boundaries) cuts every string into the pieces delimited by text boundaries (such as character, word, line, or sentence boundaries).

Usage

strcut_loc(str, loc)

strcut_brk(str, type = "character", ...)

Arguments

str

a string or character vector.

loc

Either one of the following:

  • the result from the stri_locate_ith function.

  • a matrix with 2 integer columns and nrow(loc)==length(str), giving the location range of the middle part for each string.

  • a vector of length 2, giving the location range of the middle part, used for every string.
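Any 2-column integer matrix with one row per string can serve as loc. As a sketch using only base R (an assumption for illustration; stri_locate_ith produces such a matrix directly), the first match of a fixed pattern can be turned into start and end columns with regexpr():

```r
x <- c("foo-bar", "a-b-c", "none")

# regexpr() returns the start of the first match (-1 if there is none)
# and stores the match length in the "match.length" attribute.
m <- regexpr("-", x, fixed = TRUE)
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L

# Mark strings without a match as NA rather than -1.
start[start == -1L] <- NA_integer_
end[is.na(start)] <- NA_integer_

loc <- cbind(start = start, end = end)
loc
```

Each row of loc holds the start and end position of the match in the corresponding element of x.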

type

single string; either the break iterator type, one of character, line_break, sentence, word, or a custom set of ICU break iteration rules. Defaults to "character".

...

additional settings for stri_opts_brkiter

Details

The main difference between the strcut_-functions and stri_split/strsplit is that the latter generally remove the delimiter patterns when cutting a string, whereas the strcut_-functions do not remove any part of the string by default; they only cut the strings into separate pieces. Moreover, the strcut_-functions always return a matrix, not a list.
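The difference can be seen on a small string. Base strsplit() drops the delimiter and returns a list, whereas cutting around the delimiter's location keeps every character. The cut below is done with base substr() purely as an illustration of the idea, not with strcut_loc() itself.

```r
s <- "key=value"

# strsplit() removes the "=" delimiter and returns a list.
strsplit(s, "=", fixed = TRUE)
# returns list(c("key", "value"))

# Cutting around the delimiter's location (position 4) keeps it.
pos <- as.integer(regexpr("=", s, fixed = TRUE))
pieces <- c(substr(s, 1, pos - 1), substr(s, pos, pos), substr(s, pos + 1, nchar(s)))
pieces
# returns c("key", "=", "value")
```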

Value

For the strcut_loc() function:
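A character matrix with length(str) rows and 3 columns, where for every row i it holds the following:

  • the first column contains the part of str[i] before the location range;

  • the second column contains the part of str[i] within the location range;

  • the third column contains the part of str[i] after the location range.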
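Vectorised over rows, this three-column layout can be reproduced with base substr(), which recycles its start and stop arguments. This is a hypothetical sketch (cut_matrix() is not a tinycodet function) that assumes loc contains no NA values; how strcut_loc() itself treats NA bounds is not covered here.

```r
# Hypothetical sketch: one row per string, columns before / middle / after.
cut_matrix <- function(str, loc) {
  n <- nchar(str)
  cbind(
    before = substr(str, 1, loc[, 1] - 1),
    middle = substr(str, loc[, 1], loc[, 2]),
    after  = substr(str, loc[, 2] + 1, n)
  )
}

x <- c("foo-bar", "a-b-c")
loc <- cbind(c(4, 2), c(4, 2))
res <- cut_matrix(x, loc)
res
```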

For the strcut_brk() function:
A character matrix with length(str) rows and a number of columns equal to the maximum number of pieces into which any string in str was cut.
Unused cells are filled with NA.
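The NA padding can be sketched in base R. This assumes strsplit(x, "") as a rough stand-in for the character break iterator; note that it splits into individual characters, not into the grapheme clusters that ICU character boundaries would give. The helper pad_pieces() is hypothetical.

```r
# Hypothetical helper: turn a list of pieces into a matrix with one row
# per string, padding shorter rows with NA.
pad_pieces <- function(pieces) {
  width <- max(lengths(pieces))
  do.call(rbind, lapply(pieces, function(p) {
    length(p) <- width  # extending the vector fills the new cells with NA
    p
  }))
}

pieces <- strsplit(c("abc", "de", "f"), "")
m <- pad_pieces(pieces)
m
```

The result has 3 columns because "abc" is cut into the most pieces; the rows for "de" and "f" end in NA.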

See Also

tinycodet_strings()

Examples


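library(tinycodet)

# ten copies of the string "12345678910"
x <- rep(paste0(1:10, collapse=""), 10)
print(x)
# locate, in x[i], the i-th occurrence of the fixed pattern as.character(i)
loc <- stri_locate_ith(x, 1:10, fixed = as.character(1:10))
strcut_loc(x, loc)
# the same location range for every string: cut around position 5
strcut_loc(x, c(5,5))
# location ranges with missing bounds
strcut_loc(x, c(NA, NA))
strcut_loc(x, c(5, NA))
strcut_loc(x, c(NA, 5))

# %s+% is tinycodet's string concatenation operator
test <- "The\u00a0above-mentioned    features are very useful. " %s+%
"Spam, spam, eggs, bacon, and spam. 123 456 789"
strcut_brk(test, "line")      # line-break opportunities
strcut_brk(test, "word")      # word boundaries
strcut_brk(test, "sentence")  # sentence boundaries
strcut_brk(test)              # default: character boundaries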

[Package tinycodet version 0.3.0 Index]