unigram_dictionary {NUSS}    R Documentation
Create unigram dictionary
Description
unigram_dictionary returns a data.frame containing the dictionary for unigram_sequence_segmentation.
Usage
unigram_dictionary(texts, points_filter = 1)
Arguments
texts
    character vector; the texts used to create the ngrams dictionary. Case-sensitive.
points_filter
    numeric; the minimum number of points (occurrences) a unigram must have to be included in the dictionary.
Value
The output is always a data.frame with 4 columns: 1) to_search, 2) to_replace, 3) id, 4) points.
Examples
texts <- c("this is science",
"science is #fascinatingthing",
"this is a scientific approach",
"science is everywhere",
"the beauty of science")
unigram_dictionary(texts)
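
A further sketch of the points_filter argument: the call below raises the threshold to 2, so unigrams with fewer points should be dropped. It assumes the NUSS package is installed (the requireNamespace guard skips the call otherwise) and makes no claim about the exact rows returned; the variable name dict is only illustrative.

```r
# Assumption: the NUSS package is installed; if not, nothing is run.
if (requireNamespace("NUSS", quietly = TRUE)) {
  texts <- c("this is science",
             "science is #fascinatingthing",
             "this is a scientific approach",
             "science is everywhere",
             "the beauty of science")

  # Keep only unigrams with at least 2 points (occurrences).
  dict <- NUSS::unigram_dictionary(texts, points_filter = 2)

  # Per the Value section, this should be a data.frame with 4 columns:
  # to_search, to_replace, id, points.
  str(dict)
}
```

Comparing nrow(dict) with nrow(unigram_dictionary(texts)) is a quick way to see how many entries the higher threshold removes.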
[Package NUSS version 0.1.0 Index]