nuss {NUSS}    R Documentation

Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function

Description

nuss returns a data.frame containing the hashtag, its segmented version, the ids of the dictionary words, the number of words needed to segment the hashtag, the total number of points, and the computed score.

Usage

nuss(sequences, texts)

Arguments

sequences

character vector of sequences to be segmented (e.g., a hashtag, or a sequence without one). Case-insensitive.

texts

character vector of texts used to create the n-grams and unigram dictionaries. Case-insensitive.

Details

This function is an arbitrary combination of ngrams_dictionary, unigram_dictionary, ngrams_segmentation, and unigram_sequence_segmentation, created to make it easy to segment short texts based on a text corpus.
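
To give a feel for what corpus-based segmentation means, here is a minimal base-R sketch. It is not the NUSS implementation and does not reproduce its n-grams step, points, or score. The helpers build_unigrams and segment_sequence are hypothetical names introduced only for this illustration, and the rule of preferring the fewest dictionary words is an assumption.

```r
# Hypothetical helper: build a unigram frequency table from a corpus.
build_unigrams <- function(texts) {
  tokens <- unlist(strsplit(tolower(texts), "[^a-z0-9]+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading separators
  table(tokens)
}

# Hypothetical helper: split one sequence into dictionary words with dynamic
# programming, preferring the segmentation that uses the fewest words.
# Returns NA when the sequence cannot be covered by dictionary words.
segment_sequence <- function(sequence, words) {
  s <- tolower(sequence)
  n <- nchar(s)
  best <- c(0, rep(Inf, n))        # best[i + 1]: fewest words covering the first i characters
  prev <- rep(NA_integer_, n + 1)  # prev[i + 1]: offset where the last word starts
  for (end in seq_len(n)) {
    for (start in 0:(end - 1)) {
      piece <- substr(s, start + 1, end)
      if (is.finite(best[start + 1]) && piece %in% words &&
          best[start + 1] + 1 < best[end + 1]) {
        best[end + 1] <- best[start + 1] + 1
        prev[end + 1] <- start
      }
    }
  }
  if (!is.finite(best[n + 1])) return(NA_character_)
  out <- character(0)
  end <- n
  while (end > 0) {                # walk back through the stored word boundaries
    start <- prev[end + 1]
    out <- c(substr(s, start + 1, end), out)
    end <- start
  }
  paste(out, collapse = " ")
}

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
words <- names(build_unigrams(texts))
segment_sequence("thisisscience", words)
segment_sequence("scienceisscience", words)
```

The sketch only shows why a dictionary built from texts is enough to recover word boundaries in a hashtag; nuss additionally uses n-grams and a scoring scheme to choose among candidate segmentations.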

Value

The output will always be a data.frame with sequences that were
The output is not in the input order. If the input order is needed, use lapply to process the sequences one at a time.
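
One way to keep results aligned with the input is sketched below. The wrapper segment_in_order is a hypothetical helper, not part of NUSS. It assumes the segmenting function accepts a single sequence plus the texts, which matches the documented nuss(sequences, texts) usage. The call to nuss runs only if the package is installed.

```r
# Hypothetical helper: call a segmenting function once per sequence so the
# results come back as a list in the same order as the input.
segment_in_order <- function(sequences, texts, segmenter) {
  results <- lapply(sequences, function(s) segmenter(s, texts))
  names(results) <- sequences  # label each result with its input sequence
  results
}

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")

if (requireNamespace("NUSS", quietly = TRUE)) {
  ordered <- segment_in_order(c("thisisscience", "scienceisscience"),
                              texts, NUSS::nuss)
  names(ordered)
}
```

Each list element is whatever nuss returns for that one sequence, so the elements can be combined afterwards if a single data.frame is preferred.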

Examples

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
nuss(c("thisisscience", "scienceisscience"), texts)


[Package NUSS version 0.1.0 Index]