process_embed {keyclust}    R Documentation

Reduce redundant terms in a fitted embedding model

Description

Takes a fitted embedding model as input and combines the embeddings of terms that share the same case-insensitive form, stem, or lemma.

Usage

process_embed(
  x,
  words = NULL,
  punct = TRUE,
  tolower = TRUE,
  lemmatize = TRUE,
  stem = FALSE
)

Arguments

x

A fitted word embedding model in data frame format.

words

The name of the column in x that holds the words associated with the fitted word embeddings. Defaults to NULL.

punct

Logical. If TRUE (the default), removes punctuation.

tolower

Logical. If TRUE (the default), combines terms that differ only by case.

lemmatize

Logical. If TRUE (the default), combines terms that share a common lemma. Uses the lexicon package by default.

stem

Logical. If TRUE, combines terms that share a common stem. Defaults to FALSE. Note: stemming should not be used together with lemmatization, so set lemmatize = FALSE when stem = TRUE.
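
Details

This page does not state how the vectors of combined terms are merged. The base R sketch below is only an illustration of the idea behind tolower, not the package's implementation: it assumes that rows whose terms differ only by case are merged by averaging each dimension. The column names term, d1, and d2 are hypothetical.

```r
# Toy embedding in data frame format: one row per term, one column per dimension.
# The column names `term`, `d1`, and `d2` are hypothetical.
emb <- data.frame(
  term = c("Tax", "tax", "taxes", "vote"),
  d1   = c(0.2, 0.4, 0.6, -1.0),
  d2   = c(1.0, 0.0, 0.5,  0.3),
  stringsAsFactors = FALSE
)

# Fold case so that "Tax" and "tax" share one key.
key <- tolower(emb$term)

# Average the vectors of rows that share a key.
# Averaging is an assumption; process_embed may combine rows differently.
combined <- aggregate(emb[, c("d1", "d2")], by = list(term = key), FUN = mean)

print(combined)
```

In this sketch the four input rows collapse to three, because "Tax" and "tax" map to the same key. "taxes" stays separate: merging it with "tax" would need a lemma or stem mapping, which is what the lemmatize and stem arguments are described as providing.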

Value

A data frame with the same columns as the input, but with redundant terms combined.
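
Examples

A minimal usage sketch, assuming the keyclust package is installed. The toy data frame and its column names are hypothetical, and passing the column name to words as a character string is an assumption based on the argument description above. Lemmatization and stemming are switched off here so that the call only folds case and removes punctuation.

```r
# Hypothetical toy embedding: one row per term, two numeric dimensions.
emb <- data.frame(
  term = c("Tax", "tax", "vote"),
  d1   = c(0.2, 0.4, -1.0),
  d2   = c(1.0, 0.0,  0.3),
  stringsAsFactors = FALSE
)

# Run only if keyclust is available; otherwise the call is skipped.
if (requireNamespace("keyclust", quietly = TRUE)) {
  out <- keyclust::process_embed(
    emb,
    words     = "term",  # assumption: the column name is given as a string
    punct     = TRUE,
    tolower   = TRUE,
    lemmatize = FALSE,
    stem      = FALSE
  )
  print(out)
}
```

To merge inflected forms as well, set either lemmatize = TRUE or stem = TRUE, but not both, in line with the note on the stem argument.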


[Package keyclust version 1.2.5 Index]