cluster_docs {LBDiscover}    R Documentation
Cluster documents using K-means
Description
Clusters the documents in a data frame by applying K-means to TF-IDF vectors built from a text column.
Usage
cluster_docs(
text_data,
text_column = "abstract",
n_clusters = 5,
min_term_freq = 2,
max_doc_freq = 0.9,
random_seed = 42
)
Arguments
text_data: A data frame containing text data.
text_column: Name of the column in text_data containing the text to analyze. Defaults to "abstract".
n_clusters: Number of clusters to create. Defaults to 5.
min_term_freq: Minimum frequency for a term to be included. Defaults to 2.
max_doc_freq: Maximum document frequency, as a proportion of documents (between 0 and 1), for a term to be included. Defaults to 0.9.
random_seed: Seed for random number generation, used so that results are reproducible. Defaults to 42.
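To make the role of these arguments more concrete, the following base R sketch shows one way a TF-IDF plus K-means pipeline could use them. It is not the package's implementation: the tokenizer, the TF-IDF weighting, the reading of min_term_freq as total occurrences across the corpus, and the toy corpus itself are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the LBDiscover implementation.
# Toy corpus (hypothetical data); column name matches text_column = "abstract".
docs <- data.frame(
  id = 1:6,
  abstract = c(
    "magnesium supplements reduce migraine frequency",
    "migraine patients show magnesium deficiency",
    "magnesium therapy for migraine patients",
    "protein folding in yeast cells",
    "yeast cells regulate protein expression",
    "protein folding errors stress yeast cells"
  ),
  stringsAsFactors = FALSE
)

n_clusters    <- 2
min_term_freq <- 2    # assumption: total occurrences across all documents
max_doc_freq  <- 0.9  # proportion of documents
random_seed   <- 42

# Tokenize: lowercase, split on non-letters, drop empty tokens.
tokens <- lapply(
  strsplit(tolower(docs$abstract), "[^a-z]+"),
  function(x) x[nzchar(x)]
)

# Build a documents-by-terms count matrix.
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(counts) <- vocab

# Drop terms that are too rare or that appear in too many documents.
term_freq <- colSums(counts)
doc_freq  <- colSums(counts > 0) / nrow(counts)
keep      <- term_freq >= min_term_freq & doc_freq <= max_doc_freq
counts    <- counts[, keep, drop = FALSE]

# TF-IDF: term frequency within each document times log inverse document frequency.
tf    <- counts / pmax(rowSums(counts), 1)
idf   <- log(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# K-means with a fixed seed so the assignment is reproducible.
set.seed(random_seed)
km <- kmeans(tfidf, centers = n_clusters, nstart = 10)

# Original data plus cluster assignments ("cluster" is a hypothetical column name).
docs$cluster <- km$cluster
print(docs[, c("id", "cluster")])
```

In this sketch, raising min_term_freq shrinks the vocabulary by dropping rare terms, while lowering max_doc_freq removes terms that occur in most documents and therefore do little to separate clusters.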
Value
A data frame with the original data and cluster assignments.
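Examples
A minimal call might look like the sketch below. The data frame papers and its contents are hypothetical, and the code assumes that LBDiscover is installed and that cluster_docs is exported; the guard skips the call otherwise. Because the name of the column holding the cluster assignments is not documented here, the result is inspected with str() rather than by column name.

```r
# Hypothetical input data; any data frame with a text column should work.
papers <- data.frame(
  title = paste("Paper", 1:6),
  abstract = c(
    "magnesium supplements reduce migraine frequency",
    "migraine patients show magnesium deficiency",
    "magnesium therapy for migraine patients",
    "protein folding in yeast cells",
    "yeast cells regulate protein expression",
    "protein folding errors stress yeast cells"
  ),
  stringsAsFactors = FALSE
)

# Assumption: cluster_docs is exported by LBDiscover; skip the call if not.
if (requireNamespace("LBDiscover", quietly = TRUE) &&
    "cluster_docs" %in% getNamespaceExports("LBDiscover")) {
  clustered <- LBDiscover::cluster_docs(
    text_data     = papers,
    text_column   = "abstract",
    n_clusters    = 2,
    min_term_freq = 2,
    max_doc_freq  = 0.9,
    random_seed   = 42
  )
  # Inspect the returned data frame to find the cluster assignments.
  str(clustered)
}
```

A corpus this small is only meant to show the shape of the call; in practice n_clusters should be well below the number of documents.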
[Package LBDiscover version 0.1.0 Index]