calc_doc_sim {LBDiscover}    R Documentation

Calculate document similarity using TF-IDF and cosine similarity

Description

Computes pairwise similarity between documents by weighting their terms with TF-IDF and comparing the resulting document vectors with cosine similarity.

Usage

calc_doc_sim(
  text_data,
  text_column = "abstract",
  min_term_freq = 2,
  max_doc_freq = 0.9
)

Arguments

text_data

A data frame containing text data.

text_column

Name of the column in text_data that contains the text to analyze. Defaults to "abstract".

min_term_freq

Minimum frequency a term must reach to be included. Defaults to 2.

max_doc_freq

Maximum proportion of documents in which a term may appear and still be included. Defaults to 0.9.
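
To illustrate how two such thresholds can prune a vocabulary, here is a minimal base R sketch. It is not the package's implementation: the helper filter_terms is hypothetical, and it assumes that min_term_freq counts total occurrences of a term across all documents and that max_doc_freq is the share of documents containing the term. LBDiscover may tokenize and count differently.

```r
# Hypothetical helper for illustration only; not part of LBDiscover.
filter_terms <- function(texts, min_term_freq = 2, max_doc_freq = 0.9) {
  # Lowercase, split on anything that is not a letter or digit, drop empties.
  tokens <- lapply(strsplit(tolower(texts), "[^a-z0-9]+"),
                   function(x) x[nzchar(x)])
  all_terms <- unlist(tokens)
  vocab <- sort(unique(all_terms))

  # Assumption: term frequency = total occurrences across the corpus.
  term_freq <- vapply(vocab, function(term) sum(all_terms == term), numeric(1))

  # Document frequency = proportion of documents containing the term.
  doc_freq <- vapply(vocab, function(term) {
    mean(vapply(tokens, function(d) term %in% d, logical(1)))
  }, numeric(1))

  # Keep terms that are frequent enough but not near-universal.
  vocab[term_freq >= min_term_freq & doc_freq <= max_doc_freq]
}

texts <- c("gene expression in mice",
           "gene therapy in humans",
           "protein folding in yeast")
kept <- filter_terms(texts)
```

In this toy corpus only "gene" survives: "in" appears in every document and exceeds the 0.9 proportion, while every other term occurs just once.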

Value

A matrix of pairwise cosine similarities between the documents.
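
Examples

The following base R sketch shows the general technique named in the title, TF-IDF weighting followed by cosine similarity, on a toy corpus. It is an illustration under stated assumptions, not the code used by calc_doc_sim: the helper tfidf_cosine is hypothetical, the IDF variant log(N / df) is one common choice among several, and no term filtering is applied.

```r
# Hypothetical helper for illustration only; not part of LBDiscover.
tfidf_cosine <- function(texts) {
  # Lowercase, split on anything that is not a letter or digit, drop empties.
  tokens <- lapply(strsplit(tolower(texts), "[^a-z0-9]+"),
                   function(x) x[nzchar(x)])
  vocab <- sort(unique(unlist(tokens)))

  # Term-frequency matrix: one row per document, one column per term.
  tf <- do.call(rbind, lapply(tokens, function(d) {
    as.numeric(table(factor(d, levels = vocab)))
  }))

  # Assumed IDF variant: log(number of documents / document frequency).
  idf <- log(length(texts) / colSums(tf > 0))

  # Scale each term column by its IDF weight.
  tfidf <- sweep(tf, 2, idf, `*`)

  # Cosine similarity: dot products divided by the product of vector norms.
  # A document whose terms all have zero weight has norm 0 and yields NaN.
  norms <- sqrt(rowSums(tfidf^2))
  (tfidf %*% t(tfidf)) / outer(norms, norms)
}

docs <- data.frame(
  abstract = c("gene expression in mice",
               "gene therapy in humans",
               "protein folding in yeast"),
  stringsAsFactors = FALSE
)
sim <- tfidf_cosine(docs$abstract)
```

The result is a symmetric 3 x 3 matrix with ones on the diagonal. Documents 1 and 3 share only "in", which occurs in every document and so receives zero weight, giving them a similarity of 0. With the package installed, the corresponding call on the same data frame would be calc_doc_sim(docs, text_column = "abstract"); its values may differ from this sketch because of term filtering and weighting details.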


[Package LBDiscover version 0.1.0 Index]