Tfidf_dist {BIDistances}    R Documentation

Term frequency-inverse document frequency distance

Description

Computes the term frequency-inverse document frequency (tfidf) distance for a FeatureMatrix_Gene2GoTerm. When genes are annotated with GO terms from the Gene Ontology, the genes can be interpreted as documents and the GO terms as terms.

Usage

Tfidf_dist(FeatureMatrix_Gene2GoTerm, tf_fun = mean)

Arguments

FeatureMatrix_Gene2GoTerm

[1:n,1:d] Matrix, with n genes and d GO-Terms.

tf_fun

Function defining the numerator value in the normalized term frequency. The default is the mean of the nonzero values.

Details

For the FeatureMatrix_Gene2GoTerm the following holds:
FeatureMatrix_Gene2GoTerm[i,j] > 0 iff GO term j is relevant for gene i. FeatureMatrix_Gene2GoTerm[i,j] > 1 if gene i is annotated with GO term j under more than one evidence code. In document terms, FeatureMatrix_Gene2GoTerm[i,j] is the frequency of term j's occurrence in document i.
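The encoding described above can be illustrated with a small toy matrix. This is only a sketch: the gene names, GO-term labels, and counts are hypothetical and do not come from the package data.

```r
# Hypothetical toy annotation matrix: 3 genes (rows) x 4 GO terms (columns).
# All names and counts are made up for illustration.
X <- matrix(
  c(1, 0, 2, 0,
    0, 1, 0, 0,
    3, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("GO:1", "GO:2", "GO:3", "GO:4"))
)

relevant       <- X > 0  # TRUE where GO term j is relevant for gene i
multi_evidence <- X > 1  # TRUE where the annotation has more than one evidence code

terms_per_gene <- rowSums(relevant)  # number of annotated GO terms per gene (document)
genes_per_term <- colSums(relevant)  # number of genes (documents) containing each term
```

Here genes play the role of documents, so genes_per_term corresponds to what is usually called the document frequency of a term.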

Value

List with

dist

Numeric vector containing the tfidf distances between the documents, computed as the absolute difference of the TfidfWeights

TfidfWeights

[1:n] Numeric vector containing the term frequency-inverse document frequency weights used for the distance, given as the product term frequency * inverse document frequency
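The relation between the two list elements (distance as absolute difference of weights) can be sketched in base R as follows. The weights below are hypothetical values, not output of Tfidf_dist, and the layout of the returned dist (full matrix versus compact vector) is an assumption here; the sketch shows both forms.

```r
# Hypothetical weights for four documents (genes); values are illustrative only.
w <- c(geneA = 0.10, geneB = 0.40, geneC = 0.25, geneD = 0.90)

# Full symmetric matrix of absolute pairwise differences |w_i - w_j|.
D <- abs(outer(w, w, "-"))

# Compact form: for one-dimensional input the Euclidean distance from
# stats::dist() equals the absolute difference, stored as a lower triangle.
d <- dist(w)
```

For n documents the compact form holds n * (n - 1) / 2 values, and as.matrix(d) recovers the full matrix D.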

Author(s)

Michael Thrun

References

Stier, Q. and Thrun, M. C.: Deriving homogeneous subsets from gene sets by exploiting the Gene Ontology, Informatica, in review, 2023.

Examples

data(Hearingloss_N109)
V = Tfidf_dist(Hearingloss_N109$FeatureMatrix_Gene2Term)
dist = V$dist
TfidfWeights = V$TfidfWeights

[Package BIDistances version 0.1.3 Index]