Tfidf_dist {BIDistances} | R Documentation |
Term frequency-inverse document frequency distance
Description
Computes the term frequency inverse document frequency (tfidf) distance for a FeatureMatrix_Gene2GoTerm. In case of genes with annotated GOterms from gene ontology genes can be interpreted as documents and GOterms as terms.
Usage
Tfidf_dist(FeatureMatrix_Gene2GoTerm, tf_fun = mean)
Arguments
FeatureMatrix_Gene2GoTerm |
[1:n,1:d] Matrix, with n genes and d GO-Terms. |
tf_fun |
Function, defining the numerator value in the normalized Term-frequency. The default is the mean of the not 0 values. |
Details
For the FeatureMatrix_Gene2GoTerm it is:
FeatureMatrix_Gene2GoTerm[i,j] > 0 iff GOterm j is relevant for gene i. The FeatureMatrix_Gene2GoTerm[i,j] > 1 if the specific gene is annotated by in a specific GO-Term with more than one evidence code FeatureMatrix_Gene2GoTerm[i,j] is the frequency of term js occurance in document i.
Value
List with
dist |
Numeric vector containing the tdfidf distances between the documents = absolute difference of TfidfWeights |
TfidfWeights |
[1:n] Numeric vector containing the term frequence inverse document frequency weights used for the distance, given as the Term frequency*Inverse document frequency |
Author(s)
Michael Thrun
References
Stier, Q. and Thrun, M., C.: Deriving homogeneous subsets from gene sets by exploiting the Gene Ontology, Informatica, in review, 2023
Examples
data(Hearingloss_N109)
V = Tfidf_dist(Hearingloss_N109$FeatureMatrix_Gene2Term)
dist = V$dist
TfidfWeights = V$TfidfWeights