simDoc {smdc} | R Documentation |
This function calculates the similarity between documents and documents.
simDoc(docMatrix1, docMatrix2, norm = FALSE, method = "cosine")
docMatrix1 |
Document matrix whose rows represent feature vector of one document. This matrix must satisfy the following: colnames(docMatrix1) denote feature names, rownames(docMatrix1) denote document names, every element is numerical. |
docMatrix2 |
Document matrix whose rows represent feature vector of one document. This matrix must satisfy the following: colnames(docMatrix2) denote feature names, rownames(docMatrix2) denote document names, every element is numerical. |
norm |
Whether normalize similarity matrix or not. |
method |
Method to caluculate similarity. |
Similarity Matrix whose rows represent documents of docMatrix1 and whose columns represent documents of docMatrix2. This matrix is n * m matrix where n=ncol(docMatrix1) and m=ncol(docMatrix2), and satisfy the following: rownames(returnValue)=colnames(docMatrix1), colnames(returnValue)=colnames(docMatrix2).
Masaaki TAKADA
## The function is currently defined as function (docMatrix1, docMatrix2, norm = FALSE, method = "cosine") { library("proxy") exDocMatrix <- uniform(docMatrix1, docMatrix2) exDocMatrix1 <- exDocMatrix[[1]] exDocMatrix2 <- exDocMatrix[[2]] colnames(exDocMatrix1) <- paste("r_", colnames(docMatrix1), sep = "") colnames(exDocMatrix2) <- paste("c_", colnames(docMatrix2), sep = "") sim <- as.matrix(simil(t(cbind(exDocMatrix1, exDocMatrix2)), method = method))[colnames(exDocMatrix1), colnames(exDocMatrix2)] rownames(sim) <- colnames(docMatrix1) colnames(sim) <- colnames(docMatrix2) if (norm) { sim <- normalize(sim) } return(sim) }