calclift {stm} | R Documentation |
Calculate Lift Words
Description
A primarily internal function for calculating words according to the lift metric.
We expect most users will use labelTopics instead.
Usage
calclift(logbeta, wordcounts)
Arguments
logbeta |
a K by V matrix containing the log probabilities of seeing word v conditional on topic k |
wordcounts |
a vector of length V giving the number of times each word appears in the corpus. |
Details
Lift is calculated by dividing the topic-word distribution by the empirical word count probability distribution. In other words, the lift for word v in topic k can be calculated as:
Lift_{k,v} = \beta_{k,v} / (w_v / \sum_{v'} w_{v'})
We include this after seeing it used effectively in Matt Taddy's work, including his excellent maptpx package. Definitions are given in Taddy (2012).
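The lift formula can be illustrated with a minimal base R sketch on made-up inputs. It does not call calclift itself. The toy values, the variable names (beta, emp_prob, lift, lift_order) and the final ranking step are assumptions for illustration, not necessarily how the package implements it.

```r
# Toy inputs (hypothetical values, for illustration only).
# beta is K = 2 topics by V = 4 words; each row sums to 1.
beta <- matrix(c(0.50, 0.30, 0.15, 0.05,
                 0.10, 0.10, 0.40, 0.40),
               nrow = 2, byrow = TRUE)
logbeta <- log(beta)              # K by V matrix of log probabilities
wordcounts <- c(40, 30, 20, 10)   # length-V vector of corpus word counts

# Empirical word probabilities: w_v divided by the total count.
emp_prob <- wordcounts / sum(wordcounts)

# Lift for every (topic, word) pair: beta[k, v] / emp_prob[v].
# sweep() divides column v of the K by V matrix by emp_prob[v].
lift <- sweep(exp(logbeta), 2, emp_prob, "/")

# Word indices for each topic, ordered from highest to lowest lift.
# The result is V by K: column k holds the ranking for topic k.
lift_order <- apply(lift, 1, order, decreasing = TRUE)
```

With these toy values, word 4 has the highest lift in topic 2: its topic probability (0.40) ties with word 3, but it is much rarer in the corpus (10 of 100 tokens), which is the behaviour the metric is designed to reward.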
References
Taddy, Matthew. 2012. "On Estimation and Selection for Topic Models." AISTATS JMLR W&CP 22