topic_model_data {MLBC} | R Documentation
Topic model dataset
Description
Dataset containing topic model outputs for demonstrating bias correction methods in topic model regressions using CEO diary data.
Usage
topic_model_data
Format
A list with 8 components:
- covars
Data frame (916 x 11): Control variables
- estimation_data
Data frame (916 x 672): Contains outcome ly and word frequencies
- gamma_draws
Data frame (2000 x 2): MCMC draws
- theta_est_full
Data frame (916 x 2): Full sample topic proportions
- theta_est_samp
Data frame (916 x 2): Subsample topic proportions
- beta_est_full
Data frame (2 x 654): Full sample topic-word distributions
- beta_est_samp
Data frame (2 x 654): Subsample topic-word distributions
- lda_data
Data frame (916 x 2): LDA validation data
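The documented shapes can be checked against the loaded object. The following is a minimal sketch: check_dims and expected_dims are hypothetical names introduced here for illustration and are not part of MLBC. The dimensions are copied from the list above.

```r
# Documented dimensions (rows, columns) of each component, copied from the Format list.
expected_dims <- list(
  covars          = c(916, 11),
  estimation_data = c(916, 672),
  gamma_draws     = c(2000, 2),
  theta_est_full  = c(916, 2),
  theta_est_samp  = c(916, 2),
  beta_est_full   = c(2, 654),
  beta_est_samp   = c(2, 654),
  lda_data        = c(916, 2)
)

# Hypothetical helper (not an MLBC function): returns TRUE for each
# component whose dim() matches the documented shape, FALSE otherwise.
check_dims <- function(x, expected = expected_dims) {
  vapply(names(expected), function(nm) {
    !is.null(x[[nm]]) &&
      identical(as.numeric(dim(x[[nm]])), as.numeric(expected[[nm]]))
  }, logical(1))
}

# Runs only when the MLBC package is installed.
if (requireNamespace("MLBC", quietly = TRUE)) {
  data("topic_model_data", package = "MLBC")
  print(check_dims(topic_model_data))
}
```

A FALSE entry would suggest that the installed package version ships a component with a different shape from the one documented here.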
Source
CEO diary data from Bandiera et al. (2020), Journal of Political Economy
See Also
Examples
data(topic_model_data)
# Basic exploration
Y <- topic_model_data$estimation_data$ly
theta <- as.matrix(topic_model_data$theta_est_full)
cat("Sample size:", length(Y), "\n")
cat("Mean log employment:", round(mean(Y), 2), "\n")
cat("Topic 1 mean:", round(mean(theta[, 1]), 3), "\n")
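As a further illustration, the outcome can be regressed on an estimated topic share. This is only a sketch of the uncorrected regression that bias correction methods aim to improve on. It does not call any MLBC correction function. naive_topic_ols is a hypothetical helper, and the code assumes that the rows of estimation_data and theta_est_full refer to the same observations in the same order. Only the first topic share enters the regression, on the assumption that the two shares sum to one and would be collinear with the intercept.

```r
# Hypothetical helper (not an MLBC function): OLS of the outcome on the
# first estimated topic share, ignoring estimation error in theta.
naive_topic_ols <- function(y, theta) {
  dat <- data.frame(y = y, topic1 = theta[, 1])
  lm(y ~ topic1, data = dat)
}

# Runs only when the MLBC package is installed.
if (requireNamespace("MLBC", quietly = TRUE)) {
  data("topic_model_data", package = "MLBC")
  Y <- topic_model_data$estimation_data$ly
  theta <- as.matrix(topic_model_data$theta_est_full)
  # Assumes rows of Y and theta are aligned observation by observation.
  print(coef(naive_topic_ols(Y, theta)))
}
```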
[Package MLBC version 0.2.2 Index]