topicsTest {topics}    R Documentation

Test topics or n-grams

Description

Statistically test topics or n-grams in relation to one or two other variables, using regression or a t-test.

Usage

topicsTest(
  data,
  model = NULL,
  preds = NULL,
  ngrams = NULL,
  x_variable = NULL,
  y_variable = NULL,
  controls = c(),
  test_method = "default",
  p_adjust_method = "fdr",
  seed = 42
)

Arguments

data

(tibble) The tibble containing the variables to be tested.

model

(list) A trained LDA model from the topicsModel() function.

preds

(tibble) The predictions from the topicsPreds() function.

ngrams

(list) Output of the n-gram function.

x_variable

(string) The name of the x variable to be predicted and plotted (only needed for regression or correlation).

y_variable

(string) The name of the y variable to be predicted and plotted (only needed for regression or correlation).

controls

(vector) The control variables (not supported yet).

test_method

(string) The test method to use. "default" checks if x_variable and y_variable only contain 0s and 1s, for which it applies logistic regression; otherwise it applies linear regression. Alternatively, the user may manually specify either "linear_regression" or "logistic_regression".

p_adjust_method

(character) Method to adjust/correct p-values for multiple comparisons (default = "fdr"; see also "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "none").

seed

(integer) The seed to set for reproducibility.
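
The "default" rule described for test_method can be illustrated with a small base R sketch. The helpers is_binary() and choose_test_method() and the toy data frame are hypothetical names made up for this illustration; they are not part of the topics package, and the package's internal check may differ (for example in how it treats missing values).

```r
# Hypothetical helper (not part of the topics package): TRUE when a vector
# holds only 0s and 1s, ignoring missing values.
is_binary <- function(v) {
  v <- v[!is.na(v)]
  length(v) > 0 && all(v %in% c(0, 1))
}

# Hypothetical helper mirroring the documented "default" rule: logistic
# regression when every supplied variable is binary, otherwise linear.
choose_test_method <- function(data, variables) {
  all_binary <- all(vapply(data[variables], is_binary, logical(1)))
  if (all_binary) "logistic_regression" else "linear_regression"
}

# Toy data, for illustration only.
toy <- data.frame(
  Age = c(23, 35, 41, 58),
  Gender = c(0, 1, 1, 0)
)

choose_test_method(toy, "Age")     # "linear_regression"
choose_test_method(toy, "Gender")  # "logistic_regression"
```

If the automatic choice is not the one wanted, test_method can be set explicitly to "linear_regression" or "logistic_regression", as documented above.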

Value

A list of the test results, test method, and prediction variable.
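
The method names accepted by p_adjust_method match those of stats::p.adjust() in base R. Assuming the correction behaves like p.adjust() (an assumption; this page does not say how the adjustment is implemented), the sketch below shows how two of the methods change a set of made-up p-values.

```r
# Made-up p-values, for illustration only.
p_raw <- c(0.001, 0.012, 0.030, 0.047, 0.200)

# Adjust for multiple comparisons with two of the listed methods.
p_fdr  <- p.adjust(p_raw, method = "fdr")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# In p.adjust(), "fdr" is an alias for "BH" (Benjamini-Hochberg).
identical(p_fdr, p.adjust(p_raw, method = "BH"))  # TRUE

# Compare raw and adjusted values side by side.
data.frame(p_raw, p_fdr, p_bonf)
```

Bonferroni is the more conservative of the two, so its adjusted p-values are never smaller than the "fdr" ones.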

Examples


# Test the topic document distribution with respect to a variable

dtm <- topicsDtm(
  data = dep_wor_data$Depphrase
)

model <- topicsModel(
  dtm = dtm, # output of topicsDtm()
  num_topics = 20,
  num_top_words = 10,
  num_iterations = 1000,
  seed = 42
)

preds <- topicsPreds(
  model = model, # output of topicsModel()
  data = dep_wor_data$Depphrase
)

test <- topicsTest(
  model = model, # output of topicsModel()
  data = dep_wor_data,
  preds = preds, # output of topicsPreds()
  test_method = "linear_regression",
  x_variable = "Age"
)

[Package topics version 0.50 Index]