lm_betaselect {betaselectr}    R Documentation
Betas-Select in a Regression Model
Description
Fit a linear regression model or a generalized linear model with selected variables standardized. Product terms are handled correctly, and categorical predictors are skipped in standardization.
Usage
lm_betaselect(
...,
to_standardize = NULL,
not_to_standardize = NULL,
skip_response = FALSE,
do_boot = TRUE,
bootstrap = 100L,
iseed = NULL,
parallel = FALSE,
ncpus = parallel::detectCores(logical = FALSE) - 1,
progress = TRUE,
load_balancing = FALSE,
model_call = c("lm", "glm")
)
glm_betaselect(
...,
to_standardize = NULL,
not_to_standardize = NULL,
skip_response = FALSE,
do_boot = TRUE,
bootstrap = 100L,
iseed = NULL,
parallel = FALSE,
ncpus = parallel::detectCores(logical = FALSE) - 1,
progress = TRUE,
load_balancing = FALSE
)
## S3 method for class 'lm_betaselect'
print(
x,
digits = max(3L, getOption("digits") - 3L),
type = c("beta", "standardized", "raw", "unstandardized"),
...
)
## S3 method for class 'glm_betaselect'
print(
x,
digits = max(3L, getOption("digits") - 3L),
type = c("beta", "standardized", "raw", "unstandardized"),
...
)
raw_output(x)
Arguments
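...: For lm_betaselect(), these arguments are passed to stats::lm(). For glm_betaselect(), they are passed to stats::glm().
to_standardize: A string vector, which should be the names of the variables to be standardized. Default is NULL, indicating that all variables in the model, including the response variable, will be standardized.
not_to_standardize: A string vector, which should be the names of the variables that should not be standardized. This argument is useful when most variables, except for a few, are to be standardized. This argument cannot be used together with to_standardize. Default is NULL.
skip_response: Logical. If TRUE, the response (outcome) variable will not be standardized, even if it is listed in to_standardize or to_standardize is not specified. This is useful when the response variable is restricted, as in logistic regression, where the response must be 0 or 1. Default is FALSE.
do_boot: Whether bootstrapping will be conducted. Default is TRUE.
bootstrap: If do_boot is TRUE, the number of bootstrap samples to draw. Default is 100.
iseed: If do_boot is TRUE and this argument is not NULL, it is passed to set.seed() to set the seed of the random number generator. Default is NULL.
parallel: If do_boot is TRUE and this argument is TRUE, parallel processing will be used for bootstrapping. Default is FALSE.
ncpus: If do_boot and parallel are both TRUE, the number of processes to be used in parallel processing. Default is parallel::detectCores(logical = FALSE) - 1.
progress: Logical. If TRUE, a progress bar will be displayed for long processes such as bootstrapping. Default is TRUE.
load_balancing: Logical. If TRUE, load balancing will be used in parallel processing. Default is FALSE.
model_call: The model function to be called. If "lm", the default, the model is fitted by stats::lm(). If "glm", the model is fitted by stats::glm().
x: An lm_betaselect or glm_betaselect object.
digits: The number of significant digits to be printed for the coefficients.
type: The coefficients to be printed. For "beta" or "standardized", the coefficients with the selected variables standardized are printed. For "raw" or "unstandardized", the coefficients without standardization are printed. Default is "beta".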
Details
The functions lm_betaselect() and glm_betaselect() let users select which variables to standardize when computing the standardized solution. They have the following features:
They automatically skip categorical predictors (i.e., factor or string variables).
They do not standardize a product term, because doing so is incorrect. Instead, if requested, they compute the product term from its component variables after those variables are standardized.
They standardize the selected variables before fitting a model. Therefore, if a model has the term log(x) and x is one of the selected variables, the model uses the logarithm of the standardized x, instead of the standardized log(x), which is difficult to interpret.
They can be used to generate nonparametric bootstrap confidence intervals for the standardized solution. A bootstrap confidence interval is preferable to the default confidence interval, which ignores the standardization, because it takes into account the sampling variability of the standard deviations. Preliminary support has been found for using bootstrap confidence intervals for coefficients involving standardized variables in linear regression (Jones & Waller, 2013).
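The bootstrap idea above can be sketched in base R. This is a minimal illustration, not the package's implementation: the data are simulated, the helper std_slope() is hypothetical, and a percentile interval is assumed. The key point is that the variables are re-standardized within each bootstrap sample, so the interval reflects the sampling variability of the standard deviations.

```r
# Minimal sketch with simulated data; std_slope() is a hypothetical helper,
# not part of betaselectr.
set.seed(1234)
n <- 200
dat <- data.frame(x = rnorm(n, mean = 50, sd = 10))
dat$y <- 2 + 0.3 * dat$x + rnorm(n, sd = 5)

# Standardized slope: standardize x and y, then fit lm()
std_slope <- function(d) {
  d$x <- as.numeric(scale(d$x))
  d$y <- as.numeric(scale(d$y))
  unname(coef(lm(y ~ x, data = d))["x"])
}
est <- std_slope(dat)

# Nonparametric bootstrap: resample rows with replacement and
# re-standardize within each bootstrap sample
boot_est <- replicate(1000, {
  i <- sample.int(n, replace = TRUE)
  std_slope(dat[i, ])
})

# Percentile confidence interval (an assumption for this sketch)
ci <- quantile(boot_est, probs = c(0.025, 0.975))
est
ci
```

With a single predictor, the standardized slope equals the Pearson correlation between x and y, which gives a quick check on est.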
Problems With Common Approaches
In some regression programs, users have limited control over which variables are standardized when requesting the so-called "betas". The solution may be uninterpretable or misleading under these conditions:
Dummy variables are standardized, so their coefficients cannot be interpreted as the difference between two groups on the outcome variable.
Product terms (interaction terms) are standardized, so their coefficients cannot be interpreted as the changes in the effects of focal variables when the moderators change (Cheung, Cheung, Lau, Hui, & Vong, 2022).
Variables with meaningful units (e.g., age) can be more difficult to interpret when they are standardized.
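The first two problems can be illustrated in base R with simulated data. In this sketch, all variable names are hypothetical and no betaselectr function is used. It shows that a standardized 0/1 dummy no longer has a coefficient equal to the group difference, and that standardizing the product term itself rescales the interaction coefficient even though the fitted values are unchanged.

```r
# Simulated data; all names are hypothetical.
set.seed(42)
n <- 300
g <- factor(sample(c("a", "b"), n, replace = TRUE))
x <- rnorm(n)
w <- rnorm(n, mean = 3)
y <- 1 + 0.5 * (g == "b") + 0.4 * x + 0.2 * w + 0.3 * x * w + rnorm(n)

# (1) Dummy variable. With g as a factor, the coefficient is the
#     difference between the two group means on y.
fit_factor <- lm(y ~ g)
diff_means <- mean(y[g == "b"]) - mean(y[g == "a"])
coef(fit_factor)["gb"]
diff_means

# Standardizing the 0/1 dummy rescales its coefficient by sd(d),
# so it no longer equals the group difference.
d <- as.numeric(g == "b")
d_z <- as.numeric(scale(d))
fit_dummy_z <- lm(y ~ d_z)
coef(fit_dummy_z)["d_z"]

# (2) Product term. Standardize the components, then form the product ...
x_z <- as.numeric(scale(x))
w_z <- as.numeric(scale(w))
fit_prod_of_z <- lm(y ~ x_z * w_z)
# ... versus standardizing the product term itself.
xw_z <- as.numeric(scale(x * w))
fit_z_of_prod <- lm(y ~ x_z + w_z + xw_z)

# Same fitted values, but the interaction coefficient differs by a
# factor of sd(x * w) / (sd(x) * sd(w)).
coef(fit_prod_of_z)["x_z:w_z"]
coef(fit_z_of_prod)["xw_z"]
```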
How the Functions Work
They standardize the original variables before the variables are used in the model. Therefore, strictly speaking, they do not standardize the predictors in the model, but standardize the input variables (Gelman et al., 2021).
The requested model is then fitted to the dataset with the selected variables standardized. For ease of follow-up analysis, both the results with the selected variables standardized and the results without standardization are stored. If required, the results without standardization can be retrieved by raw_output().
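The procedure described above (standardize the selected input variables, skip categorical ones, fit the model, and keep both fits) can be sketched in base R. The helper fit_selected_std() is hypothetical and is not the betaselectr API; it handles only stats::lm() and a fixed list of variables, as a simplifying assumption.

```r
# Hypothetical helper for illustration only; not the betaselectr API.
fit_selected_std <- function(formula, data, to_standardize) {
  data_z <- data
  for (v in to_standardize) {
    # Skip categorical variables (factors or character vectors)
    if (is.numeric(data_z[[v]])) {
      data_z[[v]] <- as.numeric(scale(data_z[[v]]))
    }
  }
  # Fit the same formula to both datasets; the product term x:w is
  # formed from the (possibly standardized) component variables
  list(std = lm(formula, data = data_z),
       raw = lm(formula, data = data))
}

set.seed(1)
dat <- data.frame(x = rnorm(100, mean = 10, sd = 2),
                  w = rnorm(100, mean = 5, sd = 1),
                  grp = sample(c("a", "b"), 100, replace = TRUE))
dat$y <- 1 + 0.5 * dat$x + 0.3 * dat$w + 0.2 * dat$x * dat$w +
  ifelse(dat$grp == "b", 1, 0) + rnorm(100)

out <- fit_selected_std(y ~ x * w + grp, data = dat,
                        to_standardize = c("x", "w", "grp"))
coef(out$std)  # "grp" is skipped because it is not numeric
coef(out$raw)  # results without standardization
```

Because only predictors are standardized in this sketch, the two fits give identical fitted values; only the coefficients differ.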
Methods
The output of lm_betaselect() is an lm_betaselect-class object, and the output of glm_betaselect() is a glm_betaselect-class object. They have the following methods:
A coef-method for extracting the coefficients of the model. (See coef.lm_betaselect() and coef.glm_betaselect() for details.)
A vcov-method for extracting the variance-covariance matrix of the estimates of the coefficients. If bootstrapping is requested, it can return the matrix based on the bootstrap estimates. (See vcov.lm_betaselect() and vcov.glm_betaselect() for details.)
A confint-method for forming the confidence intervals of the estimates of the coefficients. If bootstrapping is requested, it can return the bootstrap confidence intervals. (See confint.lm_betaselect() and confint.glm_betaselect() for details.)
A summary-method for printing the summary of the results, with additional information such as the number of bootstrap samples and which variables have been standardized. (See summary.lm_betaselect() and summary.glm_betaselect() for details.)
An anova-method for printing the ANOVA table. It can also be used to compare two or more outputs of lm_betaselect() or glm_betaselect(). (See anova.lm_betaselect() and anova.glm_betaselect() for details.)
A predict-method for computing predicted values. It can be used to compute the predicted values for a set of new unstandardized data. In models with standardization, the data are standardized before the predicted values are computed. (See predict.lm_betaselect() and predict.glm_betaselect() for details.)
The default update-method for updating a call also works for an lm_betaselect object or a glm_betaselect object. It can update the model in the same way it updates a model fitted by stats::lm() or stats::glm(), and it can also update the arguments of lm_betaselect() or glm_betaselect(), such as the variables to be standardized. (See stats::update() for details.)
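The predict-method described above standardizes new unstandardized data before computing predictions. For the predictions to match those of the unstandardized model, the new data need to be standardized with the mean and SD of the original sample, not those of the new data. The base-R sketch below illustrates this general idea with simulated data and hypothetical names; it does not call the package's predict-method.

```r
# Simulated data; names are hypothetical.
set.seed(2024)
dat <- data.frame(x = rnorm(80, mean = 50, sd = 10))
dat$y <- 3 + 0.2 * dat$x + rnorm(80)

# Keep the sample mean and SD used for standardization
m_x <- mean(dat$x)
s_x <- sd(dat$x)
fit_z <- lm(y ~ x, data = data.frame(x = (dat$x - m_x) / s_x, y = dat$y))
fit_raw <- lm(y ~ x, data = dat)

# New data on the original (unstandardized) scale
new_raw <- data.frame(x = c(40, 50, 65))

# Standardize the new data with the original sample's mean and SD
new_z <- data.frame(x = (new_raw$x - m_x) / s_x)
pred_z <- predict(fit_z, newdata = new_z)
pred_raw <- predict(fit_raw, newdata = new_raw)

# Standardizing with the new data's own mean and SD gives different,
# incorrect predictions
pred_wrong <- predict(fit_z, newdata = data.frame(x = as.numeric(scale(new_raw$x))))
```

Only x is standardized in this sketch, so all predictions are on the original scale of y.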
Most other methods for the output of stats::lm() and stats::glm() should also work on an lm_betaselect-class object or a glm_betaselect-class object, respectively. Some of them give the same results regardless of which variables are standardized; examples are rstandard() and cooks.distance(). Others should be used with caution if they make use of the variance-covariance matrix of the estimates.
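Why rstandard() and cooks.distance() are unaffected can be checked in base R: standardizing the variables is a linear transformation, which leaves the hat values unchanged and rescales the residuals and the residual standard error by the same factor. The sketch below uses simulated data and hypothetical names, and fits both models with stats::lm() directly rather than with lm_betaselect().

```r
# Simulated data; names are hypothetical.
set.seed(7)
dat <- data.frame(x1 = rnorm(60, mean = 100, sd = 15),
                  x2 = rnorm(60, mean = 5, sd = 2))
dat$y <- 10 + 0.1 * dat$x1 - 0.8 * dat$x2 + rnorm(60)

fit_raw <- lm(y ~ x1 + x2, data = dat)
# Standardize all variables, including the response
dat_z <- as.data.frame(scale(dat))
fit_z <- lm(y ~ x1 + x2, data = dat_z)

# Unchanged by standardization
all.equal(rstandard(fit_raw), rstandard(fit_z))
all.equal(cooks.distance(fit_raw), cooks.distance(fit_z))

# Changed by standardization
coef(fit_raw)
coef(fit_z)
```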
To use the methods for lm objects or glm objects on the results without standardization, simply use raw_output(). For example, to get the fitted values without standardization, call fitted(raw_output(x)), where x is the output of lm_betaselect() or glm_betaselect().
The function raw_output() simply extracts the regression output by stats::lm() or stats::glm() on the variables without standardization.
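When the response variable is standardized, fitted values from the standardized model are in SD units of the response, which is one reason to call fitted() on raw_output(x) when values on the original scale are needed. The base-R sketch below (simulated data, hypothetical names, no betaselectr call) shows the relation between the two sets of fitted values for a linear model.

```r
# Simulated data; names are hypothetical.
set.seed(99)
dat <- data.frame(x = rnorm(50, mean = 20, sd = 4))
dat$y <- 5 + 1.5 * dat$x + rnorm(50, sd = 3)

fit_raw <- lm(y ~ x, data = dat)
dat_z <- data.frame(x = as.numeric(scale(dat$x)),
                    y = as.numeric(scale(dat$y)))
fit_z <- lm(y ~ x, data = dat_z)

# Fitted values of the standardized model are in SD units of y.
# Back-transforming them recovers the fitted values of the raw model.
back <- mean(dat$y) + sd(dat$y) * fitted(fit_z)
head(cbind(raw = fitted(fit_raw), back_transformed = back))
```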
Value
The function lm_betaselect() returns an object of the class lm_betaselect, and the function glm_betaselect() returns an object of the class glm_betaselect. They are similar in structure to the outputs of stats::lm() and stats::glm(), respectively, with additional information stored.
The function raw_output() returns an object of the class lm or glm, which is the result of fitting the model to the data by stats::lm() or stats::glm() without standardization.
Author(s)
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
References
Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. Health Psychology, 41(7), 502–505. doi:10.1037/hea0001188
Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7(1), 1–15. doi:10.1214/aoms/1177732541
Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press. doi:10.1017/9781139161879
Jones, J. A., & Waller, N. G. (2013). Computing confidence intervals for standardized regression coefficients. Psychological Methods, 18(4), 435–453. doi:10.1037/a0033269
See Also
print.lm_betaselect() and print.glm_betaselect() for the print-methods.
Examples
data(data_test_mod_cat)
# Standardize only iv
lm_beta_x <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
data = data_test_mod_cat,
to_standardize = "iv")
lm_beta_x
summary(lm_beta_x)
# Manually standardize iv and call lm()
data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]
lm_beta_x_manual <- lm(dv ~ iv_z*mod + cov1 + cat1,
data = data_test_mod_cat)
coef(lm_beta_x)
coef(lm_beta_x_manual)
# Standardize all numeric variables
lm_beta_all <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
data = data_test_mod_cat)
# Note that cat1 is not standardized
summary(lm_beta_all)
data(data_test_mod_cat)
# Create a binary response for the logistic regression examples
data_test_mod_cat$p <- scale(data_test_mod_cat$dv)[, 1]
data_test_mod_cat$p <- ifelse(data_test_mod_cat$p > 0,
yes = 1,
no = 0)
# Standardize only iv
logistic_beta_x <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
family = binomial,
data = data_test_mod_cat,
to_standardize = "iv")
logistic_beta_x
summary(logistic_beta_x)
# Manually standardize iv and call glm()
data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]
logistic_beta_x_manual <- glm(p ~ iv_z*mod + cov1 + cat1,
family = binomial,
data = data_test_mod_cat)
coef(logistic_beta_x)
coef(logistic_beta_x_manual)
# Standardize all numeric predictors
logistic_beta_allx <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
family = binomial,
data = data_test_mod_cat,
to_standardize = c("iv", "mod", "cov1"))
# Note that cat1 is not standardized
summary(logistic_beta_allx)
# Results without standardization
summary(raw_output(lm_beta_x))