balqual {vecmatch} | R Documentation
Evaluate Matching Quality
Description
The balqual() function evaluates the balance quality of a dataset after
matching, comparing it to the original unbalanced dataset. It computes
summary statistics of covariate balance and makes them easy to interpret
by checking them against user-specified cutoff values.
Usage
balqual(
  matched_data = NULL,
  formula = NULL,
  type = c("smd", "r", "var_ratio"),
  statistic = c("mean", "max"),
  cutoffs = NULL,
  round = 3,
  print_out = TRUE
)
Arguments
matched_data |
An object of class |
formula |
A valid R formula used to compute generalized propensity
scores during the first step of the vector matching algorithm in
|
type |
A character vector specifying the quality metrics to calculate.
May contain at most three values, in a vector created by the
|
statistic |
A character vector specifying the type of statistics used to summarize the quality metrics. Since quality metrics are calculated for all pairwise comparisons between treatment levels, they need to be aggregated for the entire dataset.
To compute both, provide both names using the |
cutoffs |
A numeric vector with the same length as the number of
coefficients specified in the |
round |
An integer specifying the number of decimal places to round the output to. |
print_out |
Logical. If |
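The interplay of type, statistic and cutoffs can be illustrated with a small base R sketch. It is not the implementation used by balqual(): the pooled-SD form of the standardized mean difference, the orientation of the variance ratio (larger variance in the numerator), the toy data, the helper names smd_pair() and var_ratio_pair(), and the cutoff of 2 for the variance ratio are all assumptions made for illustration. The sketch computes each metric for every pair of treatment levels, aggregates the pairwise values with the mean and the maximum, and compares the maxima to the cutoffs.

```r
# Toy data: three treatment levels and one covariate (illustrative values only)
set.seed(1)
toy <- data.frame(
  treat = factor(rep(c("a", "b", "c"), each = 50)),
  age   = c(rnorm(50, 60, 8), rnorm(50, 63, 10), rnorm(50, 58, 9))
)

# Standardized mean difference for one pair of groups, using the pooled SD
# (an assumed, commonly used definition; hypothetical helper)
smd_pair <- function(x, y) {
  abs(mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)
}

# Variance ratio for one pair, oriented so that it is always >= 1
# (an assumption; hypothetical helper)
var_ratio_pair <- function(x, y) {
  v <- c(var(x), var(y))
  max(v) / min(v)
}

# Covariate values per treatment level and all pairwise level combinations
groups <- split(toy$age, toy$treat)
lvl_pairs <- combn(levels(toy$treat), 2, simplify = FALSE)

smd_all <- vapply(
  lvl_pairs,
  function(p) smd_pair(groups[[p[1]]], groups[[p[2]]]),
  numeric(1)
)
vr_all <- vapply(
  lvl_pairs,
  function(p) var_ratio_pair(groups[[p[1]]], groups[[p[2]]]),
  numeric(1)
)

# Aggregate the pairwise values, mirroring statistic = c("mean", "max")
summary_tab <- data.frame(
  metric = c("smd", "var_ratio"),
  mean   = c(mean(smd_all), mean(vr_all)),
  max    = c(max(smd_all), max(vr_all))
)

# One cutoff per metric (0.2 and 2 are illustrative choices)
toy_cutoffs <- c(0.2, 2)
summary_tab$within_cutoff <- summary_tab$max <= toy_cutoffs

print(summary_tab, digits = 3)
```

Because the maximum is never smaller than the mean, judging balance on the max column is the stricter of the two summaries.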
Value
If assigned to a name, returns a list of summary statistics of class quality containing:
- quality_mean: A data frame with the mean values of the statistics specified in the type argument for all balancing variables used in formula.
- quality_max: A data frame with the maximal values of the statistics specified in the type argument for all balancing variables used in formula.
- perc_matched: A single numeric value indicating the percentage of observations in the original dataset that were matched.
- statistic: A single string defining which statistic will be displayed in the console.
- summary_head: A summary of the matching process. If max is included in statistic, it contains the maximal observed values for each variable; otherwise, it includes the mean values.
- n_before: The number of observations in the dataset before matching.
- n_after: The number of observations in the dataset after matching.
- count_table: A contingency table showing the distribution of the treatment variable before and after matching.
The balqual() function also prints a well-formatted table with the
defined summary statistics for each variable in the formula to the console.
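The bookkeeping elements of the returned list (n_before, n_after, perc_matched and count_table) can be pictured with a short base R sketch. It does not call balqual(); the toy treatment vector, the randomly chosen set of retained observations, and the formula 100 * n_after / n_before for the matched percentage are assumptions used only to show how such quantities relate to each other.

```r
# Toy treatment assignment before matching (illustrative values only)
treat_before <- factor(
  rep(c("control", "treat_a", "treat_b"), times = c(40, 30, 20))
)

# Hypothetical indices of the observations retained by a matching procedure
set.seed(42)
kept <- sort(sample(seq_along(treat_before), 60))
treat_after <- treat_before[kept]

# Sample sizes before and after matching
n_before <- length(treat_before)
n_after <- length(treat_after)

# Percentage of the original observations that were matched
# (assumed definition)
perc_matched <- 100 * n_after / n_before

# Distribution of the treatment variable before and after matching;
# subsetting a factor keeps all levels, so the columns line up
count_table <- rbind(
  before = table(treat_before),
  after  = table(treat_after)
)

print(count_table)
print(round(perc_matched, 3))
```

With a real result assigned to a name, the corresponding elements would be read with the usual list accessor, for example res$perc_matched, where res is a hypothetical name for the object returned by balqual().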
References
Rubin, D.B. Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation. Health Services & Outcomes Research Methodology 2, 169–188 (2001). https://doi.org/10.1023/A:1020363010465
Lopez, M.J., Gutman, R. Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statistical Science 32(3), 432–454 (2017).
See Also
match_gps() for matching the generalized propensity scores;
estimate_gps() for the documentation of the formula argument.
Examples
# We try to balance the treatment variable in the cancer dataset based on age
# and sex covariates
data(cancer)

# Firstly, we define the formula
formula_cancer <- formula(status ~ age * sex)

# Then we can estimate the generalized propensity scores
gps_cancer <- estimate_gps(formula_cancer,
  cancer,
  method = "multinom",
  reference = "control",
  verbose_output = TRUE
)

# ... and drop observations based on the common support region...
csr_cancer <- csregion(gps_cancer)

# ... to match the samples using `match_gps()`
matched_cancer <- match_gps(csr_cancer,
  reference = "control",
  caliper = 1,
  kmeans_cluster = 5,
  kmeans_args = list(n.iter = 100),
  verbose_output = TRUE
)

# At the end we can assess the quality of matching using `balqual()`
balqual(
  matched_data = matched_cancer,
  formula = formula_cancer,
  type = "smd",
  statistic = "max",
  round = 3,
  cutoffs = 0.2
)