catsib {irtQ}    R Documentation
CATSIB DIF Detection Procedure
Description
This function performs DIF analysis on items using the CATSIB procedure (Nandakumar & Roussos, 2004), a modified version of SIBTEST (Shealy & Stout, 1993). The CATSIB procedure is suitable for computerized adaptive testing (CAT) environments. In CATSIB, examinees are matched on IRT-based ability estimates that have been adjusted using a regression correction method (Shealy & Stout, 1993) to reduce the statistical bias in the CATSIB statistic caused by impact (i.e., a true difference in the ability distributions of the reference and focal groups).
Usage
catsib(
x = NULL,
data,
score = NULL,
se = NULL,
group,
focal.name,
item.skip = NULL,
D = 1,
n.bin = c(80, 10),
min.binsize = 3,
max.del = 0.075,
weight.group = c("comb", "foc", "ref"),
alpha = 0.05,
missing = NA,
purify = FALSE,
max.iter = 10,
min.resp = NULL,
method = "ML",
range = c(-5, 5),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
Arguments
x
A data frame containing item metadata (e.g., item parameters, number
of categories, IRT model types, etc.). See
data
A matrix of examinees' item responses corresponding to the items
specified in the
score
A numeric vector containing examinees' ability estimates (theta
values). If not provided,
se
A vector of standard errors corresponding to the ability estimates.
The order of the standard errors must match the order of the ability
estimates provided in the
group
A numeric or character vector indicating examinees' group membership. The length of the vector must match the number of rows in the response data matrix.
focal.name
A single numeric or character value specifying the focal
group. For instance, given
item.skip
A numeric vector of item indices to exclude from DIF analysis.
If
D
A scaling constant used in IRT models to make the logistic function closely approximate the normal ogive function. A value of 1.7 is commonly used for this purpose. Default is 1.
n.bin
A numeric vector of two positive integers specifying the maximum
and minimum numbers of bins (or intervals) on the ability scale, in that
order. Default is c(80, 10).
min.binsize
A positive integer specifying the minimum number of
examinees required in each bin. To ensure stable statistical estimation,
each bin must contain at least the specified number of examinees from both
the reference and focal groups in order to be included in the calculation
of \hat{\beta}, the DIF effect size. Default is 3.
max.del
A numeric value specifying the maximum allowable proportion of examinees that may be excluded from either the reference or focal group during the binning process. This threshold is used when determining the number of bins on the ability scale automatically. Default is 0.075. See the Details section for more information.
weight.group
A character string specifying the target ability
distribution used to compute the expected DIF measure \hat{\beta}. Available
options are "comb" (combined group), "foc" (focal group), and "ref"
(reference group). Default is "comb".
alpha
A numeric value specifying the significance level for the DIF tests. Default is 0.05.
missing
A value indicating missing responses in the data set. Default
is NA.
purify
Logical. Indicates whether to apply a purification procedure.
Default is FALSE.
max.iter
A positive integer specifying the maximum number of
iterations allowed for the purification process. Default is 10.
min.resp
A positive integer specifying the minimum number of valid
item responses required from an examinee in order to compute an ability
estimate. Default is NULL.
method
A character string indicating the scoring method to use. Available options are:
Default is "ML".
range
A numeric vector of length two specifying the lower and upper
bounds of the ability scale. This is used for the following scoring
methods:
norm.prior
A numeric vector of length two specifying the mean and
standard deviation of the normal prior distribution. These values are used
to generate the Gaussian quadrature points and weights. Ignored if
nquad
An integer indicating the number of Gaussian quadrature points
to be generated from the normal prior distribution. Used only when
weights
A two-column matrix or data frame containing the quadrature
points (in the first column) and their corresponding weights (in the second
column) for the latent variable prior distribution. The weights and points
can be conveniently generated using the function If
ncore
An integer specifying the number of logical CPU cores to use for
parallel processing. Default is 1.
verbose
Logical. If
...
Additional arguments passed to the
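The role of the scaling constant D can be checked numerically. The base-R sketch below compares a logistic item curve (two-parameter form with a = 1 and b = 0) against the normal-ogive curve for D = 1, 1.7, and 1.702. The helpers p_logistic() and p_ogive() are hypothetical names introduced only for this illustration; they are not irtQ functions.

```r
# Hypothetical helper: logistic item characteristic curve with scaling
# constant D (a = discrimination, b = difficulty)
p_logistic <- function(theta, a = 1, b = 0, D = 1) {
  1 / (1 + exp(-D * a * (theta - b)))
}

# Hypothetical helper: normal-ogive curve for the same item
p_ogive <- function(theta, a = 1, b = 0) {
  pnorm(a * (theta - b))
}

theta <- seq(-6, 6, by = 0.001)

# Largest absolute gap between the two curves for each value of D
max_gap <- sapply(c(1, 1.7, 1.702), function(D) {
  max(abs(p_logistic(theta, D = D) - p_ogive(theta)))
})
names(max_gap) <- c("D=1", "D=1.7", "D=1.702")
print(round(max_gap, 4))
```

With D = 1.702 the largest gap stays below 0.01, whereas D = 1 leaves a much larger discrepancy; this is why values near 1.7 are used when logistic parameters are meant to be on the normal-ogive metric.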
Details
In the CATSIB procedure (Nandakumar & Roussos, 2004), examinees are matched
on \hat{\theta}^{\ast}, the expected value of \theta given \hat{\theta}
(i.e., \theta regressed on \hat{\theta}), which is a continuous variable.
The range of \hat{\theta}^{\ast} is divided into K equal-width intervals, and
examinees are classified into one of these K intervals based on their
\hat{\theta}^{\ast} values. Any interval containing fewer than three
examinees from either the reference or focal group is excluded from the
computation of \hat{\beta}, the DIF effect size, to ensure statistical
stability. Following Nandakumar and Roussos (2004), the default minimum
bin size is 3; it can be changed via the min.binsize argument.
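A minimal base-R sketch of the two steps just described, under stated assumptions: regress_theta() applies a Kelley-type linear regression correction within each group (as in SIBTEST), estimating reliability from the supplied standard errors, and bin_keep() forms K equal-width bins on the adjusted scale and flags the bins that hold enough examinees from both groups. Both helper names are hypothetical, and the exact correction used inside catsib() may differ.

```r
# Hypothetical helper (not irtQ code): shrink each ability estimate toward
# its group mean by an estimated reliability, i.e., a linear approximation
# of E(theta | theta_hat) computed separately within each group.
regress_theta <- function(score, se, group) {
  adj <- score
  for (g in unique(group)) {
    idx <- group == g
    m <- mean(score[idx])
    v <- var(score[idx])
    # Reliability: share of observed-score variance not due to error
    rel <- max(0, (v - mean(se[idx]^2)) / v)
    adj[idx] <- m + rel * (score[idx] - m)
  }
  adj
}

# Hypothetical helper: assign adjusted scores to K equal-width bins and flag
# bins with at least `min.binsize` examinees from BOTH groups.
bin_keep <- function(theta_adj, group, focal.name, K, min.binsize = 3) {
  K <- as.integer(K)
  lo <- min(theta_adj)
  width <- (max(theta_adj) - lo) / K
  # Bin index 1..K; the maximum score is placed in the top bin
  bin <- pmin(as.integer(floor((theta_adj - lo) / width)) + 1L, K)
  is_foc <- group == focal.name
  n_ref <- tabulate(bin[!is_foc], nbins = K)
  n_foc <- tabulate(bin[is_foc], nbins = K)
  list(bin = bin, keep = n_ref >= min.binsize & n_foc >= min.binsize)
}
```

For example, bin_keep(regress_theta(score, se, group), group, focal.name = 1, K = 20)$keep returns one TRUE/FALSE flag per bin.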
To determine an appropriate number of intervals (K), catsib()
automatically decreases K from a large starting value (e.g., 80) following
the rule proposed by Nandakumar and Roussos (2004). Specifically, if more
than 7.5% of the examinees in either the reference or focal group are
excluded due to small bin sizes, the number of bins is reduced by one and the
process is repeated. This continues until the retained examinees in each
group comprise at least 92.5% of that group. To avoid having too
few bins, they recommended a minimum of K = 10. Therefore, the default
maximum and minimum numbers of bins are set to 80 and 10, respectively, via
n.bin. Likewise, the maximum allowable proportion of excluded examinees is
set to 0.075 by default through the max.del argument.
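The bin-reduction rule can be sketched as a simple loop. choose_nbin() below is a hypothetical helper, not part of irtQ; it assumes the scores passed in have already been regression-corrected and mirrors the defaults of n.bin, min.binsize, and max.del.

```r
# Hypothetical sketch of the automatic bin-count rule: start at the maximum
# K and decrease it by one until no group loses more than `max.del` of its
# examinees to under-populated bins, never going below the minimum K.
choose_nbin <- function(theta_adj, group, focal.name,
                        n.bin = c(80, 10), min.binsize = 3, max.del = 0.075) {
  is_foc <- group == focal.name
  lo <- min(theta_adj)
  hi <- max(theta_adj)
  for (K in seq(n.bin[1], n.bin[2])) {
    # Equal-width bins on the adjusted ability scale, indexed 1..K
    width <- (hi - lo) / K
    bin <- pmin(as.integer(floor((theta_adj - lo) / width)) + 1L, K)
    n_ref <- tabulate(bin[!is_foc], nbins = K)
    n_foc <- tabulate(bin[is_foc], nbins = K)
    keep <- n_ref >= min.binsize & n_foc >= min.binsize
    # Proportion of each group that falls in discarded bins
    del_ref <- sum(n_ref[!keep]) / sum(n_ref)
    del_foc <- sum(n_foc[!keep]) / sum(n_foc)
    if (max(del_ref, del_foc) <= max.del) break
  }
  # If the criterion is never met, the loop ends at the minimum K
  list(K = K, del_ref = del_ref, del_foc = del_foc)
}
```

Decreasing K one step at a time keeps the bins as narrow as the data allow, which is the point of starting from a large value.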
For the target ability distribution used to compute \hat{\beta}, Li and
Stout (1996) and Nandakumar and Roussos (2004) used the combined-group
distribution, which is the default option ("comb") of weight.group. See
Nandakumar and Roussos (2004) for further details about the CATSIB method.
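To show how the target distribution enters \hat{\beta}, the hypothetical beta_hat() below computes a SIBTEST-style weighted difference between reference and focal mean item scores over the retained bins, together with a normal-approximation test. The weighting options mirror weight.group; the standard-error formula follows the usual SIBTEST form and is an assumption about, not a copy of, catsib()'s internals. Missing responses are assumed to have been removed beforehand.

```r
# Hypothetical sketch: weighted DIF effect size over retained bins.
# resp: item scores for the studied item; bin: bin index per examinee;
# is_foc: TRUE for focal-group examinees; keep: logical flag per bin.
beta_hat <- function(resp, bin, is_foc, keep,
                     weight.group = c("comb", "foc", "ref")) {
  weight.group <- match.arg(weight.group)
  ks <- which(keep)
  n_ref <- n_foc <- d <- v <- numeric(length(ks))
  for (i in seq_along(ks)) {
    yr <- resp[bin == ks[i] & !is_foc]  # reference-group scores in bin k
    yf <- resp[bin == ks[i] & is_foc]   # focal-group scores in bin k
    n_ref[i] <- length(yr)
    n_foc[i] <- length(yf)
    # Positive values favor the reference group
    d[i] <- mean(yr) - mean(yf)
    v[i] <- var(yr) / n_ref[i] + var(yf) / n_foc[i]
  }
  # Weights p_k taken from the chosen target ability distribution
  w <- switch(weight.group,
              comb = n_ref + n_foc,
              foc  = n_foc,
              ref  = n_ref)
  p <- w / sum(w)
  beta <- sum(p * d)
  se <- sqrt(sum(p^2 * v))
  z <- beta / se
  list(beta = beta, se = se, z = z, p.value = 2 * pnorm(-abs(z)))
}
```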
Although Nandakumar and Roussos (2004) did not propose a purification
procedure for DIF analysis using CATSIB, catsib() can perform an iterative
purification process similar to that of Lim et al. (2022). Specifically, at
each iteration, examinees' latent abilities are re-estimated using the
purified set of items and the scoring method specified in the method
argument. The iterative purification process terminates either when no
additional DIF items are detected or when the number of iterations reaches
the limit set by max.iter. See Lim et al. (2022) for more details on the
purification procedure.
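The termination logic of purification can be sketched independently of any particular DIF statistic. In the hypothetical purify_loop() below, detect is a user-supplied function that re-scores examinees on the given anchor items and returns the indices of flagged items. Removing all newly flagged items at once, and never un-flagging an item, are assumptions of this sketch rather than documented behavior of catsib().

```r
# Hypothetical sketch of an iterative purification loop.
# detect: function(anchor) returning indices of items flagged as DIF when
#         abilities are estimated from the `anchor` items only.
purify_loop <- function(detect, n.item, max.iter = 10) {
  flagged <- integer(0)
  for (iter in seq_len(max.iter)) {
    # Matching (anchor) items are all items not yet flagged
    anchor <- setdiff(seq_len(n.item), flagged)
    new_flag <- setdiff(detect(anchor), flagged)
    # Stop when no additional DIF items are detected
    if (length(new_flag) == 0) break
    flagged <- sort(c(flagged, new_flag))
  }
  list(flagged = flagged, n.iter = iter)
}
```

For instance, with a toy detector that flags item 2 on the full item set and items 2 and 5 once item 2 is removed, the loop stops at the third iteration with items 2 and 5 flagged.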
Scoring based on a limited number of items may result in large standard
errors, which can reduce the effectiveness of DIF detection using the
CATSIB procedure. The min.resp argument can be used to avoid using scores
with large standard errors, particularly during the purification process.
For example, if min.resp is not NULL (e.g., min.resp = 5), the item
responses of examinees whose total number of valid responses is below the
specified threshold are treated as missing (i.e., NA). As a result, their
ability estimates are also treated as missing and are excluded from the
CATSIB statistic computation. If min.resp = NULL, a score is computed
for any examinee with at least one valid item response.
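The min.resp rule can be illustrated in a few lines of base R. apply_min_resp() is a hypothetical helper, not an irtQ function, and assumes missing responses are already coded as NA.

```r
# Hypothetical helper: blank out every response of examinees (rows) whose
# number of valid (non-NA) responses is below `min.resp`.
apply_min_resp <- function(data, min.resp = NULL) {
  # NULL keeps every examinee with at least one valid response
  if (is.null(min.resp)) return(data)
  n_valid <- rowSums(!is.na(data))
  data[n_valid < min.resp, ] <- NA
  data
}

resp <- matrix(c(1, 0, 1, NA, NA,
                 1, 1, 0, 1, 0), nrow = 2, byrow = TRUE)
apply_min_resp(resp, min.resp = 4)  # first examinee has only 3 valid responses
```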
Value
This function returns a list consisting of four elements:
no_purify
A list containing the results of the DIF analysis without applying a purification procedure. This list includes:
purify
A logical value indicating whether a purification procedure was applied.
with_purify
A list containing the results of the DIF analysis with a purification procedure. This list includes:
alpha
The significance level used in the analysis.
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Li, H. H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647-677.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement.
Nandakumar, R., & Roussos, L. (2004). Evaluation of the CATSIB DIF procedure in a pretest setting. Journal of Educational and Behavioral Statistics, 29(2), 177-199.
Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF. Psychometrika, 58, 159-194.
See Also
rdif(), est_irt(), est_item(), simdat(), shape_df(), est_score()
Examples
# Load required packages
library("irtQ")
library("dplyr")
## Uniform DIF Detection
###############################################
# (1) Simulate data with true uniform DIF
###############################################
# Import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# Select 36 3PLM items that are non-DIF
par_nstd <-
bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>%
dplyr::filter(.data$model == "3PLM") %>%
dplyr::filter(dplyr::row_number() %in% 1:36) %>%
dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)
# Generate four new items that will exhibit uniform DIF
difpar_ref <-
shape_df(
par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = 0.15),
item.id = paste0("dif", 1:4), cats = 2, model = "3PLM"
)
# Introduce uniform DIF in the focal group by shifting b-parameters
difpar_foc <-
difpar_ref %>%
dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + rep(0.7, 4))
# Combine the 4 DIF and 36 non-DIF items for both the reference and focal groups
# Therefore, the first four items now exhibit uniform DIF
par_ref <- rbind(difpar_ref, par_nstd)
par_foc <- rbind(difpar_foc, par_nstd)
# Generate true theta values
set.seed(123)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc <- rnorm(500, 0.0, 1.0)
# Simulate response data
resp_ref <- simdat(par_ref, theta = theta_ref, D = 1)
resp_foc <- simdat(par_foc, theta = theta_foc, D = 1)
data <- rbind(resp_ref, resp_foc)
###############################################
# (2) Estimate item and ability parameters
# using the aggregated data
###############################################
# Estimate item parameters
est_mod <- est_irt(data = data, D = 1, model = "3PLM")
est_par <- est_mod$par.est
# Estimate ability parameters using ML
theta_est <- est_score(x = est_par, data = data, method = "ML")
score <- theta_est$est.theta
se <- theta_est$se.theta
###############################################
# (3) Conduct DIF analysis
###############################################
# Create a vector of group membership indicators
# where '1' indicates the focal group
group <- c(rep(0, 500), rep(1, 500))
# (a)-1 Compute the CATSIB statistic using provided scores,
# without purification
dif_1 <- catsib(
x = NULL, data = data, D = 1, score = score, se = se, group = group, focal.name = 1,
weight.group = "comb", alpha = 0.05, missing = NA, purify = FALSE
)
print(dif_1)
# (a)-2 Compute the CATSIB statistic using provided scores,
# with purification
dif_2 <- catsib(
x = est_par, data = data, D = 1, score = score, se = se, group = group, focal.name = 1,
weight.group = "comb", alpha = 0.05, missing = NA, purify = TRUE
)
print(dif_2)