grdif {irtQ} | R Documentation |
Generalized IRT residual-based DIF detection framework for multiple groups (GRDIF)
Description
This function computes three GRDIF statistics, GRDIF_{R}, GRDIF_{S}, and
GRDIF_{RS}, for analyzing differential item functioning (DIF) among multiple
groups (Lim et al., 2024). They are specialized to capture uniform DIF,
nonuniform DIF, and mixed DIF, respectively.
Usage
grdif(x, ...)
## Default S3 method:
grdif(
x,
data,
score = NULL,
group,
focal.name,
D = 1,
alpha = 0.05,
missing = NA,
purify = FALSE,
purify.by = c("grdifrs", "grdifr", "grdifs"),
max.iter = 10,
min.resp = NULL,
post.hoc = TRUE,
method = "ML",
range = c(-4, 4),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
## S3 method for class 'est_irt'
grdif(
x,
score = NULL,
group,
focal.name,
alpha = 0.05,
missing = NA,
purify = FALSE,
purify.by = c("grdifrs", "grdifr", "grdifs"),
max.iter = 10,
min.resp = NULL,
post.hoc = TRUE,
method = "ML",
range = c(-4, 4),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
## S3 method for class 'est_item'
grdif(
x,
group,
focal.name,
alpha = 0.05,
missing = NA,
purify = FALSE,
purify.by = c("grdifrs", "grdifr", "grdifs"),
max.iter = 10,
min.resp = NULL,
post.hoc = TRUE,
method = "ML",
range = c(-4, 4),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
Arguments
x |
A data frame containing item metadata (e.g., item parameters,
number of categories, IRT model types, etc.); or an object of class
est_irt or est_item, as created by est_irt() or est_item(), respectively. |
... |
Additional arguments passed to the |
data |
A matrix of examinees' item responses corresponding to the items
specified in the x argument. |
score |
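A numeric vector containing examinees' ability estimates (theta
values). If not provided (score = NULL), grdif() estimates the abilities
internally using the scoring method specified in the method argument. |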
group |
A numeric or character vector indicating examinees' group membership. The length of the vector must match the number of rows in the response data matrix. |
focal.name |
A numeric or character vector specifying the levels
associated with the focal groups. For example, consider |
D |
A scaling constant used in IRT models to make the logistic function closely approximate the normal ogive function. A value of 1.7 is commonly used for this purpose. Default is 1. |
alpha |
A numeric value specifying the significance level for the hypothesis tests based on the GRDIF statistics. Default is 0.05. |
missing |
A value indicating missing responses in the data set. Default
is NA. |
purify |
Logical. Indicates whether to apply a purification procedure.
Default is FALSE. |
purify.by |
A character string specifying which GRDIF statistic is used
to perform the purification. Available options are "grdifrs" for
GRDIF_{RS}, "grdifr" for GRDIF_{R}, and "grdifs" for GRDIF_{S}. |
max.iter |
A positive integer specifying the maximum number of
iterations allowed for the purification process. Default is 10. |
min.resp |
A positive integer specifying the minimum number of valid
item responses required from an examinee in order to compute an ability
estimate. Default is NULL. |
post.hoc |
A logical value indicating whether to perform post-hoc RDIF analyses for all possible pairwise group comparisons on items flagged as statistically significant. Default is TRUE. See details below. |
method |
A character string indicating the scoring method to use. Available options are:
Default is "ML". |
range |
A numeric vector of length two specifying the lower and upper
bounds of the ability scale. This is used for the following scoring
methods: |
norm.prior |
A numeric vector of length two specifying the mean and
standard deviation of the normal prior distribution. These values are used
to generate the Gaussian quadrature points and weights. Ignored if |
nquad |
An integer indicating the number of Gaussian quadrature points
to be generated from the normal prior distribution. Used only when |
weights |
A two-column matrix or data frame containing the quadrature
points (in the first column) and their corresponding weights (in the second
column) for the latent variable prior distribution. The weights and points
can be conveniently generated using the function If |
ncore |
An integer specifying the number of logical CPU cores to use for
parallel processing. Default is 1. |
verbose |
Logical. If |
Details
The GRDIF framework (Lim et al., 2024) is a generalized version of the RDIF
detection framework, designed to assess DIF across multiple groups. The
framework includes three statistics: GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS},
which are tailored to detect uniform, nonuniform, and mixed DIF,
respectively. Under the null hypothesis that the test contains no DIF items,
the statistics GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS} asymptotically follow
\chi^{2} distributions with G-1, G-1, and 2(G-1) degrees of freedom,
respectively, where G represents the number of groups being compared. For
more information on the GRDIF framework, see Lim et al. (2024).
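As a small illustration of the reference distributions described above, the critical values can be obtained with base R's qchisq(). The number of groups (G = 4) and the significance level (alpha = 0.05) are arbitrary choices for this sketch, not values prescribed by the framework, and the code is not taken from the package itself.

```r
# Illustrative values only: four groups and a 5% significance level
G <- 4
alpha <- 0.05

# Degrees of freedom stated above for each statistic
df_r  <- G - 1        # GRDIF_R
df_s  <- G - 1        # GRDIF_S
df_rs <- 2 * (G - 1)  # GRDIF_RS

# Upper-tail critical values of the chi-square reference distributions
crit <- c(
  grdifr  = qchisq(1 - alpha, df = df_r),
  grdifs  = qchisq(1 - alpha, df = df_s),
  grdifrs = qchisq(1 - alpha, df = df_rs)
)
print(round(crit, 3))
```

Under the usual testing logic, an item's statistic would be compared against the matching critical value, which is equivalent to comparing its p-value with alpha.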
The grdif() function computes all three GRDIF statistics: GRDIF_{R},
GRDIF_{S}, and GRDIF_{RS}. It supports both dichotomous and polytomous item
response data. To compute these statistics, grdif() requires: (1) item
parameter estimates obtained from aggregate data (regardless of group
membership); (2) examinees' ability estimates (e.g., ML); and (3) their item
response data. Note that ability estimates must be computed using the item
parameters estimated from the aggregate data. The item parameters should be
provided via the x argument, the ability estimates via the score argument,
and the response data via the data argument. If ability estimates are not
supplied (score = NULL), grdif() automatically computes them using the
scoring method specified in the method argument (e.g., method = "ML").
The group argument accepts a vector of numeric or character values,
indicating the group membership of examinees. The vector may contain multiple
distinct values, where one represents the reference group and the others
represent focal groups. Its length must match the number of rows in the
response data, with each value corresponding to an examinee's group
membership. Once group is specified, a numeric or character vector must be
supplied via the focal.name argument to define which group(s) in group
represent the focal groups. The reference group is defined as the group not
included in focal.name.
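The relationship between group and focal.name can be checked with a few lines of base R before running the analysis. This is only a sketch: the toy group coding and the name ref.name are assumptions made for illustration and are not objects created by grdif().

```r
# Toy group vector: 0 = reference; 1, 2, 3 = focal (illustrative coding)
group <- c(rep(0, 3), rep(1, 2), rep(2, 2), rep(3, 2))
focal.name <- c(1, 2, 3)

# The reference group is whatever level is not listed in focal.name
ref.name <- setdiff(unique(group), focal.name)

# Basic sanity checks one might run before calling grdif()
stopifnot(
  all(focal.name %in% group),  # every focal level occurs in 'group'
  length(ref.name) == 1        # exactly one level is left as the reference
)
print(ref.name)
```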
Similar to the original RDIF framework for two-group comparisons, the GRDIF
framework supports an iterative purification process. When purify = TRUE,
purification is conducted based on the GRDIF statistic specified in the
purify.by argument (e.g., purify.by = "grdifrs"). During each iteration,
examinees' latent abilities are re-estimated using only the purified items,
with the scoring method determined by the method argument. The process
continues until no additional DIF items are flagged or until the number of
iterations reaches the specified max.iter limit. For details on the
purification procedure, see Lim et al. (2022).
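The stopping rule described above can be sketched as a generic loop. This is not the package's internal code: purify_sketch() and toy_flag() are hypothetical helpers, the rescoring and DIF testing are hidden behind the flag_fun argument, and how many items are removed per iteration is an assumption of this toy example rather than a statement about the procedure in Lim et al. (2022).

```r
# Hypothetical helper, not part of irtQ: 'flag_fun' receives the indices of the
# items currently treated as DIF-free and returns indices of flagged items.
purify_sketch <- function(flag_fun, n_items, max.iter = 10) {
  flagged <- integer(0)
  n_iter <- 0
  while (n_iter < max.iter) {
    n_iter <- n_iter + 1
    keep <- setdiff(seq_len(n_items), flagged)     # purified item set
    new_flags <- setdiff(flag_fun(keep), flagged)  # rescoring + DIF test go here
    if (length(new_flags) == 0) break              # nothing new flagged: stop
    flagged <- sort(union(flagged, new_flags))
  }
  list(flagged = flagged, n_iter = n_iter)
}

# Toy flagging rule: items 1 and 2 are the "DIF" items, found one per iteration
toy_flag <- function(keep) head(intersect(keep, c(1L, 2L)), 1)
res <- purify_sketch(toy_flag, n_items = 5, max.iter = 10)
print(res)
```

The loop ends either because an iteration flags nothing new or because max.iter is reached, mirroring the two stopping conditions in the text.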
Scoring based on a limited number of items can lead to large standard errors,
which may compromise the effectiveness of DIF detection within the GRDIF
framework. The min.resp argument can be used to exclude ability estimates
with substantial standard errors, especially during the purification process.
For example, if min.resp is not NULL (e.g., min.resp = 5), examinees whose
total number of responses falls below the specified threshold will have their
responses treated as missing values (i.e., NA). Consequently, their ability
estimates will also be missing and will not be used when computing the GRDIF
statistics. If min.resp = NULL, an examinee's score will be computed as long
as at least one response is available.
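The effect of min.resp can be mimicked on a toy response matrix with base R. The matrix resp and the helper vectors n_valid and too_few are made up for this sketch; the code only illustrates the rule described above and is not how grdif() is implemented.

```r
# Toy response matrix: 4 examinees x 6 items, NA = missing response
resp <- matrix(
  c(1,  0,  1,  1,  0,  1,
    1,  NA, NA, NA, NA, 0,
    NA, NA, NA, NA, NA, 1,
    0,  1,  1,  NA, 1,  0),
  nrow = 4, byrow = TRUE
)
min.resp <- 5

# Count valid (non-missing) responses per examinee
n_valid <- rowSums(!is.na(resp))

# Examinees below the threshold have all responses set to NA,
# so no ability estimate would be available for them
too_few <- n_valid < min.resp
resp[too_few, ] <- NA

print(n_valid)
print(too_few)
```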
The post.hoc argument enables post-hoc RDIF analyses across all possible
pairwise group comparisons for items flagged as statistically significant.
For instance, consider four groups of examinees: A, B, C, and D. If
post.hoc = TRUE, the grdif() function will perform pairwise RDIF analyses
for each flagged item across all group pairs (A-B, A-C, A-D, B-C, B-D, and
C-D). This provides a more detailed understanding of which specific group
pairs exhibit DIF. Note that when purification is enabled (i.e.,
purify = TRUE), post-hoc RDIF analyses are conducted for each flagged item at
each iteration of the purification process.
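The set of pairwise comparisons for the four-group example can be enumerated with base R's combn(). The object names groups, pairs, and pair_labels are chosen for this sketch only.

```r
# Four groups from the example above
groups <- c("A", "B", "C", "D")

# All unordered pairs of groups: one column per pair
pairs <- combn(groups, 2)

# Human-readable labels such as "A-B"
pair_labels <- apply(pairs, 2, paste, collapse = "-")
print(pair_labels)
```

With G groups there are G(G-1)/2 such pairs, so the number of post-hoc comparisons per flagged item grows quickly as groups are added.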
Value
This function returns a list containing four main components:
no_purify |
A list of sub-objects presenting the results of DIF analysis without a purification procedure. These include:
|
purify |
A logical value indicating whether the purification process was applied. |
with_purify |
A list of sub-objects presenting the results of DIF analysis with a purification procedure. These include:
|
alpha |
The significance level used for the hypothesis tests. |
Methods (by class)
- grdif(default): Default method to compute the three GRDIF statistics for multiple-group data using a data frame x that contains item metadata.
- grdif(est_irt): An object created by the function est_irt().
- grdif(est_item): An object created by the function est_item().
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement, 60(4), 626-650. doi:10.1111/jedm.12366.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.
Lim, H., Zhu, D., Choe, E. M., & Han, K. T. (2024). Detecting differential item functioning among multiple groups using IRT residual DIF framework. Journal of Educational Measurement, 61(4), 656-681.
See Also
rdif(), est_irt(), est_item(), simdat(), shape_df(), est_score()
Examples
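# Load required libraries
# (irtQ provides bring.flexmirt(), shape_df(), and grdif(), which are called unqualified below)
library("irtQ")
library("dplyr")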
## Uniform DIF detection for four groups (1 reference, 3 focal)
########################################################
# (1) Manipulate uniform DIF for all three focal groups
########################################################
# Import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# Select 36 non-DIF items modeled under 3PLM
par_nstd <-
bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>%
dplyr::filter(.data$model == "3PLM") %>%
dplyr::filter(dplyr::row_number() %in% 1:36) %>%
dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)
# Generate four new items on which uniform DIF will be imposed
difpar_ref <-
shape_df(
par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = .15),
item.id = paste0("dif", 1:4), cats = 2, model = "3PLM"
)
# Introduce DIF by shifting b-parameters differently for each focal group
difpar_foc1 <-
difpar_ref %>%
dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0.7, 0.7, 0, 0))
difpar_foc2 <-
difpar_ref %>%
dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0, 0, 0.7, 0.7))
difpar_foc3 <-
difpar_ref %>%
dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(-0.4, -0.4, -0.5, -0.5))
# Combine the 4 DIF and 36 non-DIF items for all four groups
# Therefore, the first four items contain uniform DIF across all focal groups
par_ref <- rbind(difpar_ref, par_nstd)
par_foc1 <- rbind(difpar_foc1, par_nstd)
par_foc2 <- rbind(difpar_foc2, par_nstd)
par_foc3 <- rbind(difpar_foc3, par_nstd)
# Generate true abilities from different distributions
set.seed(128)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc1 <- rnorm(500, -1.0, 1.0)
theta_foc2 <- rnorm(500, 1.0, 1.0)
theta_foc3 <- rnorm(500, 0.5, 1.0)
# Simulate response data for each group
resp_ref <- irtQ::simdat(par_ref, theta = theta_ref, D = 1)
resp_foc1 <- irtQ::simdat(par_foc1, theta = theta_foc1, D = 1)
resp_foc2 <- irtQ::simdat(par_foc2, theta = theta_foc2, D = 1)
resp_foc3 <- irtQ::simdat(par_foc3, theta = theta_foc3, D = 1)
data <- rbind(resp_ref, resp_foc1, resp_foc2, resp_foc3)
########################################################
# (2) Estimate item and ability parameters
# using aggregated data
########################################################
# Estimate item parameters
est_mod <- irtQ::est_irt(data = data, D = 1, model = "3PLM")
est_par <- est_mod$par.est
# Estimate ability parameters using MLE
score <- irtQ::est_score(x = est_par, data = data, method = "ML")$est.theta
########################################################
# (3) Conduct DIF analysis
########################################################
# Create a group membership vector:
# 0 = reference group; 1, 2, 3 = focal groups
group <- c(rep(0, 500), rep(1, 500), rep(2, 500), rep(3, 500))
# (a) Compute GRDIF statistics without purification,
# and perform post-hoc pairwise comparisons for flagged items
dif_nopuri <- grdif(
x = est_par, data = data, score = score, group = group,
focal.name = c(1, 2, 3), D = 1, alpha = 0.05,
purify = FALSE, post.hoc = TRUE
)
print(dif_nopuri)
# Display post-hoc pairwise comparison results
print(dif_nopuri$no_purify$post.hoc)
# (b) Compute GRDIF statistics with purification
# based on GRDIF_R, including post-hoc comparisons
dif_puri_r <- grdif(
x = est_par, data = data, score = score, group = group,
focal.name = c(1, 2, 3), D = 1, alpha = 0.05,
purify = TRUE, purify.by = "grdifr", post.hoc = TRUE
)
print(dif_puri_r)
# Display post-hoc results before purification
print(dif_puri_r$no_purify$post.hoc)
# Display post-hoc results after purification
print(dif_puri_r$with_purify$post.hoc)