grdif {irtQ}    R Documentation

Generalized IRT residual-based DIF detection framework for multiple groups (GRDIF)

Description

This function computes three GRDIF statistics, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, for analyzing differential item functioning (DIF) among multiple groups (Lim et al., 2024). These statistics are designed to detect uniform DIF, nonuniform DIF, and mixed DIF, respectively.

Usage

grdif(x, ...)

## Default S3 method:
grdif(
  x,
  data,
  score = NULL,
  group,
  focal.name,
  D = 1,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("grdifrs", "grdifr", "grdifs"),
  max.iter = 10,
  min.resp = NULL,
  post.hoc = TRUE,
  method = "ML",
  range = c(-4, 4),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

## S3 method for class 'est_irt'
grdif(
  x,
  score = NULL,
  group,
  focal.name,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("grdifrs", "grdifr", "grdifs"),
  max.iter = 10,
  min.resp = NULL,
  post.hoc = TRUE,
  method = "ML",
  range = c(-4, 4),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

## S3 method for class 'est_item'
grdif(
  x,
  group,
  focal.name,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("grdifrs", "grdifr", "grdifs"),
  max.iter = 10,
  min.resp = NULL,
  post.hoc = TRUE,
  method = "ML",
  range = c(-4, 4),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

Arguments

x

A data frame containing item metadata (e.g., item parameters, number of categories, IRT model types, etc.); or an object of class est_irt obtained from est_irt(), or est_item from est_item().

See est_irt() or simdat() for more details about the item metadata. This data frame can be easily created using the shape_df() function.

...

Additional arguments passed to the est_score() function.

data

A matrix of examinees' item responses corresponding to the items specified in the x argument. Rows represent examinees and columns represent items.

score

A numeric vector containing examinees' ability estimates (theta values). If not provided, grdif() will estimate ability parameters internally before computing the GRDIF statistics. See est_score() for more information on scoring methods. Default is NULL.

group

A numeric or character vector indicating examinees' group membership. The length of the vector must match the number of rows in the response data matrix.

focal.name

A numeric or character vector specifying the levels associated with the focal groups. For example, consider group = c(0, 0, 1, 2, 2, 3, 3), where '1', '2', and '3' indicate three distinct focal groups and '0' represents the reference group. In this case, set focal.name = c(1, 2, 3).

D

A scaling constant used in IRT models to make the logistic function closely approximate the normal ogive function. A value of 1.7 is commonly used for this purpose. Default is 1.
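As a rough numerical illustration of this approximation (independent of grdif() itself), the following base-R sketch compares the logistic curve against the normal ogive with and without the scaling constant. The grid z and the object names are illustrative choices, not part of the package.

```r
# Grid of values on the latent scale (illustrative choice)
z <- seq(-6, 6, by = 0.01)

# Largest absolute gap between the logistic curve and the normal ogive
gap_D1  <- max(abs(plogis(1.0 * z) - pnorm(z)))  # D = 1
gap_D17 <- max(abs(plogis(1.7 * z) - pnorm(z)))  # D = 1.7

# With D = 1.7 the gap stays below 0.01; with D = 1 it is much larger
c(D_1 = gap_D1, D_1.7 = gap_D17)
```

Whichever value is chosen, the same D should be used for item calibration, scoring, and the DIF analysis, as the Examples do with D = 1.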

alpha

A numeric value specifying the significance level (\alpha) for hypothesis testing using the GRDIF statistics. Default is 0.05.

missing

A value indicating missing responses in the data set. Default is NA.

purify

Logical. Indicates whether to apply a purification procedure. Default is FALSE.

purify.by

A character string specifying which GRDIF statistic is used to perform the purification. Available options are "grdifrs" for GRDIF_{RS}, "grdifr" for GRDIF_{R}, and "grdifs" for GRDIF_{S}.

max.iter

A positive integer specifying the maximum number of iterations allowed for the purification process. Default is 10.

min.resp

A positive integer specifying the minimum number of valid item responses required from an examinee in order to compute an ability estimate. Default is NULL. See Details for more information.

post.hoc

A logical value indicating whether to perform post-hoc RDIF analyses for all possible pairwise group comparisons on items flagged as statistically significant. Default is TRUE. See Details for more information.

method

A character string indicating the scoring method to use. Available options are:

  • "ML": Maximum likelihood estimation

  • "WL": Weighted likelihood estimation (Warm, 1989)

  • "MAP": Maximum a posteriori estimation (Hambleton et al., 1991)

  • "EAP": Expected a posteriori estimation (Bock & Mislevy, 1982)

Default is "ML".

range

A numeric vector of length two specifying the lower and upper bounds of the ability scale. This is used for the following scoring methods: "ML", "WL", and "MAP". Default is c(-4, 4).

norm.prior

A numeric vector of length two specifying the mean and standard deviation of the normal prior distribution. These values are used to generate the Gaussian quadrature points and weights. Ignored if method is "ML" or "WL". Default is c(0, 1).

nquad

An integer indicating the number of Gaussian quadrature points to be generated from the normal prior distribution. Used only when method is "EAP". Ignored for "ML", "WL", and "MAP". Default is 41.

weights

A two-column matrix or data frame containing the quadrature points (in the first column) and their corresponding weights (in the second column) for the latent variable prior distribution. The weights and points can be conveniently generated using the function gen.weight().

If NULL and method = "EAP", default quadrature values are generated based on the norm.prior and nquad arguments. Ignored if method is "ML", "WL", or "MAP".
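To show the expected two-column layout, here is a minimal base-R sketch that builds such an object by hand. It assumes equally spaced points on -4 to 4 and normal densities rescaled to sum to 1; the column names are illustrative, and gen.weight() may construct its points and weights differently.

```r
# Settings mirroring the defaults shown in Usage
norm.prior <- c(0, 1)
nquad <- 41

# Equally spaced quadrature points (the -4 to 4 span is an assumption)
theta_pts <- seq(-4, 4, length.out = nquad)

# Normal densities at each point, rescaled so the weights sum to 1
w <- dnorm(theta_pts, mean = norm.prior[1], sd = norm.prior[2])
w <- w / sum(w)

# Points in the first column, weights in the second
weights <- data.frame(theta = theta_pts, weight = w)
head(weights, 3)
```

An object shaped like this could then be supplied through the weights argument together with method = "EAP".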

ncore

An integer specifying the number of logical CPU cores to use for parallel processing. Default is 1. See est_score() for details.

verbose

Logical. If TRUE, progress messages from the purification procedure will be displayed; if FALSE, the messages will be suppressed. Default is TRUE.

Details

The GRDIF framework (Lim et al., 2024) is a generalized version of the RDIF detection framework, designed to assess DIF across multiple groups. The framework includes three statistics: GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, which are tailored to detect uniform, nonuniform, and mixed DIF, respectively. Under the null hypothesis that the test contains no DIF items, the statistics GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS} asymptotically follow \chi^{2} distributions with G-1, G-1, and 2(G-1) degrees of freedom, respectively, where G represents the number of groups being compared. For more information on the GRDIF framework, see Lim et al. (2024).
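The degrees of freedom above translate directly into critical values. The following base-R sketch assumes G = 4 groups and alpha = 0.05; the observed statistic of 9.2 is a made-up value used only to show how a p-value would be obtained.

```r
# Number of groups compared (1 reference + 3 focal) and significance level
G <- 4
alpha <- 0.05

# Degrees of freedom stated above for each statistic
df <- c(grdifr = G - 1, grdifs = G - 1, grdifrs = 2 * (G - 1))

# Upper-tail chi-square critical values at the chosen alpha
crit <- qchisq(1 - alpha, df = df)
round(crit, 3)

# p-value for a hypothetical observed GRDIF_R statistic (9.2 is made up)
p_val <- pchisq(9.2, df = df[["grdifr"]], lower.tail = FALSE)
p_val < alpha
```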

The grdif() function computes all three GRDIF statistics: GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}. It supports both dichotomous and polytomous item response data. To compute these statistics, grdif() requires: (1) item parameter estimates obtained from aggregate data (regardless of group membership); (2) examinees' ability estimates (e.g., ML estimates); and (3) examinees' item response data. Note that ability estimates must be computed using the item parameters estimated from the aggregate data. The item parameters should be provided via the x argument, the ability estimates via the score argument, and the response data via the data argument. If ability estimates are not supplied (score = NULL), grdif() automatically computes them using the scoring method specified in the method argument (e.g., method = "ML").

The group argument accepts a vector of numeric or character values, indicating the group membership of examinees. The vector may contain multiple distinct values, where one represents the reference group and the others represent focal groups. Its length must match the number of rows in the response data, with each value corresponding to an examinee’s group membership. Once group is specified, a numeric or character vector must be supplied via the focal.name argument to define which group(s) in group represent the focal groups. The reference group is defined as the group not included in focal.name.
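The relationship between group and focal.name can be checked before running the analysis. This base-R sketch reuses the small group vector from the focal.name argument description; ref_name is an illustrative name, and the checks are a suggestion rather than something grdif() is documented to perform.

```r
# Group membership vector from the focal.name example
group <- c(0, 0, 1, 2, 2, 3, 3)
focal.name <- c(1, 2, 3)

# The reference group is the level of `group` not listed in focal.name
ref_name <- setdiff(unique(group), focal.name)

# Sanity checks one might run before calling grdif()
stopifnot(
  length(ref_name) == 1,              # exactly one reference level remains
  all(focal.name %in% unique(group))  # every focal level appears in `group`
)

# Examinee counts per group
table(group)
```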

Similar to the original RDIF framework for two-group comparisons, the GRDIF framework supports an iterative purification process. When purify = TRUE, purification is conducted based on the GRDIF statistic specified in the purify.by argument (e.g., purify.by = "grdifrs"). During each iteration, examinees' latent abilities are re-estimated using only the purified items, with the scoring method determined by the method argument. The process continues until no additional DIF items are flagged or until the number of iterations reaches the specified max.iter limit. For details on the purification procedure, see Lim et al. (2022).

Scoring based on a limited number of items can lead to large standard errors, which may compromise the effectiveness of DIF detection within the GRDIF framework. The min.resp argument can be used to exclude ability estimates with substantial standard errors, especially during the purification process. For example, if min.resp is not NULL (e.g., min.resp = 5), examinees whose total number of responses falls below the specified threshold will have their responses treated as missing values (i.e., NA). Consequently, their ability estimates will also be missing and will not be used when computing the GRDIF statistics. If min.resp = NULL, an examinee's score will be computed as long as at least one response is available.
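The min.resp rule described above can be sketched in base R as follows. The toy response matrix and the threshold of 3 are made up for illustration, and the code mirrors the description rather than the package's internal implementation.

```r
# Toy response matrix: 4 examinees (rows) by 6 items (columns); values are made up
resp <- rbind(
  c(1, 0, 1, 1, 0, 1),
  c(1, NA, NA, NA, NA, 0),
  c(NA, NA, NA, 1, NA, NA),
  c(0, 1, 1, NA, 1, 0)
)
min.resp <- 3

# Count valid (non-missing) responses per examinee
n_valid <- rowSums(!is.na(resp))

# Examinees below the threshold have all responses set to NA,
# so no ability estimate would be available for them
too_few <- n_valid < min.resp
resp[too_few, ] <- NA

# Row indices of the examinees that were excluded
which(too_few)
```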

The post.hoc argument enables post-hoc RDIF analyses across all possible pairwise group comparisons for items flagged as statistically significant. For instance, consider four groups of examinees: A, B, C, and D. If post.hoc = TRUE, the grdif() function will perform pairwise RDIF analyses for each flagged item across all group pairs (A-B, A-C, A-D, B-C, B-D, and C-D). This provides a more detailed understanding of which specific group pairs exhibit DIF. Note that when purification is enabled (i.e., purify = TRUE), post-hoc RDIF analyses are conducted for each flagged item at each iteration of the purification process.
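The set of pairwise comparisons in the four-group example can be enumerated with base R. This sketch only lists the pairs; it does not run any RDIF analysis, and the object names are illustrative.

```r
# Four groups from the example above
groups <- c("A", "B", "C", "D")

# All unordered pairs of groups, one pair per column
pairs <- combn(groups, 2)

# Readable labels for each pairwise comparison
pair_labels <- apply(pairs, 2, paste, collapse = "-")
pair_labels

# Number of comparisons per flagged item
choose(length(groups), 2)
```

The number of pairwise comparisons grows as G(G-1)/2, so the post-hoc output becomes longer as more groups are compared.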

Value

This function returns a list containing four main components:

no_purify

A list of sub-objects presenting the results of DIF analysis without a purification procedure. These include:

dif_stat

A data frame summarizing the results of the three GRDIF statistics for all evaluated items. Columns include item ID, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS} statistics; their corresponding p-values; the sample size of the reference group; the sample sizes of the focal groups; and the total sample size.

moments

A list of three data frames reporting the moments of mean raw residuals (MRRs) and mean squared residuals (MSRs) across all compared groups. The first contains means, the second variances, and the third covariances of MRRs and MSRs.

dif_item

A list of three numeric vectors indicating the items flagged as potential DIF items by each of the GRDIF statistics (GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}).

score

A numeric vector of ability estimates used to compute the GRDIF statistics.

post.hoc

A list of three data frames containing post-hoc RDIF analysis results for all possible pairwise group comparisons. Each data frame corresponds to the results for items flagged by GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, respectively.

purify

A logical value indicating whether the purification process was applied.

with_purify

A list of sub-objects presenting the results of DIF analysis with a purification procedure. These include:

purify.by

A character string indicating which GRDIF statistic was used for purification: "grdifr", "grdifs", or "grdifrs", corresponding to GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, respectively.

dif_stat

A data frame summarizing the GRDIF results across iterations. Columns include item ID, the three GRDIF statistics and their p-values, sample size of the reference group, sample sizes of the focal groups, total sample size, and the iteration number at which each statistic was computed.

moments

A list of three data frames showing the MRR and MSR moments across iterations. The final column in each data frame indicates the iteration in which the statistics were computed.

n.iter

The total number of iterations executed during the purification process.

score

A numeric vector of the final purified ability estimates used for computing GRDIF statistics.

post.hoc

A data frame containing the post-hoc RDIF analysis results for flagged items across all possible pairwise group comparisons, updated at each iteration.

complete

A logical value indicating whether the purification process was completed. If FALSE, the process reached the maximum number of iterations without convergence.

alpha

The significance level (\alpha) used for hypothesis testing of the GRDIF statistics.

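Methods (by class)

  • grdif(default): Default method for a data frame x containing item metadata.

  • grdif(est_irt): Method for an object of class est_irt obtained from est_irt().

  • grdif(est_item): Method for an object of class est_item obtained from est_item().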

Author(s)

Hwanggyu Lim hglim83@gmail.com

References

Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement, 60(4), 626-650. doi:10.1111/jedm.12366.

Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.

Lim, H., Zhu, D., Choe, E. M., & Han, K. T. (2024). Detecting differential item functioning among multiple groups using IRT residual DIF framework. Journal of Educational Measurement, 61(4), 656-681.

See Also

rdif(), est_irt(), est_item(), simdat(), shape_df(), est_score()

Examples


# Load required libraries
library("irtQ")
library("dplyr")

## Uniform DIF detection for four groups (1 reference, 3 focal)
########################################################
# (1) Manipulate uniform DIF for all three focal groups
########################################################

# Import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")

# Select 36 non-DIF items modeled under 3PLM
par_nstd <-
  bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>%
  dplyr::filter(.data$model == "3PLM") %>%
  dplyr::filter(dplyr::row_number() %in% 1:36) %>%
  dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)

# Generate four new items on which uniform DIF will be imposed
difpar_ref <-
  shape_df(
    par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = .15),
    item.id = paste0("dif", 1:4), cats = 2, model = "3PLM"
  )

# Introduce DIF by shifting b-parameters differently for each focal group
difpar_foc1 <-
  difpar_ref %>%
  dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0.7, 0.7, 0, 0))
difpar_foc2 <-
  difpar_ref %>%
  dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0, 0, 0.7, 0.7))
difpar_foc3 <-
  difpar_ref %>%
  dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(-0.4, -0.4, -0.5, -0.5))

# Combine the 4 DIF and 36 non-DIF items for all four groups
# Therefore, the first four items contain uniform DIF across all focal groups
par_ref <- rbind(difpar_ref, par_nstd)
par_foc1 <- rbind(difpar_foc1, par_nstd)
par_foc2 <- rbind(difpar_foc2, par_nstd)
par_foc3 <- rbind(difpar_foc3, par_nstd)

# Generate true abilities from different distributions
set.seed(128)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc1 <- rnorm(500, -1.0, 1.0)
theta_foc2 <- rnorm(500, 1.0, 1.0)
theta_foc3 <- rnorm(500, 0.5, 1.0)

# Simulate response data for each group
resp_ref <- irtQ::simdat(par_ref, theta = theta_ref, D = 1)
resp_foc1 <- irtQ::simdat(par_foc1, theta = theta_foc1, D = 1)
resp_foc2 <- irtQ::simdat(par_foc2, theta = theta_foc2, D = 1)
resp_foc3 <- irtQ::simdat(par_foc3, theta = theta_foc3, D = 1)
data <- rbind(resp_ref, resp_foc1, resp_foc2, resp_foc3)

########################################################
# (2) Estimate item and ability parameters
#     using aggregated data
########################################################

# Estimate item parameters
est_mod <- irtQ::est_irt(data = data, D = 1, model = "3PLM")
est_par <- est_mod$par.est

# Estimate ability parameters using MLE
score <- irtQ::est_score(x = est_par, data = data, method = "ML")$est.theta

########################################################
# (3) Conduct DIF analysis
########################################################

# Create a group membership vector:
# 0 = reference group; 1, 2, 3 = focal groups
group <- c(rep(0, 500), rep(1, 500), rep(2, 500), rep(3, 500))

# (a) Compute GRDIF statistics without purification,
#     and perform post-hoc pairwise comparisons for flagged items
dif_nopuri <- grdif(
  x = est_par, data = data, score = score, group = group,
  focal.name = c(1, 2, 3), D = 1, alpha = 0.05,
  purify = FALSE, post.hoc = TRUE
)
print(dif_nopuri)

# Display post-hoc pairwise comparison results
print(dif_nopuri$no_purify$post.hoc)

# (b) Compute GRDIF statistics with purification
#     based on GRDIF_R, including post-hoc comparisons
dif_puri_r <- grdif(
  x = est_par, data = data, score = score, group = group,
  focal.name = c(1, 2, 3), D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "grdifr", post.hoc = TRUE
)
print(dif_puri_r)

# Display post-hoc results before purification
print(dif_puri_r$no_purify$post.hoc)

# Display post-hoc results after purification
print(dif_puri_r$with_purify$post.hoc)



[Package irtQ version 1.0.0 Index]