rdif {irtQ}    R Documentation

IRT Residual-Based Differential Item Functioning (RDIF) Detection Framework

Description

This function computes three RDIF statistics for each item: RDIF_{R}, RDIF_{S}, and RDIF_{RS} (Lim & Choe, 2023; Lim et al., 2022). RDIF_{R} primarily captures differences in raw residuals between two groups, which are typically associated with uniform DIF. RDIF_{S} primarily captures differences in squared residuals, which are typically associated with nonuniform DIF. RDIF_{RS} considers both types of differences jointly and can detect both uniform and nonuniform DIF.

Usage

rdif(x, ...)

## Default S3 method:
rdif(
  x,
  data,
  score = NULL,
  group,
  focal.name,
  item.skip = NULL,
  D = 1,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("rdifrs", "rdifr", "rdifs"),
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

## S3 method for class 'est_irt'
rdif(
  x,
  score = NULL,
  group,
  focal.name,
  item.skip = NULL,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("rdifrs", "rdifr", "rdifs"),
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

## S3 method for class 'est_item'
rdif(
  x,
  group,
  focal.name,
  item.skip = NULL,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("rdifrs", "rdifr", "rdifs"),
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

Arguments

x

A data frame containing item metadata (e.g., item parameters, number of categories, IRT model types, etc.); an object of class est_irt obtained from est_irt(); or an object of class est_item obtained from est_item().

See est_irt() or simdat() for more details about the item metadata. This data frame can be easily created using the shape_df() function.

...

Additional arguments passed to the est_score() function.

data

A matrix of examinees' item responses corresponding to the items specified in the x argument. Rows represent examinees and columns represent items.

score

A numeric vector containing examinees' ability estimates (theta values). If not provided, rdif() will estimate ability parameters internally before computing the RDIF statistics. See est_score() for more information on scoring methods. Default is NULL.

group

A numeric or character vector indicating examinees' group membership. The length of the vector must match the number of rows in the response data matrix.

focal.name

A single numeric or character value specifying the focal group. For instance, given group = c(0, 1, 0, 1, 1) and '1' indicating the focal group, set focal.name = 1.

item.skip

A numeric vector of item indices to exclude from DIF analysis. If NULL, all items are included. This is useful for omitting specific items on the basis of prior knowledge.

D

A scaling constant used in IRT models to make the logistic function closely approximate the normal ogive function. A value of 1.7 is commonly used for this purpose. Default is 1.
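As a rough illustration of why 1.7 is the conventional choice, the base R sketch below compares the logistic curve with the normal ogive over a grid of values. The grid and the variable names are chosen for this illustration only and are not part of irtQ.

```r
# Grid of values on the latent scale (an arbitrary, illustrative choice)
z <- seq(-6, 6, by = 0.01)

# Largest absolute gap between the logistic curve and the normal ogive
gap_D1  <- max(abs(plogis(1.0 * z) - pnorm(z)))  # no scaling
gap_D17 <- max(abs(plogis(1.7 * z) - pnorm(z)))  # D = 1.7

# The gap with D = 1.7 is much smaller than the gap with D = 1
round(c(D_1 = gap_D1, D_1.7 = gap_D17), 4)
```

Whichever value is chosen, it should presumably match the value of D used when the item parameters were estimated.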

alpha

A numeric value specifying the significance level (\alpha) for hypothesis testing using the RDIF statistics. Default is 0.05.
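The decision rule implied by alpha can be sketched in base R as follows. The standardized values are made up for illustration, and the two-sided form of the test is an assumption of this sketch rather than something stated in this documentation.

```r
# Hypothetical standardized statistics for three items (not real output)
z <- c(item1 = 0.40, item2 = -2.75, item3 = 1.50)
alpha <- 0.05

# Two-sided p-values under a standard normal reference distribution
# (the two-sided form is an assumption of this sketch)
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Items whose p-value falls below alpha would be flagged
flagged <- names(z)[p_value < alpha]
flagged
```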

missing

A value indicating missing responses in the data set. Default is NA.

purify

Logical. Indicates whether to apply a purification procedure. Default is FALSE.

purify.by

A character string specifying which RDIF statistic is used to perform the purification. Available options are "rdifrs" for RDIF_{RS}, "rdifr" for RDIF_{R}, and "rdifs" for RDIF_{S}.

max.iter

A positive integer specifying the maximum number of iterations allowed for the purification process. Default is 10.

min.resp

A positive integer specifying the minimum number of valid item responses required from an examinee in order to compute an ability estimate. Default is NULL. See Details for more information.

method

A character string indicating the scoring method to use. Available options are:

  • "ML": Maximum likelihood estimation

  • "WL": Weighted likelihood estimation (Warm, 1989)

  • "MAP": Maximum a posteriori estimation (Hambleton et al., 1991)

  • "EAP": Expected a posteriori estimation (Bock & Mislevy, 1982)

Default is "ML".

range

A numeric vector of length two specifying the lower and upper bounds of the ability scale. This is used for the following scoring methods: "ML", "WL", and "MAP". Default is c(-5, 5).

norm.prior

A numeric vector of length two specifying the mean and standard deviation of the normal prior distribution. These values are used to generate the Gaussian quadrature points and weights. Ignored if method is "ML" or "WL". Default is c(0, 1).

nquad

An integer indicating the number of Gaussian quadrature points to be generated from the normal prior distribution. Used only when method is "EAP". Ignored for "ML", "WL", and "MAP". Default is 41.

weights

A two-column matrix or data frame containing the quadrature points (in the first column) and their corresponding weights (in the second column) for the latent variable prior distribution. The weights and points can be conveniently generated using the function gen.weight().

If NULL and method = "EAP", default quadrature values are generated based on the norm.prior and nquad arguments. Ignored if method is "ML", "WL", or "MAP".
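A two-column object of the shape described above can be built by hand in base R, as in the minimal sketch below. The node range of four standard deviations and the normalization are assumptions of this sketch; gen.weight() may construct its points and weights differently.

```r
norm.prior <- c(0, 1)  # mean and SD of the normal prior
nquad <- 41            # number of quadrature points

# Equally spaced nodes within 4 SDs of the prior mean (assumed range)
nodes <- seq(norm.prior[1] - 4 * norm.prior[2],
             norm.prior[1] + 4 * norm.prior[2],
             length.out = nquad)

# Normal densities at the nodes, rescaled so the weights sum to 1
dens <- dnorm(nodes, mean = norm.prior[1], sd = norm.prior[2])
quad <- data.frame(theta = nodes, weight = dens / sum(dens))

head(quad, 3)
```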

ncore

An integer specifying the number of logical CPU cores to use for parallel processing. Default is 1. See est_score() for details.

verbose

Logical. If TRUE, progress messages from the purification procedure will be displayed; if FALSE, the messages will be suppressed. Default is TRUE.

Details

The RDIF framework (Lim & Choe, 2023; Lim et al., 2022) consists of three IRT residual-based statistics: RDIF_{R}, RDIF_{S}, and RDIF_{RS}. Under the null hypothesis that a test contains no DIF items, RDIF_{R} and RDIF_{S}, once standardized by their means and standard deviations, asymptotically follow standard normal distributions. RDIF_{RS} is based on a bivariate normal distribution of the RDIF_{R} and RDIF_{S} statistics, and under the null hypothesis, it asymptotically follows a \chi^{2} distribution with 2 degrees of freedom. See Lim et al. (2022) for more details about the RDIF framework.
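To make the \chi^{2} reference distribution concrete, the base R sketch below combines a pair of statistics into a single value through the usual quadratic form for a bivariate normal vector. The function name rdifrs_sketch, the input values, and the use of this particular quadratic form are assumptions for illustration; this is not the package's internal code.

```r
# Hypothetical helper: quadratic form of (RDIF_R, RDIF_S) around the
# null mean vector, referred to a chi-square distribution with 2 df
rdifrs_sketch <- function(stat, mu, Sigma) {
  dev <- stat - mu
  chisq <- as.numeric(t(dev) %*% solve(Sigma) %*% dev)
  p <- pchisq(chisq, df = 2, lower.tail = FALSE)
  list(chisq = chisq, p = p)
}

# Made-up values: both statistics have null mean 0, SDs of 0.02 and
# 0.01, and zero covariance, so the standardized values are 3 and 2
out <- rdifrs_sketch(
  stat  = c(0.06, 0.02),
  mu    = c(0, 0),
  Sigma = diag(c(0.02^2, 0.01^2))
)
out
```

With zero covariance the combined value reduces to the sum of the two squared standardized statistics (here 9 + 4 = 13).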

The rdif() function computes all three RDIF statistics: RDIF_{R}, RDIF_{S}, and RDIF_{RS}. The current version of rdif() supports both dichotomous and polytomous item response data. Note that for polytomous items, net DIF is assessed. To evaluate global DIF for polytomous items, use the crdif() function.

To compute the RDIF statistics, the rdif() function requires: (1) item parameter estimates obtained from aggregate data (regardless of group membership), (2) examinees' ability estimates (e.g., ML), and (3) examinees' item response data. Note that the ability estimates must be based on the aggregate-data item parameters. The item parameter estimates should be provided in the x argument, the ability estimates in the score argument, and the response data in the data argument. If ability estimates are not provided (i.e., score = NULL), rdif() will estimate them automatically using the scoring method specified via the method argument (e.g., method = "ML").
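The three ingredients listed above can be seen working together in a small base R sketch for a single dichotomous item. It assumes a 2PL model with D = 1, uses the true abilities in place of estimates, treats the reference-group parameters as the aggregate parameters, and takes the difference as focal minus reference. All of these are simplifications made for illustration, not a description of how rdif() is implemented.

```r
set.seed(1)
n <- 2000
a <- 1.2; b <- 0  # assumed aggregate item parameters (2PL, D = 1)

# True abilities stand in for the ability estimates in this sketch
theta_ref <- rnorm(n)
theta_foc <- rnorm(n)

# The focal group answers a harder version of the item (b shifted by 0.7)
x_ref <- rbinom(n, 1, plogis(a * (theta_ref - b)))
x_foc <- rbinom(n, 1, plogis(a * (theta_foc - (b + 0.7))))

# Raw residuals: observed response minus model-implied probability
e_ref <- x_ref - plogis(a * (theta_ref - b))
e_foc <- x_foc - plogis(a * (theta_foc - b))

# Group differences in mean raw and mean squared residuals
rdif_r <- mean(e_foc) - mean(e_ref)
rdif_s <- mean(e_foc^2) - mean(e_ref^2)
c(rdif_r = rdif_r, rdif_s = rdif_s)
```

Because the item is harder for the focal group than the aggregate parameters imply, the focal group's raw residuals are negative on average, which is the pattern associated with uniform DIF.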

The group argument should be a vector containing exactly two distinct values (either numeric or character), representing the reference and focal groups. Its length must match the number of rows in the response data, where each element corresponds to an examinee. Once group is specified, a single numeric or character value must be provided in the focal.name argument to indicate which level in group represents the focal group.
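The requirements on group and focal.name can be checked before calling rdif(), as in the short base R sketch below. The labels "ref" and "foc" and the variable n_examinees are hypothetical.

```r
# Hypothetical group labels for five examinees
group <- c("ref", "foc", "ref", "foc", "foc")
focal.name <- "foc"
n_examinees <- 5  # number of rows in the response data (assumed)

# Checks that mirror the requirements described above
stopifnot(
  length(group) == n_examinees,  # one entry per examinee
  length(unique(group)) == 2,    # exactly two distinct values
  focal.name %in% group          # focal label must occur in group
)

# Logical indicator of focal-group membership
is_focal <- group == focal.name
table(ifelse(is_focal, "focal", "reference"))
```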

Similar to other DIF detection approaches, the RDIF framework supports an iterative purification process. When purify = TRUE, purification is conducted using one of the RDIF statistics specified in the purify.by argument (e.g., purify.by = "rdifrs"). At each iteration, examinees' ability estimates are recalculated based on the set of purified items using the scoring method specified in the method argument. The purification process continues until no additional DIF items are identified or the maximum number of iterations specified in max.iter is reached. See Lim et al. (2022) for more details on the purification procedure.
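The control flow of an iterative purification can be sketched generically in base R. The function purify_sketch, the toy flagging rule, and the effect sizes below are all hypothetical; the actual procedure in rdif() rescores examinees and recomputes the RDIF statistics at each iteration and may differ in how items are removed.

```r
# Generic purification loop: flag_fun(anchor) returns the indices of
# items flagged when only the `anchor` items are used for matching
purify_sketch <- function(flag_fun, n_items, max.iter = 10) {
  all_items <- seq_len(n_items)
  flagged <- flag_fun(all_items)   # initial pass with all items
  complete <- length(flagged) == 0
  n.iter <- 0L
  while (!complete && n.iter < max.iter) {
    n.iter <- n.iter + 1L
    anchor <- setdiff(all_items, flagged)
    new_flagged <- sort(union(flagged, flag_fun(anchor)))
    complete <- identical(new_flagged, flagged)  # nothing new flagged
    flagged <- new_flagged
  }
  list(dif_item = flagged, n.iter = n.iter, complete = complete)
}

# Toy flagging rule: each item's effect is judged relative to the mean
# effect of the anchor items, so DIF items in the anchor mask others
effects <- c(0.8, 0.7, 0.35, rep(0, 7))
toy_flag <- function(anchor) {
  which(abs(effects - mean(effects[anchor])) > 0.3)
}

res <- purify_sketch(toy_flag, n_items = length(effects))
res
```

In this toy setup the third item is missed on the initial pass and flagged only after the first two items have been removed from the anchor set.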

Scoring based on a small number of item responses can lead to large standard errors, potentially reducing the accuracy of DIF detection in the RDIF framework. The min.resp argument can be used to exclude examinees with insufficient response data from scoring, especially during the purification process. For example, if min.resp is not NULL (e.g., min.resp = 5), examinees who responded to fewer than five items will have all their responses treated as missing (i.e., NA). As a result, their ability estimates will also be missing and will not be used in the computation of RDIF statistics. If min.resp = NULL, a score will be computed for any examinee with at least one valid item response.
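The effect of min.resp on the response data can be mimicked in base R as follows. The small matrix resp is made up, and the code only illustrates the rule described above rather than reproducing the package's internals.

```r
# Made-up responses for three examinees on six items
resp <- matrix(
  c(1,  0,  1,  1,  0,  1,
    1, NA, NA,  0, NA, NA,
   NA, NA, NA, NA, NA,  1),
  nrow = 3, byrow = TRUE
)
min.resp <- 5

# Number of valid (non-missing) responses per examinee
n_valid <- rowSums(!is.na(resp))

# Examinees below the threshold have all responses set to missing
resp_filtered <- resp
resp_filtered[n_valid < min.resp, ] <- NA
resp_filtered
```

Only the first examinee, with six valid responses, keeps their data; the other two would receive no ability estimate.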

Value

This function returns a list containing four main components:

no_purify

A list of sub-objects containing the results of DIF analysis without applying a purification procedure. The sub-objects include:

dif_stat

A data frame summarizing the RDIF analysis results for all items. The columns include: item ID, RDIF_{R} statistic, standardized RDIF_{R}, RDIF_{S} statistic, standardized RDIF_{S}, RDIF_{RS} statistic, p-values for RDIF_{R}, RDIF_{S}, and RDIF_{RS}, sample sizes for the reference and focal groups, and total sample size. Note that RDIF_{RS} does not have a standardized value because it is a \chi^{2}-based statistic.

moments

A data frame reporting the first and second moments of the RDIF statistics. The columns include: item ID, mean and standard deviation of RDIF_{R}, mean and standard deviation of RDIF_{S}, and the covariance between RDIF_{R} and RDIF_{S}.

dif_item

A list of three numeric vectors identifying items flagged as DIF by each RDIF statistic: RDIF_{R}, RDIF_{S}, and RDIF_{RS}.

score

A numeric vector of ability estimates used to compute the RDIF statistics.

purify

A logical value indicating whether the purification procedure was applied.

with_purify

A list of sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects include:

purify.by

A character string indicating the RDIF statistic used for purification. Possible values are "rdifr", "rdifs", and "rdifrs", corresponding to RDIF_{R}, RDIF_{S}, and RDIF_{RS}, respectively.

dif_stat

A data frame reporting the RDIF analysis results for all items across the final iteration. Same structure as in no_purify, with one additional column indicating the iteration number in which each result was obtained.

moments

A data frame reporting the moments of RDIF statistics across the final iteration. Includes the same columns as in no_purify, with an additional column for the iteration number.

dif_item

A list of three numeric vectors identifying DIF items flagged by each RDIF statistic.

n.iter

An integer indicating the total number of iterations performed during the purification process.

score

A numeric vector of purified ability estimates used to compute the final RDIF statistics.

complete

A logical value indicating whether the purification process converged. If FALSE, the maximum number of iterations was reached without convergence.

alpha

A numeric value indicating the significance level (\alpha) used in hypothesis testing for RDIF statistics.

Methods (by class)

Author(s)

Hwanggyu Lim hglim83@gmail.com

References

Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement, 60(4), 626-650. doi:10.1111/jedm.12366.

Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.

See Also

est_irt(), est_item(), simdat(), shape_df(), est_score()

Examples


# Load required packages
library("irtQ")
library("dplyr")

## Uniform DIF detection
###############################################
# (1) Generate data with known uniform DIF
###############################################

# Import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")

# Select 36 non-DIF items using the 3PLM model
par_nstd <-
  bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>%
  dplyr::filter(.data$model == "3PLM") %>%
  dplyr::filter(dplyr::row_number() %in% 1:36) %>%
  dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)

# Generate 4 new DIF items for the reference group
difpar_ref <-
  shape_df(
    par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = 0.15),
    item.id = paste0("dif", 1:4), cats = 2, model = "3PLM"
  )

# Add uniform DIF by shifting the b-parameters (column "par.2") by 0.7
# for the focal group
difpar_foc <-
  difpar_ref %>%
  dplyr::mutate(par.2 = .data$par.2 + 0.7)

# Combine the DIF and non-DIF items for both reference and focal groups
# Therefore, the first 4 items exhibit uniform DIF
par_ref <- rbind(difpar_ref, par_nstd)
par_foc <- rbind(difpar_foc, par_nstd)

# Generate true ability values
set.seed(123)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc <- rnorm(500, 0.0, 1.0)

# Simulate response data
resp_ref <- simdat(par_ref, theta = theta_ref, D = 1)
resp_foc <- simdat(par_foc, theta = theta_foc, D = 1)
data <- rbind(resp_ref, resp_foc)

###############################################
# (2) Estimate item and ability parameters
#     from the combined response data
###############################################

# Estimate item parameters
est_mod <- est_irt(data = data, D = 1, model = "3PLM")
est_par <- est_mod$par.est

# Estimate ability parameters using ML
score <- est_score(x = est_par, data = data, method = "ML")$est.theta

###############################################
# (3) Perform DIF analysis
###############################################

# Define group membership: 1 = focal group
group <- c(rep(0, 500), rep(1, 500))

# (a)-1 Compute RDIF statistics with provided ability scores
#       (no purification)
dif_nopuri_1 <- rdif(
  x = est_par, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05
)
print(dif_nopuri_1)

# (a)-2 Compute RDIF statistics without providing ability scores
#       (no purification)
dif_nopuri_2 <- rdif(
  x = est_par, data = data, score = NULL,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  method = "ML"
)
print(dif_nopuri_2)

# (b)-1 Compute RDIF statistics with purification based on RDIF(R)
dif_puri_r <- rdif(
  x = est_par, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "rdifr"
)
print(dif_puri_r)

# (b)-2 Compute RDIF statistics with purification based on RDIF(S)
dif_puri_s <- rdif(
  x = est_par, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "rdifs"
)
print(dif_puri_s)

# (b)-3 Compute RDIF statistics with purification based on RDIF(RS)
dif_puri_rs <- rdif(
  x = est_par, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "rdifrs"
)
print(dif_puri_rs)




[Package irtQ version 1.0.0 Index]