est_score {irtQ} | R Documentation |
Estimate examinees' ability (proficiency) parameters
Description
This function estimates examinees' latent ability parameters. Available scoring methods include maximum likelihood estimation (ML), maximum likelihood estimation with fences (MLF; Han, 2016), weighted likelihood estimation (WL; Warm, 1989), maximum a posteriori estimation (MAP; Hambleton et al., 1991), expected a posteriori estimation (EAP; Bock & Mislevy, 1982), EAP summed scoring (Thissen et al., 1995; Thissen & Orlando, 2001), and inverse test characteristic curve (TCC) scoring (e.g., Kolen & Brennan, 2004; Kolen & Tong, 2010; Stocking, 1996).
Usage
est_score(x, ...)
## Default S3 method:
est_score(
x,
data,
D = 1,
method = "ML",
range = c(-5, 5),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
fence.a = 3,
fence.b = NULL,
tol = 1e-04,
max.iter = 100,
se = TRUE,
stval.opt = 1,
intpol = TRUE,
range.tcc = c(-7, 7),
missing = NA,
ncore = 1,
...
)
## S3 method for class 'est_irt'
est_score(
x,
method = "ML",
range = c(-5, 5),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
fence.a = 3,
fence.b = NULL,
tol = 1e-04,
max.iter = 100,
se = TRUE,
stval.opt = 1,
intpol = TRUE,
range.tcc = c(-7, 7),
missing = NA,
ncore = 1,
...
)
Arguments
x |
A data frame containing item metadata (e.g., item parameters, number
of categories, IRT model types, etc.); or an object of class est_irt
obtained from the function est_irt(). See est_irt() for details. |
... |
Additional arguments passed to or from other methods. |
data |
A matrix of examinees' item responses corresponding to the items
specified in the x argument. Rows and columns represent examinees and
items, respectively. |
D |
A scaling constant used in IRT models to make the logistic function closely approximate the normal ogive function. A value of 1.7 is commonly used for this purpose. Default is 1. |
method |
A character string indicating the scoring method to use. Available options are:

"ML": maximum likelihood estimation

"MLF": maximum likelihood estimation with fences

"WL": weighted likelihood estimation

"MAP": maximum a posteriori estimation

"EAP": expected a posteriori estimation

"EAP.SUM": expected a posteriori summed scoring

"INV.TCC": inverse test characteristic curve scoring

Default is "ML". |
range |
A numeric vector of length two specifying the lower and upper
bounds of the ability scale. This is used for the following scoring
methods: "ML", "MLF", "WL", and "MAP". Default is c(-5, 5). |
norm.prior |
A numeric vector of length two specifying the mean and
standard deviation of the normal prior distribution. These values are used
to generate the Gaussian quadrature points and weights. Ignored if method
is "ML", "MLF", "WL", or "INV.TCC". Default is c(0, 1). |
nquad |
An integer indicating the number of Gaussian quadrature points
to be generated from the normal prior distribution. Used only when method
is "EAP" or "EAP.SUM" and weights = NULL. Default is 41. |
weights |
A two-column matrix or data frame containing the quadrature
points (in the first column) and their corresponding weights (in the second
column) for the latent variable prior distribution. The weights and points
can be conveniently generated using the function gen.weight(). If NULL,
default quadrature points and weights are generated from the normal prior
distribution specified by the norm.prior and nquad arguments. Default is
NULL. |
fence.a |
A numeric value specifying the item slope parameter (i.e., a-parameter) for the two imaginary items used in MLF. See Details below. Default is 3.0. |
fence.b |
A numeric vector of length two specifying the lower and upper
bounds of the item difficulty parameters (i.e., b-parameters) for the two
imaginary items in MLF. If NULL, the values given in the range argument
are used instead (see Details). Default is NULL. |
tol |
A numeric value specifying the convergence tolerance for the ML, MLF, WL, MAP, and inverse TCC scoring methods. Newton-Raphson optimization is used for ML, MLF, WL, and MAP, while the bisection method is used for inverse TCC. Default is 1e-4. |
max.iter |
A positive integer specifying the maximum number of iterations allowed for the Newton-Raphson optimization. Default is 100. |
se |
Logical. If TRUE, the standard errors of the ability estimates are
computed. Default is TRUE. |
stval.opt |
A positive integer specifying the starting value option for the ML, MLF, WL, and MAP scoring methods. Available options are:

1: brute-force search over discrete theta values (default)

2: a starting value derived from the observed sum score

3: a fixed starting value of 0

See Details below for more information. |
intpol |
Logical. If TRUE and method = "INV.TCC", linear interpolation is
applied to obtain ability estimates for observed sum scores below the sum
of the items' guessing parameters (see Details). Default is TRUE. |
range.tcc |
A numeric vector of length two specifying the lower and
upper bounds of ability estimates when method = "INV.TCC". Default is
c(-7, 7). |
missing |
A value indicating missing responses in the data set. Default
is NA. See Details below. |
ncore |
An integer specifying the number of logical CPU cores to use for parallel processing. Default is 1. See Details below. |
Details
For the MAP scoring method, only a normal prior distribution is supported for the population distribution.
When there are missing responses in the data set, the missing value must be
explicitly specified using the missing
argument. Missing data are
properly handled when using the ML, MLF, WL, MAP, or EAP methods. However,
when using the "EAP.SUM" or "INV.TCC" methods, any missing responses are
automatically treated as incorrect (i.e., recoded as 0s).
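The recoding applied for "EAP.SUM" and "INV.TCC" can be sketched in base R (an illustration of the behavior described above, not the package's internal code):

```r
# Missing responses (NA) are treated as incorrect, i.e., recoded as 0
resp <- c(1, NA, 0, 1, NA)
resp[is.na(resp)] <- 0
resp  # 1 0 0 1 0
```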
In the maximum likelihood estimation with fences (MLF; Han, 2016), two
imaginary items based on the 2PL model are introduced. The first imaginary
item functions as the lower fence, and its difficulty parameter (b)
should be smaller than any of the difficulty parameters in the test form.
Similarly, the second imaginary item serves as the upper fence, and its b
parameter should be greater than any difficulty value in the test form.
Both imaginary items should also have very steep slopes (i.e., high
a-parameter values). See Han (2016) for more details. If fence.b = NULL,
the function will automatically assign the lower and upper fences based
on the values provided in the range argument.
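The two imaginary fence items can be sketched as follows (an illustration only; the item names and data-frame layout are hypothetical, not irtQ internals):

```r
# Hypothetical construction of the two 2PL "fence" items used in MLF
range   <- c(-5, 5)  # ability range, as in the `range` argument
fence.a <- 3.0       # steep slope shared by both fences
fence.b <- range     # when fence.b = NULL, the range bounds are used

fences <- data.frame(
  id = c("FENCE.LOWER", "FENCE.UPPER"),
  model = "2PLM",
  a = fence.a,
  b = fence.b
)
# The lower fence's b (-5) lies below, and the upper fence's b (5) above,
# every operational item difficulty in the form.
```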
When the "INV.TCC" method is used with the 3PL model, ability estimates
cannot be obtained for observed sum scores that are less than the sum of
the items' guessing parameters. In such cases, linear interpolation can be
applied by setting intpol = TRUE.
Let \theta_{min} and \theta_{max} denote the minimum and maximum ability
estimates, respectively, and let \theta_{X} be the ability estimate
corresponding to the smallest observed sum score, X, that is greater than
or equal to the sum of the guessing parameters. When linear interpolation
is applied, the first value in the range.tcc argument is treated as
\theta_{min}. A line is then constructed between the points
(x = \theta_{min}, y = 0) and (x = \theta_{X}, y = X). The second value in
range.tcc is interpreted as \theta_{max}, which corresponds to the ability
estimate for the maximum observed sum score.
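A numeric sketch of this interpolation, using hypothetical values (suppose range.tcc = c(-7, 7), the smallest attainable sum score X is 8, and its TCC-based ability estimate is -2.1):

```r
theta_min <- -7    # first value of range.tcc
theta_X   <- -2.1  # ability at the smallest attainable sum score X
X         <- 8     # that sum score

# Line through (theta_min, 0) and (theta_X, X), solved for theta
# at a given sum score below X
slope <- (X - 0) / (theta_X - theta_min)
inv_interp <- function(score) theta_min + score / slope

inv_interp(0)  # -7, i.e., theta_min
inv_interp(X)  # -2.1, i.e., theta_X
```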
For the "INV.TCC" method, standard errors of ability estimates are computed
using the approach proposed by Lim et al. (2020). The implementation of
inverse TCC scoring in this function is based on a modified version of the
SNSequate::irt.eq.tse()
function from the SNSequate package
(González, 2014).
For the ML, MLF, WL, and MAP scoring methods, different strategies can be
used to determine the starting value for ability estimation based on the
stval.opt argument:

When stval.opt = 1 (default), a brute-force search is performed by
evaluating the log-likelihood at discrete theta values within the range
specified by range, using 0.1 increments. The theta value yielding the
highest log-likelihood is chosen as the starting value.

When stval.opt = 2, the starting value is derived from the observed sum
score using a logistic transformation. For example, if the maximum
possible score (max.score) is 30 and the examinee's observed sum score
(obs.score) is 20, the starting value is
log(obs.score / (max.score - obs.score)). If all responses are incorrect
(i.e., obs.score = 0), the starting value is log(1 / max.score). If all
responses are correct (obs.score = max.score), the starting value is
log(max.score / 1).

When stval.opt = 3, the starting value is fixed at 0.
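The stval.opt = 2 transformation can be worked through with the numbers from the example above:

```r
# Logistic starting value for stval.opt = 2
max.score <- 30
obs.score <- 20
stval <- log(obs.score / (max.score - obs.score))  # log(20 / 10)
stval               # 0.6931472

# Edge cases:
log(1 / max.score)  # all responses incorrect: -3.401197
log(max.score / 1)  # all responses correct:    3.401197
```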
To accelerate ability estimation using the ML, MLF, WL, MAP, and EAP
methods, this function supports parallel processing across multiple
logical CPU cores. The number of cores can be specified via the ncore
argument (default is 1).
Note that the standard errors of ability estimates are computed based on the Fisher expected information for the ML, MLF, WL, and MAP methods.
For the implementation of the WL method, the function references the
catR::Pi(), catR::Ji(), and catR::Ii() functions from the catR package
(Magis & Barrada, 2017).
Value
When method is one of "ML", "MLF", "WL", "MAP", or "EAP", a two-column
data frame is returned:

Column 1: Ability estimates

Column 2: Standard errors of the ability estimates

When method is either "EAP.SUM" or "INV.TCC", a list with two components
is returned:

Object 1: A three-column data frame including:

Column 1: Observed sum scores

Column 2: Ability estimates

Column 3: Standard errors of the ability estimates

Object 2: A score table showing possible raw sum scores and the
corresponding ability and standard error estimates
Methods (by class)
- est_score(default): Default method to estimate examinees' latent
ability parameters using a data frame x containing the item metadata.

- est_score(est_irt): Method to estimate examinees' latent ability
parameters using an object of class est_irt created by the function
est_irt().
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431-444.
González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59, 1-30.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289-301.
Howard, J. P. (2017). Computational methods for numerical analysis with R. New York: Chapman and Hall/CRC.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking (2nd ed.). New York: Springer.
Kolen, M. J., & Tong, Y. (2010). Psychometric properties of IRT proficiency estimates. Educational Measurement: Issues and Practice, 29(3), 8-14.
Lim, H., Davey, T., & Wells, C. S. (2020). A recursion-based analytical approach to evaluate the performance of MST. Journal of Educational Measurement, 58(2), 154-178.
Magis, D., & Barrada, J. R. (2017). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software, 76, 1-19.
Stocking, M. L. (1996). An alternative method for scoring adaptive tests. Journal of Educational and Behavioral Statistics, 21(4), 365-389.
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73-140). Mahwah, NJ: Lawrence Erlbaum.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19(1), 39-49.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
See Also
est_irt(), simdat(), shape_df(), gen.weight()
Examples
## Import the "-prm.txt" output file from flexMIRT
flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# Read item parameters and convert them into item metadata
x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df
# Generate examinee ability values
set.seed(12)
theta <- rnorm(10)
# Simulate item response data based on the item metadata and abilities
data <- simdat(x, theta, D = 1)
# Estimate abilities using maximum likelihood (ML)
est_score(x, data, D = 1, method = "ML", range = c(-4, 4), se = TRUE)
# Estimate abilities using weighted likelihood (WL)
est_score(x, data, D = 1, method = "WL", range = c(-4, 4), se = TRUE)
# Estimate abilities using MLF with default fences
# based on the `range` argument
est_score(x, data, D = 1, method = "MLF",
fence.a = 3.0, fence.b = NULL, se = TRUE)
# Estimate abilities using MLF with user-specified fences
est_score(x, data, D = 1, method = "MLF", fence.a = 3.0,
fence.b = c(-7, 7), se = TRUE)
# Estimate abilities using maximum a posteriori (MAP)
est_score(x, data, D = 1, method = "MAP", norm.prior = c(0, 1),
nquad = 30, se = TRUE)
# Estimate abilities using expected a posteriori (EAP)
est_score(x, data, D = 1, method = "EAP", norm.prior = c(0, 1),
nquad = 30, se = TRUE)
# Estimate abilities using EAP summed scoring
est_score(x, data, D = 1, method = "EAP.SUM", norm.prior = c(0, 1),
nquad = 30)
# Estimate abilities using inverse TCC scoring
est_score(x, data, D = 1, method = "INV.TCC", intpol = TRUE,
range.tcc = c(-7, 7))