rapidsplit {rapidsplithalf} | R Documentation
rapidsplit
Description
A very fast algorithm for computing stratified permutation-based split-half reliability.
Usage
rapidsplit(
  data,
  subjvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  aggvar,
  splits = 6000,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL,
                       fixedpenalty = 600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE
)
## S3 method for class 'rapidsplit'
print(x, ...)

## S3 method for class 'rapidsplit'
plot(
  x,
  type = c("average", "minimum", "maximum", "random", "all"),
  show.labels = TRUE,
  ...
)
rapidsplit.chunks(
  data,
  subjvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  aggvar,
  splits = 6000,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL,
                       fixedpenalty = 600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE,
  chunks = 4,
  cluster = NULL
)
Arguments
data
    Dataset, a data.frame.

subjvar
    Subject ID variable name, a character.

diffvars
    Names of variables that determine which conditions need to be
    subtracted from each other; a character vector.

stratvars
    Additional variables that the splits should be stratified by; a
    character vector.

subscorevar
    A character scalar naming a variable that identifies subgroups of a
    participant's trials (e.g. IAT blocks) from which separate subscores
    are computed and then averaged into a single score.

aggvar
    Name of the variable whose values to aggregate, a character.

splits
    Number of split-halves to average, an integer.

aggfunc
    The function by which to aggregate the variable defined in aggvar;
    either "means" or "medians", or a custom aggregation function (see
    the trimean example below).

errorhandling
    A list with 4 named items, used to replace error trials with the
    block mean of correct responses plus a fixed penalty, as in the IAT
    D-score. The 4 items are type ("none" or "fixedpenalty"), errorvar,
    fixedpenalty, and blockvar; a sketch of this rule follows the
    argument list.

standardize
    Whether to divide scores by the subject's SD; a logical.

include.scores
    Include all individual split-half scores? A logical.

verbose
    Display progress bars? Defaults to TRUE.

check
    Check the input for possible problems? A logical.

x
    A rapidsplit object.

...
    Ignored.

type
    Character argument indicating what should be plotted. By default,
    this plots the random split whose correlation is closest to the
    average. However, it can also plot the random split with the
    "minimum" or "maximum" correlation, another "random" split, or
    "all" splits at once.

show.labels
    Should participant IDs be shown above their points in the
    scatterplot? Defaults to TRUE.

chunks
    Number of chunks to divide the splits into, for more
    memory-efficient computation and to divide over multiple cores if
    requested.

cluster
    Chunks will be run on separate cores if a cluster is provided, or
    an integer number of cores to run them on; otherwise they are run
    sequentially.
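The fixed-penalty replacement named under errorhandling follows the IAT
D-score convention: each error trial's RT is replaced by the mean correct
RT of its block plus the penalty. A minimal sketch of that rule, assuming
hypothetical vectors rt, error, and block (illustrative only, not the
package's internal code):

# Illustrative sketch of the fixed-penalty rule; 'rt', 'error', and
# 'block' are hypothetical inputs, not package internals.
replace_errors <- function(rt, error, block, penalty = 600) {
  block_means <- tapply(rt[!error], block[!error], mean)  # mean correct RT per block
  rt[error] <- block_means[as.character(block[error])] + penalty
  rt
}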
Details
The order of operations (optional steps in parentheses) is:

1. Splitting
2. (Replacing error trials within block within split)
3. Computing aggregates per condition (per subscore) per person
4. Subtracting conditions from each other
5. (Dividing the resulting (sub)score by the SD of the data used to compute that (sub)score)
6. (Averaging subscores together into a single score per person)
7. Computing the covariances of scores from one half with scores from the other half for every split
8. Computing the variances of scores within each half for every split
9. Computing the average split-half correlation from the average variances and covariances across all splits, using corStatsByColumns()
10. Applying the Spearman-Brown formula to the absolute correlation using spearmanBrown(), and restoring the original sign afterwards

cormean() was used to aggregate correlations in previous versions of this
package and in the associated manuscript, but the method based on
(co)variance averaging was found to be more accurate. This was suggested by
prof. John Christie of Dalhousie University.
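For intuition, a single split boils down to the following minimal sketch
(unstratified, means only, no error handling or standardization;
splithalf_once is a hypothetical helper, and its final line mirrors what
spearmanBrown() does: 2|r| / (1 + |r|), with the original sign restored):

# Minimal single-split sketch; illustrative only, not the package's code.
splithalf_once <- function(data, subjvar, aggvar) {
  halves <- sapply(split(data[[aggvar]], data[[subjvar]]), function(x) {
    half <- sample(length(x)) <= length(x) / 2  # random half assignment
    c(mean(x[half]), mean(x[!half]))            # aggregate each half
  })
  r <- cor(halves[1, ], halves[2, ])            # split-half correlation
  sign(r) * 2 * abs(r) / (1 + abs(r))           # Spearman-Brown, sign restored
}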
Value
A list containing:

- r: the averaged reliability.
- ci: the 95% confidence intervals.
- allcors: a vector with the reliability of each iteration.
- nobs: the number of participants.
- scores: the individual participants' scores in each split-half,
  contained in a list with two matrices (only included if requested with
  include.scores).
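The elements documented above can be accessed directly from the returned
list, for example:

frel <- rapidsplit(foodAAT, subjvar = "subjectid", aggvar = "RT", splits = 100)
frel$r              # averaged reliability
frel$ci             # 95% confidence intervals
hist(frel$allcors)  # spread of the per-split reliabilities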
Note
This function can use a lot of memory in one go. If you are computing the
reliability of a large dataset or you have little RAM, it may pay off to
use the sequential version of this function instead: rapidsplit.chunks().

It is currently unclear whether it is better to pre-process your data
before or after splitting it. If you are computing the IAT D-score, you
can therefore use errorhandling and standardize to perform these two
actions after splitting, or you can process your data before splitting and
forgo these two options.
Author(s)
Sercan Kahveci
References
Kahveci, S., Bathke, A.C., & Blechert, J. (2024). Reaction-time task reliability is more accurately computed with permutation-based split-half correlations than with Cronbach's alpha. Psychonomic Bulletin & Review. doi:10.3758/s13423-024-02597-y
Examples
data(foodAAT)

# Reliability of the double difference score:
# [RT(push food) - RT(pull food)] - [RT(push object) - RT(pull object)]
frel <- rapidsplit(data = foodAAT,
                   subjvar = "subjectid",
                   diffvars = c("is_pull", "is_target"),
                   stratvars = "stimid",
                   aggvar = "RT",
                   splits = 100)

print(frel)
plot(frel, type = "all")

# Compute a single random split-half reliability of the error rate
rapidsplit(data = foodAAT,
           subjvar = "subjectid",
           aggvar = "error",
           splits = 1,
           aggfunc = "means")

# Compute the reliability of an IAT D-score
data(raceIAT)
rapidsplit(data = raceIAT,
           subjvar = "session_id",
           diffvars = "congruent",
           subscorevar = "blocktype",
           aggvar = "latency",
           errorhandling = list(type = "fixedpenalty", errorvar = "error",
                                fixedpenalty = 600, blockvar = "block_number"),
           splits = 100,
           standardize = TRUE)

# Unstratified reliability of the median RT
rapidsplit.chunks(data = foodAAT,
                  subjvar = "subjectid",
                  aggvar = "RT",
                  splits = 100,
                  aggfunc = "medians",
                  chunks = 8)

# Compute the reliability of Tukey's trimean of the RT on 2 CPU cores
trimean <- function(x) {
  sum(quantile(x, c(.25, .5, .75)) * c(1, 2, 1)) / 4
}
rapidsplit.chunks(data = foodAAT,
                  subjvar = "subjectid",
                  aggvar = "RT",
                  splits = 200,
                  aggfunc = trimean,
                  cluster = 2)