df_jaeger14 {pangoling}    R Documentation
Self-Paced Reading Dataset on Chinese Relative Clauses
Description
This dataset contains data from a self-paced reading experiment on Chinese relative clause comprehension. It is structured to support analysis of reading times, comprehension accuracy, and surprisal values across the four conditions of a 2x2 fully crossed factorial design (see Details).
Usage
data(df_jaeger14)
Format
A tibble with 8,624 rows and 15 variables:
- subject
Participant identifier, a character vector.
- item
Trial item number, an integer.
- cond
Experimental condition, a character vector with the values "a", "b", "c", and "d", which encode the variations in sentence structure (see the condition labels under Details).
- word
Chinese word presented in each trial, a character vector.
- wordn
Position of the word within the sentence, an integer.
- rt
Reading time in milliseconds for each word (or merged region), an integer.
- region
Sentence region or phrase type (e.g., "hd1", "Det+CL"), a character vector.
- question
Comprehension question associated with the trial, a character vector.
- accuracy
Binary accuracy score for the comprehension question (1 = correct, 0 = incorrect).
- correct_answer
Expected correct answer for the comprehension question, a character vector ("Y" or "N").
- question_type
Type of comprehension question, a character vector.
- experiment
Name of the experiment, indicating self-paced reading, a character vector.
- list
Experimental list number, for counterbalancing item presentation, an integer.
- sentence
Full sentence used in the trial with words marked for analysis, a character vector.
- surprisal
Model-derived surprisal values for each word, a numeric vector.
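Each row is one word, so trial-level columns such as question and accuracy presumably repeat on every word row of a trial. Averaging accuracy over all rows would then weight longer sentences more heavily. The sketch below is a minimal illustration, assuming that repetition and assuming that subject plus item identifies a single trial; it collapses the data to one row per trial before computing the proportion correct per condition. The helper trial_accuracy() and the toy data frame are hypothetical; the same function could be applied to df_jaeger14 once pangoling is loaded.

```r
# Hypothetical helper: proportion of correct answers per condition,
# computed on one row per trial rather than one row per word.
trial_accuracy <- function(df) {
  # Assumption: subject x item identifies a trial, and accuracy is
  # repeated on every word row of that trial.
  trials <- df[!duplicated(df[c("subject", "item")]), ]
  tapply(trials$accuracy, trials$cond, mean)
}

# Hypothetical toy data with a few of the documented columns
toy <- data.frame(
  subject  = c("s1", "s1", "s1", "s1", "s2", "s2"),
  item     = c(1L, 1L, 2L, 2L, 1L, 1L),
  cond     = c("a", "a", "b", "b", "b", "b"),
  wordn    = c(1L, 2L, 1L, 2L, 1L, 2L),
  accuracy = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

trial_accuracy(toy)
```

With the toy data there are three trials, one in condition "a" (correct) and two in condition "b" (one correct, one incorrect).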
Region codes in the dataset (column region):
- N: Main clause subject (in object-modifications only)
- V: Main clause verb (in object-modifications only)
- Det+CL: Determiner+classifier
- Adv: Adverb
- VN: RC-verb+RC-object (subject relatives) or RC-subject+RC-verb (object relatives)
  Note: These two words were merged into one region after the experiment; they were presented as separate regions during the experiment.
- FreqP: Frequency phrase/durational phrase
- DE: Relativizer "de"
- head: Relative clause head noun
- hd1: First word after the head noun
- hd2: Second word after the head noun
- hd3: Third word after the head noun
- hd4: Fourth word after the head noun (only in subject-modifications)
- hd5: Fifth word after the head noun (only in subject-modifications)
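When summarizing or plotting by region, it can help to attach readable labels to the short codes. The sketch below is one possible approach, assuming nothing beyond the code list above: it stores the codes as a named character vector and indexes it by code. The names region_labels and label_regions() are hypothetical and not part of pangoling.

```r
# Region codes and descriptions transcribed from the list above
region_labels <- c(
  "N"      = "Main clause subject (object modifications only)",
  "V"      = "Main clause verb (object modifications only)",
  "Det+CL" = "Determiner+classifier",
  "Adv"    = "Adverb",
  "VN"     = "RC verb + RC object (SR) or RC subject + RC verb (OR)",
  "FreqP"  = "Frequency/durational phrase",
  "DE"     = 'Relativizer "de"',
  "head"   = "Relative clause head noun",
  "hd1"    = "First word after the head noun",
  "hd2"    = "Second word after the head noun",
  "hd3"    = "Third word after the head noun",
  "hd4"    = "Fourth word after the head noun (subject modifications only)",
  "hd5"    = "Fifth word after the head noun (subject modifications only)"
)

# Hypothetical helper: map region codes to labels; unknown codes give NA
label_regions <- function(region) unname(region_labels[region])

label_regions(c("DE", "head", "hd1"))
```

Applied to df_jaeger14, something like label_regions(df_jaeger14$region) would yield one label per row.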
Notes on reading times (column rt):
The reading time of the relative clause region ("VN" in column region; verb-noun order in subject relatives, noun-verb order in object relatives) was computed by summing the reading times of the relative clause verb and noun.
The verb and noun were presented as two separate regions during the experiment.
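The merging step described above can be sketched as follows. This is an illustration of the summing only, not the authors' preprocessing code: the pre-merge rows and the region codes "RC-V" and "RC-N" are hypothetical, invented here because the unmerged data are not part of the dataset.

```r
# Hypothetical pre-merge data: the RC verb and RC noun are separate rows,
# labelled with invented codes "RC-V" and "RC-N"
raw <- data.frame(
  subject = c("s1", "s1", "s1"),
  item    = c(1L, 1L, 1L),
  region  = c("RC-V", "RC-N", "DE"),
  rt      = c(350L, 420L, 500L),
  stringsAsFactors = FALSE
)

# Relabel both relative clause words as the single "VN" region
raw$region <- ifelse(raw$region %in% c("RC-V", "RC-N"), "VN", raw$region)

# Sum reading times within each subject x item x region
merged <- aggregate(rt ~ subject + item + region, data = raw, FUN = sum)
merged
```

In this toy example the "VN" row carries 350 + 420 = 770 ms and the "DE" row keeps its 500 ms.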
Details
- Factor I: Modification type (subject modification; object modification)
- Factor II: Relative clause type (subject relative; object relative)
Condition labels:
a) subject modification; subject relative
b) subject modification; object relative
c) object modification; subject relative
d) object modification; object relative
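For modelling, the single cond column is often split back into the two crossed factors. The sketch below decodes the letters according to the labels above and adds ±0.5 sum-coded predictors. The function decode_conditions() and the column names are hypothetical, and the ±0.5 coding is an assumption chosen for illustration, not necessarily the coding used by Jäger et al. (2015).

```r
# Hypothetical helper: decode condition letters into the two factors
decode_conditions <- function(cond) {
  data.frame(
    cond         = cond,
    # a, b: subject modification; c, d: object modification
    modification = ifelse(cond %in% c("a", "b"), "subject", "object"),
    # a, c: subject relative; b, d: object relative
    rc_type      = ifelse(cond %in% c("a", "c"), "subject", "object"),
    stringsAsFactors = FALSE
  )
}

d <- decode_conditions(c("a", "b", "c", "d"))

# Assumed +/-0.5 sum coding (subject level = +0.5)
d$mod_c <- ifelse(d$modification == "subject", 0.5, -0.5)
d$rc_c  <- ifelse(d$rc_type == "subject", 0.5, -0.5)
d
```

The same call could be applied to df_jaeger14$cond to add the factor columns row by row.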
Source
Jäger, L., Chen, Z., Li, Q., Lin, C.-J. C., & Vasishth, S. (2015). The subject-relative advantage in Chinese: Evidence for expectation-based processing. Journal of Memory and Language, 79–80, 97–120. doi:10.1016/j.jml.2014.10.005
See Also
Other datasets:
df_sent
Examples
# Basic exploration
library(pangoling)
data(df_jaeger14)
head(df_jaeger14)

# Summarize reading times by region
library(tidytable)
df_jaeger14 |>
  group_by(region) |>
  summarize(mean_rt = mean(rt, na.rm = TRUE))