ice_metadata {tlda} | R Documentation
Text metadata for ICE corpora
Description
This dataset provides metadata for the text files in the International Corpus of English (ICE) family of corpora. It maps each standardized file name to its mode of production, text category, macro genre, and genre.
Usage
ice_metadata
Format
ice_metadata
A data frame with 500 rows and 6 columns:
- text_file
Standardized name of the text file (e.g. "s1a-001", "w1b-008", "w2d-018")
- mode
Mode of production ("spoken" vs. "written")
- text_category
One of 4 higher-level text categories ("dialogues", "monologues", "non-printed", "printed")
- macro_genre
One of 12 macro genres (e.g. "private_dialogues", "student_writing", "reportage")
- genre
One of 32 genres (e.g. "phonecalls", "unscripted_speeches", "novels_short_stories")
- genre_short
Short label for the genre (see Schützler 2023: 228)
Source
https://www.ice-corpora.uzh.ch/en/design.html
Greenbaum, Sidney. 1996. Introducing ICE. In Sidney Greenbaum (ed.), Comparing English worldwide: The International Corpus of English, 3–12. Oxford: Clarendon Press.
Schützler, Ole. 2023. Concessive constructions in varieties of English. Berlin: Language Science Press. doi:10.5281/zenodo.8375010
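Examples
The following is a minimal sketch, not code shipped with the package. It illustrates how the standardized file names relate to the mode column, assuming the usual ICE naming convention in which names beginning with "s" denote spoken texts and names beginning with "w" denote written texts. The helper ice_mode_from_name is hypothetical and is defined here only for illustration. The sketch runs without tlda installed because it uses only the example file names listed under Format.

```r
# Hypothetical helper (not part of tlda): infer the mode of production
# from the first letter of a standardized ICE file name.
# Assumption: "s" marks spoken texts and "w" marks written texts.
ice_mode_from_name <- function(text_file) {
  prefix <- substr(tolower(text_file), 1, 1)
  ifelse(prefix == "s", "spoken",
         ifelse(prefix == "w", "written", NA_character_))
}

# Example file names taken from the description of the text_file column
files <- c("s1a-001", "w1b-008", "w2d-018")
modes <- ice_mode_from_name(files)

print(data.frame(text_file = files, mode = modes))

# With tlda installed, the inferred values could be compared with the
# dataset itself (left commented out so the sketch stays self-contained):
# all(ice_mode_from_name(tlda::ice_metadata$text_file) == tlda::ice_metadata$mode)
```

Names that start with neither letter yield NA, so unexpected file names surface as missing values instead of being silently assigned a mode.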