ice_metadata {tlda}    R Documentation

Text metadata for ICE corpora

Description

This dataset provides metadata for the text files in the International Corpus of English (ICE) family of corpora. It maps each standardized file name to its mode of production, text category, macro genre, and genre.

Usage

ice_metadata

Format

A data frame with 500 rows and 6 columns:

text_file

Standardized name of the text file (e.g. "s1a-001", "w1b-008", "w2d-018")

mode

Mode of production ("spoken" or "written")

text_category

One of 4 higher-level text categories: "dialogues" and "monologues" (spoken), "non-printed" and "printed" (written)

macro_genre

12 macro genres (e.g. "private_dialogues", "student_writing", "reportage")

genre

32 genres (e.g. "phonecalls", "unscripted_speeches", "novels_short_stories")

genre_short

Short label for the genre (see Schützler 2023: 228)
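
Because text_file identifies each text, the table can be joined to any per-text results that share this column. The following is a minimal sketch, not part of the package: the helper add_ice_metadata(), the toy metadata rows, and the n_tokens values are hypothetical and serve only as an illustration. The final step assumes that tlda is installed and that the dataset can be reached as tlda::ice_metadata.

```r
# Hypothetical helper (not part of tlda): attach ICE metadata to a data frame
# that has one row per text file and a `text_file` column.
add_ice_metadata <- function(counts, metadata) {
  # Left join: keep every row of `counts` and add the metadata columns.
  merge(counts, metadata, by = "text_file", all.x = TRUE, sort = FALSE)
}

# Toy stand-in for two rows of the metadata, showing only two of the six
# columns, so that the sketch runs without the package.
toy_metadata <- data.frame(
  text_file = c("s1a-001", "w1b-008"),
  mode      = c("spoken", "written")
)

# Made-up per-text counts, for illustration only.
toy_counts <- data.frame(
  text_file = c("w1b-008", "s1a-001"),
  n_tokens  = c(10L, 20L)
)

toy_merged <- add_ice_metadata(toy_counts, toy_metadata)
print(toy_merged)

# With the package installed, the same helper can be given the real table.
if (requireNamespace("tlda", quietly = TRUE)) {
  real_merged <- add_ice_metadata(toy_counts, tlda::ice_metadata)
  print(real_merged)
}
```

The join keeps all rows of the per-text table (all.x = TRUE), so a file name that is missing from the metadata shows up as NA in the added columns instead of being dropped silently.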

Source

https://www.ice-corpora.uzh.ch/en/design.html

Greenbaum, Sidney. 1996. Introducing ICE. In Sidney Greenbaum (ed.), Comparing English worldwide: The International Corpus of English, 3–12. Oxford: Clarendon Press.

Schützler, Ole. 2023. Concessive constructions in varieties of English. Berlin: Language Science Press. doi:10.5281/zenodo.8375010


[Package tlda version 0.1.0 Index]