spokenBNC2014_metadata {tlda}    R Documentation

Speaker metadata for the Spoken BNC2014

Description

This dataset provides speaker-level metadata for the Spoken BNC2014 (Love et al. 2017), including each speaker's age, gender, and the total number of word tokens they contributed to the corpus.

Usage

spokenBNC2014_metadata

Format

spokenBNC2014_metadata

A data frame with 668 rows and 6 columns:

speaker_id

Speaker ID (e.g. "S0001", "S0002")

age_group

Age group, based on the BNC1994 scheme ("0-14", "15-24", "25-34", "35-44", "45-59", "60+", "Unknown")

gender

Speaker gender ("Female" or "Male")

age

Age of the speaker; if the actual age is not available, the value is imputed from age_group and age_bin

n_tokens

Number of word tokens the speaker contributed to the corpus

age_bin

Age group, based on the BNC2014 scheme ("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+")

Source

Love, Robbie, Claire Dembry, Andrew Hardie, Vaclav Brezina & Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics, 22(3), 319–344.
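
Examples

The column layout described under Format lends itself to simple per-group summaries. The following is a minimal sketch rather than code shipped with the package: `toy_metadata` and its four rows are invented stand-ins that mirror the six documented columns, so the snippet runs without tlda installed. The column types used here (character and numeric) are assumptions; the real data frame may store some columns differently, for instance as factors.

```r
# Hypothetical stand-in with the six documented columns.
# The rows are invented for illustration and are NOT corpus data.
toy_metadata <- data.frame(
  speaker_id = c("S0001", "S0002", "S0003", "S0004"),
  age_group  = c("25-34", "60+", "15-24", "25-34"),
  gender     = c("Female", "Male", "Female", "Male"),
  age        = c(32, 71, 19, 28),
  n_tokens   = c(1200, 450, 3000, 800),
  age_bin    = c("30-39", "70+", "10-19", "20-29"),
  stringsAsFactors = FALSE
)

# Assumption: with tlda installed, the real data could be swapped in:
# toy_metadata <- tlda::spokenBNC2014_metadata

# Total number of word tokens contributed by each gender
tokens_by_gender <- tapply(toy_metadata$n_tokens, toy_metadata$gender, sum)

# Number of speakers in each BNC2014-scheme age bin
speakers_by_bin <- table(toy_metadata$age_bin)

# Each speaker's share of all tokens in the data frame
toy_metadata$token_share <- toy_metadata$n_tokens / sum(toy_metadata$n_tokens)

print(tokens_by_gender)
print(speakers_by_bin)
```

Speakers differ widely in how much they contribute, so token counts (n_tokens) and speaker counts can give quite different pictures of how a group is represented; the sketch computes both.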


[Package tlda version 0.1.0 Index]