spokenBNC1994_metadata {tlda}    R Documentation

Speaker metadata for the Spoken BNC1994

Description

This dataset provides metadata for speakers in the demographically sampled part of the Spoken BNC1994 (Crowdy 1995), including age, gender, and the total number of word tokens each speaker contributed to the corpus.

Usage

spokenBNC1994_metadata

Format

spokenBNC1994_metadata

A data frame with 1,017 rows and 7 columns:

speaker_id

Speaker ID (e.g. "PS002", "PS003")

age_group

Age group, based on the BNC1994 scheme ("0-14", "15-24", "25-34", "35-44", "45-59", "60+", "Unknown")

gender

Speaker gender ("Female" vs. "Male")

age

Age of the speaker; where the actual age is not available, the value is imputed from age_group and age_bin

n_tokens

Number of word tokens the speaker contributed to the corpus

age_bin

Age group, based on the BNC2014 scheme ("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+")
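
As a rough illustration of how the columns described above might be combined, the following sketch sums n_tokens per age_group and gender using base R. The helper tokens_by_group() and the data frame toy_metadata are hypothetical names introduced here; the toy rows use invented values and only mimic the documented column layout, so the block runs without the package installed.

```r
# Toy stand-in using the documented column names. The speaker IDs and all
# values are invented for illustration; they are NOT taken from the dataset.
toy_metadata <- data.frame(
  speaker_id = c("SPK_A", "SPK_B", "SPK_C", "SPK_D"),
  age_group  = c("25-34", "25-34", "60+", "Unknown"),
  gender     = c("Female", "Female", "Female", "Male"),
  age        = c(30, 28, 66, NA),
  n_tokens   = c(1200, 800, 500, 100),
  age_bin    = c("30-39", "20-29", "60-69", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total word tokens per combination of age_group and
# gender. Only the three columns named in the formula are used, so missing
# values in other columns (e.g. age) do not drop any rows.
tokens_by_group <- function(metadata) {
  aggregate(n_tokens ~ age_group + gender, data = metadata, FUN = sum)
}

token_totals <- tokens_by_group(toy_metadata)
print(token_totals)
```

With tlda installed, the same helper could presumably be applied to the real data, e.g. tokens_by_group(tlda::spokenBNC1994_metadata); this call has not been checked against the package here.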

Source

Crowdy, Steve. 1995. The BNC spoken corpus. In Geoffrey Leech, Greg Myers & Jenny Thomas (eds.), Spoken English on Computer: Transcription, Mark-Up and Annotation, 224–234. Harlow: Longman.


[Package tlda version 0.1.0 Index]