metaphonebr {metaphonebr} | R Documentation |
Generates Phonetic Code (adapted Metaphone-BR) for Names in Portuguese
Description
Applies a series of phonetic transformations to a vector of personal names to generate a code that represents each name's approximate pronunciation in Brazilian Portuguese. The goal is to group similar-sounding names even when they are spelled differently.
Usage
metaphonebr(fullnames, verbose = FALSE)
Arguments
fullnames |
A character vector of names to be processed. |
verbose |
Logical, if |
Details
The treatment process involves:
Preprocessing: removal of accents and numbers, and conversion to uppercase.
Removal of silent letters (initial H).
Simplification of common digraphs (LH, NH, CH, SC, QU, etc.).
Simplification of similar sounding consonants (C/K/S, G/J, Z/S, etc.).
Simplification of word-final nasal sounds.
Removal of duplicated vowels.
Trimming of extra spaces and removal of duplicated letters.
This is an adaptation that does not strictly follow any published Metaphone algorithm; it was inspired by them and tailored to Brazilian Portuguese.
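A few of the steps listed above can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name sketch_phonetic, the accent table, and the individual replacement rules (initial H dropped, PH mapped to F, repeated letters collapsed) are assumptions chosen for the example, and codes returned by metaphonebr() may differ.

```r
# Hypothetical helper: illustrates a subset of the steps with assumed rules.
sketch_phonetic <- function(fullnames) {
  # Preprocessing: map a small set of accented letters to plain ASCII.
  accented <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00f5\u00fa\u00e7",
    "\u00c1\u00c0\u00c2\u00c3\u00c9\u00ca\u00cd\u00d3\u00d4\u00d5\u00da\u00c7"
  )
  plain <- "aaaaeeioooucAAAAEEIOOOUC"
  x <- chartr(accented, plain, fullnames)

  # Preprocessing: uppercase and drop digits.
  x <- toupper(x)
  x <- gsub("[0-9]", "", x)

  # Silent letters: drop an H at the start of each word.
  x <- gsub("\\bH", "", x, perl = TRUE)

  # Digraphs: only PH -> F is shown here (assumed rule).
  x <- gsub("PH", "F", x, fixed = TRUE)

  # Duplicated letters: collapse runs of the same letter.
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)

  # Spaces: collapse runs of whitespace and trim both ends.
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

sketch_phonetic(c("Helena", "Elena", "Philippe", "Filipe"))
```

Under these assumed rules, "Helena" and "Elena" reduce to the same string, as do "Philippe" and "Filipe", which is the kind of grouping the function aims for.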
Value
A character vector with the corresponding phonetic code for each entry.
Examples
example_names <- c("Jo\u00e3o Silva", "Joao da Silva", "Maria", "Marya",
"Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr(example_names)
print(data.frame(Original = example_names, metaphonebr = phonetic_codes))
# With progress messages
phonetic_codes_verbose <- metaphonebr("Exemplo \u00fanico", verbose = TRUE)