metaphonebr {metaphonebr}    R Documentation

Generates Phonetic Code (adapted Metaphone-BR) for Names in Portuguese

Description

Applies a series of phonetic transformations to a vector of personal names to generate a code that represents their approximate pronunciation in Brazilian Portuguese. The goal is to group names that sound alike even when they are spelled differently.

Usage

metaphonebr(fullnames, verbose = FALSE)

Arguments

fullnames

A character vector of names to be processed.

verbose

Logical. If TRUE, prints progress messages at each step. Defaults to FALSE.

Details

The treatment process involves:

  1. Preprocessing: removal of accents and digits, and capitalization.

  2. Removal of silent letters (initial H).

  3. Simplification of common digraphs (LH, NH, CH, SC, QU, etc.).

  4. Simplification of similar sounding consonants (C/K/S, G/J, Z/S, etc.).

  5. Simplification of word-final nasal sounds.

  6. Removal of duplicated vowels.

  7. Trimming of extra spaces and removal of duplicated letters.

This is an adaptation that does not strictly follow any published Metaphone algorithm; it was inspired by them and tailored to the Brazilian Portuguese context.
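
To make the shape of such a pipeline concrete, the sketch below chains the seven kinds of step listed above using base R only. It is not the package's implementation: the function name sketch_phonetic() is hypothetical, and every substitution rule in it (for example PH to F, Y to I, word-final M to N) is an assumption chosen for illustration. The codes returned by metaphonebr() may differ.

```r
# Illustrative sketch only: the rules below are assumptions for demonstration,
# not the rules implemented by metaphonebr().
sketch_phonetic <- function(fullnames) {
  x <- as.character(fullnames)

  # 1. Preprocessing: fold a few common accented letters, drop digits,
  #    and convert to upper case.
  x <- gsub("[\u00e1\u00e0\u00e2\u00e3\u00c1\u00c0\u00c2\u00c3]", "A", x)
  x <- gsub("[\u00e9\u00ea\u00c9\u00ca]", "E", x)
  x <- gsub("[\u00ed\u00cd]", "I", x)
  x <- gsub("[\u00f3\u00f4\u00f5\u00d3\u00d4\u00d5]", "O", x)
  x <- gsub("[\u00fa\u00da]", "U", x)
  x <- gsub("[\u00e7\u00c7]", "C", x)
  x <- toupper(gsub("[0-9]", "", x))

  # 2. Silent letters: drop an H at the start of each word.
  x <- gsub("\\bH", "", x, perl = TRUE)

  # 3. Digraphs (assumed mappings).
  x <- gsub("PH", "F", x, fixed = TRUE)
  x <- gsub("LH", "L", x, fixed = TRUE)
  x <- gsub("NH", "N", x, fixed = TRUE)
  x <- gsub("CH", "X", x, fixed = TRUE)

  # 4. Similar-sounding letters (assumed mappings): Y -> I, Z -> S, K -> C.
  x <- chartr("YZK", "ISC", x)

  # 5. Word-final nasal (assumed mapping): M at the end of a word -> N.
  x <- gsub("M\\b", "N", x, perl = TRUE)

  # 6. Collapse repeated vowels.
  x <- gsub("([AEIOU])\\1+", "\\1", x)

  # 7. Collapse any remaining repeated letters, then tidy whitespace.
  x <- gsub("([A-Z])\\1+", "\\1", x)
  gsub("\\s+", " ", trimws(x))
}

sketch_phonetic(c("Helena", "Elena", "Maria", "Marya", "Philippe", "Filipe"))
# "ELENA" "ELENA" "MARIA" "MARIA" "FILIPE" "FILIPE"
```

Under these assumed rules, each spelling pair collapses to a single code, which is the grouping behaviour the function aims for. The order of the steps matters: digraphs are rewritten before single letters so that, for instance, PH is handled as a unit rather than as P followed by H.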

Value

A character vector with the corresponding phonetic code for each entry in fullnames.

Examples

example_names <- c("Jo\u00e3o Silva", "Joao da Silva", "Maria", "Marya",
                   "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr(example_names)
print(data.frame(Original = example_names, metaphonebr = phonetic_codes))

# With progress messages
phonetic_codes_verbose <- metaphonebr("Exemplo único", verbose = TRUE)

[Package metaphonebr version 0.0.4 Index]