class Phonemizer

Breaks a Hebrew string into its discrete phonemes

Public Class Methods

new(word) click to toggle source
# File lib/phonemizer.rb, line 56
def initialize word
  @hebword = word
end

Public Instance Methods

phonemes() click to toggle source

Breaks the word down into its discrete phonemes “ם’’ ,“וּ“ ,“כּ“ ,“ע“] = “עַכּוּם]

No arguments; returns an array

This function depends heavily on the workings of Hebrew grammer, so it gets a bit complicated. If you have a more elegant solution, I'd gladly take it. This thing was a hornet's nest full of bugs, so watch that test suite when editing!

# File lib/phonemizer.rb, line 73
def phonemes
      @completed = []

      # For each raw character :
      @hebword.chars.each_with_index do |char,i|

        # Skip whitespace
        if char == ENGLISH_SPACE || char == HEBREW_SPACE
          next

        # If it's a final letter, normalize it to its standard form (מ –> ם)
        elsif char =~ FINAL_LETTER
          @completed << normalize_final_letter(char)

        # If it's a CHATAF, normalize it to it's standard form
        elsif CHATAF.include? char
          @completed << deCHATAFize(char)

        # If it's a SHIN_DOT, find the previous SIN and replace it with SHIN_WITH_DOT
        elsif char == SHIN_DOT
          @completed[@completed.rindex(SIN)] = SHIN_WITH_DOT

        # If it's a DAGESH:
        # 1. Find the previous letter
        # 2. Check if it's on the list of DAGESH-compatible letters
        # 3. If it is, add it
        # 4. If it's not, implicitly fall through to the `else` case
        elsif char == DAGESH
          previous_letter = previous_letter_index(i, @completed)
          if previous_letter.nil? then raise "Orphaned DAGESH: DAGESH at position #{i} is not preceded by a letter.(Word: \"#{@hebword}\")"; end
          if DAGESH_WHITELIST =~ @completed[previous_letter]
            @completed[previous_letter] += DAGESH
          end

        # Skip the VAV of a CHOLOM MALEI, otherwise add it
        elsif char == VAV
          @hebword[i + 1] == CHOLOM ? next : @completed << VAV

        # Skip the YUD of a CHIRIK MALEI and TZEIREI MALEI, otherwise add them
        elsif char == YUD
          (@completed.last == CHIRIK ||
           @completed.last == TZEIREI) ?
              next : @completed << YUD

        # Append a PATACH to a final CHES  ( חַ )
        elsif char == PATACH &&          # It's a PATACH
            @completed.last == CHES &&   # Proceeded by a CHES
            (i == @hebword.length - 1)   # At the end of the word
         @completed[@completed.length - 1] += PATACH

        # Otherwise, pass the letter or nekuda unchanged
        else
          @completed << char
        end

      end # end loop
      @completed

end
raw() click to toggle source

Returns the unedited Hebrew string

# File lib/phonemizer.rb, line 61
def raw
  @hebword
end

Private Instance Methods

deCHATAFize(chataf) click to toggle source

Normalize CHATAF nekudos to standard forms Raises a `RuntimeError` if the character is not one of ['ֲ','ֳ','ֱ']

# File lib/phonemizer.rb, line 151
def deCHATAFize chataf
  case chataf
    when "ֲ" then return "ַ"
    when "ֳ" then return "ָ"
    when "ֱ" then return "ֶ"
  end
  raise "#{chataf} is not a CHATAF\n\tSuggested test snippet: ['ֲ','ֳ','ֱ'].include?(#{chataf})"
end
normalize_final_letter(char) click to toggle source

Normalize final letters to standard forms

# File lib/phonemizer.rb, line 137
def normalize_final_letter char
  case char
    when "ם" then return "מ"
    when "ן" then return "נ"
    when "ץ" then return "צ"
    when "ף" then return "פ"
    when "ך" then return "כ"
    else
      raise "#{char} is not a final letter\nSuggested test snippet: #{FINAL_LETTER} =~ #{char}\n"
  end
end
previous_letter_index(current_loc, array) click to toggle source

Return the index of the first previous character that is a letter

* If the index is a letter            -> Ignore it and find the previous one #BugOrFeature?
* If a previous character is a letter -> return its index
* If no characters are letters        -> nil
# File lib/phonemizer.rb, line 164
def previous_letter_index current_loc, array
  current_loc.downto(0) do |i|
    return i if array[i] =~ LETTER
  end
  nil
end