module BCP47::Parser

Constants

ALPHANUM

Simplified check: the high-level privateuse and grandfathered alternatives are not implemented. This should be replaced with a proper check at some point.

EXTENSION
EXTLANG
LANGTAG

Ruby's .match keeps only one capture per group (the last repetition when a group is repeated), so for expressions like variants and extensions we keep everything in one captured group and then break it down into its parts separately.

LANGUAGE
LANGUAGE_TAG
PRIVATEUSE
REGION
SCRIPT
SINGLETON
VARIANT
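The constant values themselves are not shown on this page. As a rough sketch only, assuming they follow the RFC 5646 ABNF, the simplified grammar might be assembled as below. The module name BCP47Sketch and every pattern body are hypothetical; the capture names variants, extensions and private come from what parse reads, while language, script and region are assumed. Like the simplified check described above, it rejects privateuse-only and grandfathered tags.

```ruby
# A sketch only: every pattern here is an assumption modelled on the
# RFC 5646 ABNF, not the library's actual constant values.
module BCP47Sketch
  ALPHANUM   = /[a-z0-9]/i
  # Any alphanumeric except "x", which introduces a private-use sequence.
  SINGLETON  = /[a-wyz0-9]/i
  EXTLANG    = /[a-z]{3}(?:-[a-z]{3}){0,2}/i
  LANGUAGE   = /(?:[a-z]{2,3}(?:-#{EXTLANG})?|[a-z]{4}|[a-z]{5,8})/i
  SCRIPT     = /[a-z]{4}/i
  REGION     = /(?:[a-z]{2}|[0-9]{3})/i
  VARIANT    = /(?:#{ALPHANUM}{5,8}|[0-9]#{ALPHANUM}{3})/i
  EXTENSION  = /#{SINGLETON}(?:-#{ALPHANUM}{2,8})+/i
  PRIVATEUSE = /x(?:-#{ALPHANUM}{1,8})+/i

  # Variants and extensions are each captured as one run (e.g. "-1901-1994")
  # and split afterwards, because a repeated group keeps only one capture.
  LANGUAGE_TAG = /\A
    (?<language>#{LANGUAGE})
    (?:-(?<script>#{SCRIPT}))?
    (?:-(?<region>#{REGION}))?
    (?<variants>(?:-#{VARIANT})*)
    (?<extensions>(?:-#{EXTENSION})*)
    (?:-(?<private>#{PRIVATEUSE}))?
  \z/ix
end

m = BCP47Sketch::LANGUAGE_TAG.match("sr-Latn-RS")
puts m[:script] # prints Latn
```

With this sketch, "de-DE-1901-u-co-phonebk-x-private" captures variants as "-1901", extensions as "-u-co-phonebk" and private as "x-private", which is the raw shape parse then post-processes; "x-private" on its own does not match at all.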

Public Class Methods

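parse(language_tag)

Returns nil when language_tag does not match LANGUAGE_TAG. Otherwise returns a Hash of the named captures in which 'variants' and 'private' become sorted arrays of subtags (empty when absent) and 'extensions' becomes an array of [singleton, subtags] pairs sorted by singleton.
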
# File lib/bcp47_spec/parser.rb, line 109
def parse(language_tag)
  return unless match = language_tag.match(LANGUAGE_TAG)

  named_captures(match).tap do |captures|
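    # e.g. "-1901-1994" => ["1901", "1994"]; empty or nil => []
    captures['variants']   = captures['variants'].to_s.empty? ? [] : captures['variants'][/-(.*)/, 1].split('-').sort
    # e.g. "-u-co-phonebk-t-und" => [["t", "und"], ["u", "co-phonebk"]]
    captures['extensions'] = split_extensions(captures['extensions'])
    # e.g. "x-foo-bar" => ["bar", "foo"]; empty or nil => []
    captures['private']    = captures['private'].to_s.empty? ? [] : captures['private'][/x-(.*)/i, 1].split('-').sort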
  end
end
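The note on LANGTAG about captures is easy to check. In the minimal demo below, a simplified VARIANT stand-in (an assumption, not the library's constant) is used twice: once inside a repeated capturing group, which keeps only the last variant, and once with a single group around the whole run, which is then split the same way parse handles 'variants'.

```ruby
# Simplified stand-in for VARIANT (an assumption; see RFC 5646 for the real grammar).
variant = /(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})/i
tag = "de-DE-1901-1994"

# A repeated capturing group: Ruby keeps only the last repetition.
repeated = tag.match(/\A[a-z]{2}-[a-z]{2}(?:-(?<variant>#{variant}))*\z/i)
last_only = repeated[:variant]

# One group around the whole run, then split, as parse does for 'variants'.
whole = tag.match(/\A[a-z]{2}-[a-z]{2}(?<variants>(?:-#{variant})*)\z/i)
run = whole[:variants]
variants = run.to_s.empty? ? [] : run[/-(.*)/, 1].split('-').sort

puts last_only # prints 1994
p variants     # prints ["1901", "1994"]
```

The leading "-" in the captured run is why parse strips it with [/-(.*)/, 1] before splitting.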

Private Class Methods

named_captures(match)
# File lib/bcp47_spec/parser.rb, line 121
def named_captures(match)
  return match.named_captures if match.respond_to?(:named_captures)

  match.names.each_with_object({}) { |name, acc| acc[name] = match[name] }
end
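MatchData#named_captures was added in Ruby 2.4, so the respond_to? guard presumably exists to support older Rubies by building the hash by hand. A small check that both paths agree, using a hypothetical two-group pattern rather than LANGUAGE_TAG:

```ruby
# Hypothetical pattern for illustration; not the library's LANGUAGE_TAG.
pattern = /\A(?<language>[a-z]{2,3})(?:-(?<region>[A-Z]{2}))?\z/
match = "en".match(pattern)

# Native API, available on Rubies that define MatchData#named_captures.
native = match.named_captures

# Fallback from named_captures above: read each group by name.
fallback = match.names.each_with_object({}) { |name, acc| acc[name] = match[name] }

# Both hashes map "language" to "en" and "region" to nil,
# because unmatched optional groups are still listed in MatchData#names.
puts fallback == native # prints true
```

Because absent optional groups come back as nil rather than being omitted, callers such as parse need the .to_s guard before testing for emptiness.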

split_extensions(extensions)
# File lib/bcp47_spec/parser.rb, line 127
def split_extensions(extensions)
  return [] if extensions.to_s.empty?

  # [["u-attr-co-phonebk"], ["t-und-cyrl"]]
  extensions = extensions.scan(/\b(?<ext>#{EXTENSION})\b/)
  # [["t", "und-cyrl"], ["u", "attr-co-phonebk"]]
  extensions.flatten.sort.map { |st| st.split('-', 2) }
end
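To see split_extensions end to end, the sketch below pairs the same method body with an assumed EXTENSION pattern modelled on RFC 5646: a singleton other than "x" followed by one or more subtags of 2 to 8 alphanumerics. The pattern is hypothetical; the library's own EXTENSION value is not shown on this page.

```ruby
# Assumed EXTENSION pattern modelled on RFC 5646: a singleton (any
# alphanumeric except "x") followed by one or more 2-8 character subtags.
EXTENSION = /[a-wyz0-9](?:-[a-z0-9]{2,8})+/i

# Same body as split_extensions above, standalone.
def split_extensions(extensions)
  return [] if extensions.to_s.empty?

  # One single-element array per match: [["u-attr-co-phonebk"], ["t-und-cyrl"]]
  matches = extensions.scan(/\b(?<ext>#{EXTENSION})\b/)
  # Sorting the flattened strings orders them by their first character
  # (the singleton), then each is split into [singleton, remaining subtags].
  matches.flatten.sort.map { |st| st.split('-', 2) }
end

p split_extensions("-u-attr-co-phonebk-t-und-cyrl")
# prints [["t", "und-cyrl"], ["u", "attr-co-phonebk"]]
p split_extensions(nil) # prints []
```

Putting the extensions in singleton order appears to mirror the canonical extension ordering described in RFC 5646, so two tags that differ only in extension order produce the same result.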