module UTF8Encoding

Provides encoding support for file and raw-text handling.

Allows any object (if supported) to have all related strings encoded in place to UTF-8.

Constants

AUTO_ENCODINGS

Our known source encodings, in order of preference.

BINARY

REPLACEMENT_SCHEME

How unmappable characters are escaped when forcing encoding.
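The value of REPLACEMENT_SCHEME is not shown on this page. As a minimal sketch of how such a scheme can plug into String#encode, the lambda below (a hypothetical stand-in named hex_escape) renders each unmappable byte as a hex escape. The real constant may use a different format.

```ruby
# Hypothetical stand-in for REPLACEMENT_SCHEME (an assumption, not the
# module's actual value): render each unmappable byte as "0x" plus hex.
hex_escape = ->(char) { format('0x%02x', char.ord) }

# Raw bytes tagged as BINARY (ASCII-8BIT); \xE9 has no mapping to UTF-8.
raw = "caf\xE9".b

# Bytes below 0x80 convert as ASCII; every other byte is handed to the
# :fallback callable, and its return value is spliced into the output.
escaped = raw.encode('UTF-8', 'BINARY', fallback: hex_escape)

puts escaped
puts escaped.encoding
```

The :fallback option of String#encode also accepts a Hash or a Method, so a lookup table would work in place of the lambda.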

UTF8

Public Instance Methods

coerce_utf8(string, source_encoding = nil)

Returns a UTF-8 version of `string`, escaping any unmappable characters.

# File lib/ndr_support/utf8_encoding.rb, line 47
def coerce_utf8(string, source_encoding = nil)
  coerce_utf8!(string.dup, source_encoding)
end
coerce_utf8!(string, source_encoding = nil)

Coerces `string` to UTF-8, in place, escaping any unmappable characters.

# File lib/ndr_support/utf8_encoding.rb, line 52
def coerce_utf8!(string, source_encoding = nil)
  # Try normally first...
  ensure_utf8!(string, source_encoding)
rescue UTF8CoercionError
  # ...before going back-to-basics, and replacing things that don't map:
  string.encode!(UTF8, BINARY, :fallback => REPLACEMENT_SCHEME)
end
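
The strict-then-lenient pattern used by coerce_utf8! can be tried without the library. The sketch below is illustrative only: SketchCoercionError, HEX_ESCAPE, strict_utf8! and lenient_utf8! are hypothetical names, and the strict step is simplified to a plain UTF-8 validity check rather than the module's candidate search.

```ruby
# Hypothetical stand-ins (assumptions) for the module's error class and
# replacement scheme.
class SketchCoercionError < StandardError; end
HEX_ESCAPE = ->(char) { format('0x%02x', char.ord) }

# Strict step: accept the string only if its bytes are already valid UTF-8.
def strict_utf8!(string)
  string.force_encoding('UTF-8')
  raise SketchCoercionError, 'not valid UTF-8' unless string.valid_encoding?

  string
end

# Lenient wrapper: try the strict step first, then fall back to treating
# the bytes as BINARY and escaping whatever does not map.
def lenient_utf8!(string)
  strict_utf8!(string)
rescue SketchCoercionError
  string.encode!('UTF-8', 'BINARY', fallback: HEX_ESCAPE)
end

clean = lenient_utf8!("caf\xC3\xA9".b) # valid UTF-8 bytes pass through
dirty = lenient_utf8!("caf\xE9".b)     # the stray byte is escaped

puts clean
puts dirty
```

Because the fallback call names BINARY as the source explicitly, it works regardless of how the failed strict step left the string tagged.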
ensure_utf8(string, source_encoding = nil)

Returns a new string with valid UTF-8 encoding, or raises UTF8CoercionError if encoding fails.

# File lib/ndr_support/utf8_encoding.rb, line 24
def ensure_utf8(string, source_encoding = nil)
  ensure_utf8!(string.dup, source_encoding)
end
ensure_utf8!(string, source_encoding = nil)

Attempts to encode `string` to UTF-8, in place. Returns `string`, or raises UTF8CoercionError.

# File lib/ndr_support/utf8_encoding.rb, line 30
def ensure_utf8!(string, source_encoding = nil)
  # A list of encodings we should try from:
  candidates = source_encoding ? Array.wrap(source_encoding) : AUTO_ENCODINGS

  # Attempt to coerce the string to UTF-8, from one of the source
  # candidates (in order of preference):
  apply_candidates!(string, candidates)

  unless string.valid_encoding?
    # None of our candidate source encodings worked, so fail:
    fail(UTF8CoercionError, "Attempted to use: #{candidates}")
  end

  string
end
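
To see how the candidate list shapes the outcome, here is a self-contained sketch of the same flow. It makes several assumptions: SKETCH_AUTO_ENCODINGS is a hypothetical list (the actual AUTO_ENCODINGS values are not shown on this page), SketchEncodingFailure stands in for UTF8CoercionError, and Kernel#Array replaces ActiveSupport's Array.wrap. It also raises when no candidate succeeded, rather than re-checking the string as the method above does.

```ruby
# Hypothetical candidate list and error class (assumptions for this sketch).
SKETCH_AUTO_ENCODINGS = %w[UTF-8 Windows-1252].freeze
class SketchEncodingFailure < StandardError; end

def sketch_ensure_utf8!(string, source_encoding = nil)
  # An explicit source encoding replaces the automatic list entirely.
  candidates = source_encoding ? Array(source_encoding) : SKETCH_AUTO_ENCODINGS

  # Try each candidate in order; stop at the first that yields valid UTF-8.
  worked = candidates.any? do |encoding|
    begin
      string.encode!('UTF-8', encoding)
      string.valid_encoding?
    rescue EncodingError
      false
    end
  end

  raise SketchEncodingFailure, "Attempted to use: #{candidates}" unless worked

  string
end

# No hint given: UTF-8 is rejected, so Windows-1252 is used.
guessed = sketch_ensure_utf8!("caf\xE9".dup)

# Explicit hint: only ISO-8859-1 is tried.
hinted = sketch_ensure_utf8!("caf\xE9".dup, 'ISO-8859-1')

# Wrong hint: the only candidate fails, so the error is raised.
message =
  begin
    sketch_ensure_utf8!("caf\xE9".dup, 'UTF-8')
  rescue SketchEncodingFailure => e
    e.message
  end

puts guessed
puts hinted
puts message
```

Order matters in the candidate list: single-byte encodings such as Windows-1252 accept almost any byte sequence, so placing them first would mask input that is really UTF-8.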

Private Instance Methods

apply_candidates!(string, candidates)
# File lib/ndr_support/utf8_encoding.rb, line 62
def apply_candidates!(string, candidates)
  candidates.detect do |encoding|
    begin
      # Attempt to encode as UTF-8 from source `encoding`:
      string.encode!(UTF8, encoding)
      # If that worked, we're done; otherwise, move on.
      string.valid_encoding?
    rescue EncodingError
      # If that failed really badly, move on:
      false
    end
  end
end
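
The method guards against two distinct failure modes, which the following sketch reproduces with plain String#encode!. It assumes MRI behaviour, where converting between identical encodings performs no validation, and where byte 0x81 has no mapping in Ruby's Windows-1252 table.

```ruby
# 1. Quiet failure: UTF-8 to UTF-8 is effectively a no-op, so invalid
#    bytes survive without an exception. Only valid_encoding? catches this.
bytes = "caf\xE9".dup
bytes.encode!('UTF-8', 'UTF-8')
quiet_failure = !bytes.valid_encoding?

# 2. Loud failure: a real conversion raises a subclass of EncodingError
#    when it meets a byte with no mapping in the source encoding.
loud_failure =
  begin
    "\x81".dup.encode!('UTF-8', 'Windows-1252')
    nil
  rescue EncodingError => e
    e.class
  end

puts quiet_failure
puts loud_failure
```

Returning false from the rescue branch and returning the result of valid_encoding? otherwise lets detect treat both cases the same way: move on to the next candidate.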