module UTF8Encoding
Provides encoding support for file and raw-text handling.
Allows any supported object to have all of its related strings encoded to UTF-8 in place.
Constants
- AUTO_ENCODINGS
Our known source encodings, in order of preference:
- BINARY
- REPLACEMENT_SCHEME
How should unmappable characters be escaped, when forcing encoding?
- UTF8
Public Instance Methods
coerce_utf8(string, source_encoding = nil)
Returns a UTF-8 version of `string`, escaping any unmappable characters.
# File lib/ndr_support/utf8_encoding.rb, line 47
def coerce_utf8(string, source_encoding = nil)
  coerce_utf8!(string.dup, source_encoding)
end
coerce_utf8!(string, source_encoding = nil)
Coerces `string` to UTF-8, in place, escaping any unmappable characters.
# File lib/ndr_support/utf8_encoding.rb, line 52
def coerce_utf8!(string, source_encoding = nil)
  # Try normally first...
  ensure_utf8!(string, source_encoding)
rescue UTF8CoercionError
  # ...before going back-to-basics, and replacing things that don't map:
  string.encode!(UTF8, BINARY, :fallback => REPLACEMENT_SCHEME)
end
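The fallback step can be illustrated with plain Ruby. This is a minimal sketch, not the library's code: `HEX_ESCAPE` is a hypothetical replacement scheme, since the actual value of REPLACEMENT_SCHEME is not shown on this page. It relies only on the `:fallback` option of `String#encode`, which is consulted for each character that has no mapping in the target encoding.

```ruby
# Hypothetical replacement scheme (an assumption, not the library's
# REPLACEMENT_SCHEME): render each unmappable byte as "0xNN".
HEX_ESCAPE = ->(char) { "0x" + char.ord.to_s(16).rjust(2, "0") }

# Treat the input as raw bytes. ASCII bytes map straight through to UTF-8;
# bytes >= 0x80 have no mapping from BINARY, so the fallback handles them.
raw     = "caf\xE9".b
escaped = raw.encode("UTF-8", "BINARY", fallback: HEX_ESCAPE)

puts escaped                  # the high byte appears as an escaped hex code
puts escaped.valid_encoding?
```

Because every byte is either passed through or escaped, this path should not raise for undefined conversions, which is presumably why it serves as the last resort.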
ensure_utf8(string, source_encoding = nil)
Returns a new string with valid UTF-8 encoding, or raises an exception if encoding fails.
# File lib/ndr_support/utf8_encoding.rb, line 24
def ensure_utf8(string, source_encoding = nil)
  ensure_utf8!(string.dup, source_encoding)
end
ensure_utf8!(string, source_encoding = nil)
Attempts to encode `string` to UTF-8, in place. Returns `string`, or raises an exception.
# File lib/ndr_support/utf8_encoding.rb, line 30
def ensure_utf8!(string, source_encoding = nil)
  # A list of encodings we should try from:
  candidates = source_encoding ? Array.wrap(source_encoding) : AUTO_ENCODINGS

  # Attempt to coerce the string to UTF-8, from one of the source
  # candidates (in order of preference):
  apply_candidates!(string, candidates)

  unless string.valid_encoding?
    # None of our candidate source encodings worked, so fail:
    fail(UTF8CoercionError, "Attempted to use: #{candidates}")
  end

  string
end
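The try-each-candidate-then-fail behaviour can be sketched without the gem. Everything named here is an assumption for illustration: `SketchCoercionError` stands in for UTF8CoercionError, `SKETCH_ENCODINGS` is a made-up preference list (the real AUTO_ENCODINGS values are not shown on this page), and `Kernel#Array` replaces ActiveSupport's `Array.wrap`.

```ruby
# Hypothetical stand-ins; none of these names come from the library.
class SketchCoercionError < StandardError; end
SKETCH_ENCODINGS = %w[UTF-8 Windows-1252].freeze

def ensure_utf8_sketch(string, source_encoding = nil)
  candidates = source_encoding ? Array(source_encoding) : SKETCH_ENCODINGS
  copy = string.dup

  # Try each candidate in order; stop at the first that yields valid UTF-8.
  worked = candidates.any? do |encoding|
    begin
      copy.encode!("UTF-8", encoding)
      copy.valid_encoding?
    rescue EncodingError
      false
    end
  end

  raise SketchCoercionError, "Attempted to use: #{candidates}" unless worked
  copy
end

# 0xE9 is not valid UTF-8 on its own, so the second candidate is used.
converted = ensure_utf8_sketch("caf\xE9")
puts converted

# Restricting the candidates to US-ASCII leaves nothing that can work.
begin
  ensure_utf8_sketch("caf\xE9", "US-ASCII")
rescue SketchCoercionError => e
  puts "failed: #{e.message}"
end
```

Passing an explicit source encoding narrows the search to that one candidate, so a wrong guess fails loudly rather than silently falling through to another encoding.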
Private Instance Methods
apply_candidates!(string, candidates)
# File lib/ndr_support/utf8_encoding.rb, line 62
def apply_candidates!(string, candidates)
  candidates.detect do |encoding|
    begin
      # Attempt to encode as UTF-8 from source `encoding`:
      string.encode!(UTF8, encoding)
      # If that worked, we're done; otherwise, move on.
      string.valid_encoding?
    rescue EncodingError
      # If that failed really badly, move on:
      false
    end
  end
end
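Since `detect` returns the first element for which the block is truthy, a helper of this shape also reports which encoding succeeded. The following sketch mirrors that pattern under assumed names: `first_working_encoding` and the candidate list are hypothetical and are not part of the module.

```ruby
# Hypothetical helper mirroring the detect-based loop: returns the first
# candidate under which `string` transcodes to valid UTF-8, or nil.
# Mutates `string` in place when a candidate succeeds.
def first_working_encoding(string, candidates)
  candidates.detect do |encoding|
    begin
      string.encode!("UTF-8", encoding)
      string.valid_encoding?
    rescue EncodingError
      # encode! raises before modifying the string, so the next
      # candidate still sees the original bytes.
      false
    end
  end
end

text   = "na\xEFve".dup   # 0xEF is i-diaeresis in ISO-8859-1
winner = first_working_encoding(text, %w[US-ASCII UTF-8 ISO-8859-1])

puts winner.inspect
puts text
```

The order of the candidate list matters: single-byte encodings such as ISO-8859-1 accept every byte, so placing one early would stop stricter candidates from ever being tried.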