module RemoteTable::Plaintext

Helper methods that act on plaintext files before they are parsed

Constants

EOL_TO_UNIX
UTF8_BOM

UTF-8 byte order mark

Public Class Methods

soft_hyphen(encoding) click to toggle source

@private Code for the soft hyphen, often inserted by MS Office (html: ­)

# File lib/remote_table/plaintext.rb, line 10
def soft_hyphen(encoding)
  case encoding
  when /775/, /85[02578]/
    '\xF0'
  when /utf-?8/i
    '\xc2\xad'
  else # iso-8859-1, latin1, windows-1252, etc...
    '\xad'
  end
end

Public Instance Methods

convert_eol_to_unix!() click to toggle source

No matter what the EOL are SUPPOSED to be, run it through Perl with a regex that will convert all EOLS to n

@example

perl -pe 's/\r\n|\n|\r/\n/g'
# File lib/remote_table/plaintext.rb, line 50
def convert_eol_to_unix!
  local_copy.in_place :perl, EOL_TO_UNIX
end
crop_rows!() click to toggle source

If the user has specified :crop, use a combination of tail and head

@example :crop => (184..263)

tail +184 | head 80
# File lib/remote_table/plaintext.rb, line 68
def crop_rows!
  if crop
    local_copy.in_place :tail, "+#{crop.first}"
    local_copy.in_place :head, (crop.last - crop.first + 1)
  end
end
cut_columns!() click to toggle source

If the user has specified :cut, use cut

@example :cut => ‘13-’

cut -c 13-
# File lib/remote_table/plaintext.rb, line 79
def cut_columns!
  if cut
    local_copy.in_place :cut, cut
  end
end
delete_harmful!() click to toggle source

Remove bytes that are both useless and harmful in the vast majority of cases.

# File lib/remote_table/plaintext.rb, line 27
def delete_harmful!
  harmful = [ Plaintext.soft_hyphen(encoding), UTF8_BOM ]
  local_copy.in_place :perl, "s/#{harmful.join('//g; s/')}//g"
end
skip_rows!() click to toggle source

If the user has specified :skip, use tail

@example :skip => 6

tail +7
# File lib/remote_table/plaintext.rb, line 58
def skip_rows!
  if skip > 0
    local_copy.in_place :tail, "+#{skip + 1}"
  end
end
transliterate_whole_file_to_utf8!() click to toggle source

No matter what the file encoding is SUPPOSED to be, run it through the system iconv binary to make sure it’s UTF-8

@example

iconv -c -t UTF-8//TRANSLIT -f WINDOWS-1252
# File lib/remote_table/plaintext.rb, line 36
def transliterate_whole_file_to_utf8!
  if ::UnixUtils.available?('iconv')
    local_copy.in_place :iconv, RemoteTable::EXTERNAL_ENCODING_ICONV, encoding
  else
    ::Kernel.warn %{[remote_table] iconv not available in your $PATH, not performing transliteration}
  end
  # now that we've force-transliterated to UTF-8, act as though this is what the user had specified
  @encoding = RemoteTable::EXTERNAL_ENCODING
end