class Cacofonix::Normaliser

A standalone class that normalises ONIX files into a standardised form. If you're accepting ONIX files from a wide range of suppliers, you're guaranteed to get all sorts of dialects.

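This will create a new file that uses reference (long) tag names rather than short tags and contains no low ASCII control characters.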

Usage:

Cacofonix::Normaliser.process("oldfile.xml", "newfile.xml")

Dependencies:

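At this stage the class depends on external apps that are commonly available on *nix systems. The constructor checks for xsltproc and tr, and the source shown on this page shells out to xsltproc, tr and cat. The isutf8, iconv and sed apps are not called by any of the methods listed here.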

Public Class Methods

new(oldfile, newfile = nil)

NB: Newfile argument is deprecated.

# File lib/cacofonix/utils/normaliser.rb, line 41
def initialize(oldfile, newfile = nil)
  raise ArgumentError, "#{oldfile} does not exist" unless File.file?(oldfile)
  raise "xsltproc app not found" unless app_available?("xsltproc")
  raise "tr app not found"       unless app_available?("tr")

  @oldfile = oldfile
  @newfile = newfile
  @curfile = next_tempfile
  FileUtils.cp(@oldfile, @curfile)
  @head    = File.open(@oldfile, "r") { |f| f.read(1024) }
end
process(oldfile, newfile)

Normalises oldfile and saves the result as newfile. oldfile is left untouched.

# File lib/cacofonix/utils/normaliser.rb, line 34
def process(oldfile, newfile)
  self.new(oldfile).normalise_to_path(newfile)
end

Public Instance Methods

app_available?(app)

Checks that the specified app is available on the system.

# File lib/cacofonix/utils/normaliser.rb, line 87
def app_available?(app)
  `which #{app}`.strip == "" ? false : true
end
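
The same availability check can be done without shelling out to which. The following is a minimal sketch, not part of Cacofonix: executable_on_path? is a hypothetical helper that searches the directories in PATH for an executable file with the given name.

```ruby
# Hypothetical helper (not part of Cacofonix): looks for an executable
# called +name+ in each directory of +path+ (defaults to ENV["PATH"]).
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    # Must be a regular file that the current user may execute.
    File.file?(candidate) && File.executable?(candidate)
  end
end
```

For example, executable_on_path?("xsltproc") could stand in for app_available?("xsltproc") on systems where which itself might be missing.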
next_tempfile()

Generates a temp filename. The underlying Tempfile is closed and unlinked, so only the path is returned.

# File lib/cacofonix/utils/normaliser.rb, line 93
def next_tempfile
  p = nil
  Tempfile.open("onix") do |tf|
    p = tf.path
    tf.close!
  end
  p
end
normalise_to_path(newfile)

Normalises oldfile and copies the result to newfile. Raises an ArgumentError if newfile already exists.

# File lib/cacofonix/utils/normaliser.rb, line 58
def normalise_to_path(newfile)
  raise ArgumentError, "#{newfile} already exists" if File.file?(newfile)
  @curfile = normalise_to_tempfile
  FileUtils.cp(@curfile, newfile)
end
normalise_to_tempfile()

Processes oldfile and puts the normalised result in a tempfile, returning the path to that tempfile.

# File lib/cacofonix/utils/normaliser.rb, line 67
def normalise_to_tempfile
  src = @curfile

  # remove short tags
  if @head.include?("ONIXmessage")
    dest = next_tempfile
    to_reference_tags(src, dest)
    src = dest
  end

  # remove control chars
  dest = next_tempfile
  remove_control_chars(src, dest)
  dest
end
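
The method above chains its steps through temp files: each step reads one path and writes another, and the output of one step becomes the input of the next. A minimal sketch of that pattern follows. The names fresh_path, short_tags? and chain_steps are hypothetical and are not part of Cacofonix; the steps are plain lambdas rather than external apps.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical stand-in for next_tempfile: returns an unused path in the
# system temp directory without creating the file.
def fresh_path
  File.join(Dir.tmpdir, "onix-#{SecureRandom.hex(8)}")
end

# Hypothetical helper: true when the start of the file contains the
# short-tag root element name, as checked in normalise_to_tempfile.
def short_tags?(head)
  head.include?("ONIXmessage")
end

# Hypothetical helper: calls each step as step.call(src, dest), feeding
# every output path into the next step, and returns the final path.
# With no steps, the original src path is returned unchanged.
def chain_steps(src, steps)
  steps.reduce(src) do |current, step|
    dest = fresh_path
    step.call(current, dest)
    dest
  end
end
```

A caller would build the list of steps conditionally, for instance adding a tag conversion step only when short_tags? returns true for the first 1024 bytes, then pass the list to chain_steps.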
remove_control_chars(src, dest)

XML files shouldn't contain low ASCII control chars. Strip them.

# File lib/cacofonix/utils/normaliser.rb, line 117
def remove_control_chars(src, dest)
  inpath = File.expand_path(src)
  outpath = File.expand_path(dest)
  `cat #{inpath} | tr -d "\\000-\\010\\013\\014\\016-\\037" > #{outpath}`
end
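
The octal ranges passed to tr cover bytes 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F, which leaves tab, line feed and carriage return in place. As an illustration only, the same filter can be sketched in pure Ruby. strip_control_chars and CONTROL_CHARS are hypothetical names, and this version works on an in-memory string rather than on files.

```ruby
# Bytes removed by the tr call above: 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept.
CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper (not part of Cacofonix): returns a copy of +text+
# with the low ASCII control characters removed. The work is done on a
# binary copy so that invalid byte sequences do not raise, then the
# original encoding is restored.
def strip_control_chars(text)
  text.b.gsub(CONTROL_CHARS, "").force_encoding(text.encoding)
end
```

Stripping byte by byte is safe for UTF-8 input because bytes below 0x20 never occur inside a multi-byte sequence.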
run()

This is deprecated - use normalise_to_path with a path.

# File lib/cacofonix/utils/normaliser.rb, line 54
def run
  normalise_to_path(@newfile)
end
to_reference_tags(src, dest)

Uses an XSLT stylesheet provided by EDItEUR to convert a file from short tags to reference (long) tags.

More detail here:

http://www.editeur.org/files/ONIX%203/ONIX%20tagname%20converter%20v2.htm
# File lib/cacofonix/utils/normaliser.rb, line 108
def to_reference_tags(src, dest)
  inpath = File.expand_path(src)
  outpath = File.expand_path(dest)
  xsltpath = File.dirname(__FILE__) + "/../../../support/switch-onix-2.1-short-to-reference.xsl"
  `xsltproc -o #{outpath} #{xsltpath} #{inpath}`
end
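
The backtick call above interpolates the paths into a shell command, so a path containing spaces would be split into several arguments. One possible alternative, shown here as a sketch under that assumption, is to pass the arguments as a list so that no shell is involved. xsltproc_argv and convert_with_xsltproc are hypothetical names and are not part of Cacofonix.

```ruby
# Hypothetical helper: builds the xsltproc invocation as an argument
# list, in the same order as the backtick command above.
def xsltproc_argv(src, dest, stylesheet)
  [
    "xsltproc",
    "-o", File.expand_path(dest),
    File.expand_path(stylesheet),
    File.expand_path(src)
  ]
end

# Hypothetical helper: runs xsltproc without going through a shell.
# Returns true on success, false on a non-zero exit status, and nil if
# the command could not be executed at all.
def convert_with_xsltproc(src, dest, stylesheet)
  system(*xsltproc_argv(src, dest, stylesheet))
end
```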