class Cacofonix::Reader

This is the primary class for reading data from an ONIX file.

Each ONIX file should contain a single header, and 1 or more products:

reader = Cacofonix::Reader.new("somefile.xml")

puts reader.header.inspect

reader.each do |product|
  puts product.inspect
end

The header will be returned as an Cacofonix::Header object, and the product will be an Cacofonix::Product.

The Cacofonix::Product class can be a bit of a hassle to work with, as data can be nested in it fairly deeply. To wrap all the products returned by the reader in a shim that provides simple accessor access to common attributes, pass the shim class as a second argument.

reader = Cacofonix::Reader.new("somefile.xml", Cacofonix::APAProduct)

puts reader.header.inspect

reader.each do |product|
  puts product.inspect
end

APAProduct stands for Australian Publishers Association and provides simple access to the ONIX attributes that are commonly used in the Australian market.

As well as accessing the file header, there are handful of other read only attributes that might be useful

reader = Cacofonix::Reader.new("somefile.xml", Cacofonix::APAProduct)

puts reader.xml_lang
puts reader.xml_version

Note that ONIX has 1500 valid named entities (such as –) that can cause the parser to throw an exception because it doesn't recognise them. To work around this, the parser loads the ONIX DTD. It generally has to do this over the internet, which slows parsing considerably. To skip DTD loading (running the risk of parser exceptions), you can pass an option to the constructor:

reader = Cacofonix::Reader.new("somefile.xml", Cacofonix::Product, :dtd => false)

For more information, see is.gd/p7fHQq

Attributes

header[R]
release[R]
xml_lang[R]
xml_version[R]

Public Class Methods

new(input, product_klass = ::Cacofonix::Product, options = {}) click to toggle source

Options:

:dtd - if false, then DTD is not loaded before parsing
:interpret - a module (or an array of modules) that should extend
             each Product
# File lib/cacofonix/core/reader.rb, line 71
def initialize(input, product_klass = ::Cacofonix::Product, options = {})
  @input = input
  @product_klass = product_klass
  @options = options || {}

  create_parser

  @release = find_release
  @header = find_header

  @xml_lang    ||= @reader.lang
  @xml_version ||= @reader.xml_version.to_f
end

Public Instance Methods

close() click to toggle source
# File lib/cacofonix/core/reader.rb, line 111
def close
  puts "Reader#close is deprecated."
end
each() { |product| ... } click to toggle source

Iterate over all the products in an ONIX file

# File lib/cacofonix/core/reader.rb, line 87
def each(&block)
  @reader.each do |node|
    if @reader.node_type == 1 && @reader.name == "Product"
      str = @reader.outer_xml
      product = str.nil? ? @product_klass.new : @product_klass.from_xml(str)
      product.interpret @options[:interpret]
      yield product
    end
  end
end
products() click to toggle source

Assemble all the products in the ONIX file into an array. Obviously this will chew through memory with very large ONIX files, so use with care.

# File lib/cacofonix/core/reader.rb, line 101
def products
  @products ||= [].tap { |prods| each { |prod| prods << prod } }
end
rewind() click to toggle source

Rewind the reader so that you can call 'each' again.

# File lib/cacofonix/core/reader.rb, line 107
def rewind
  create_parser
end

Private Instance Methods

create_parser() click to toggle source
# File lib/cacofonix/core/reader.rb, line 117
def create_parser
  Cacofonix::DTDs.apply_libxml_env!

  parser_config = lambda { |cfg|
    cfg.noent
    cfg.dtdload unless @options[:dtd] == false
  }
  if @input.kind_of?(String)
    @file   = File.open(@input, "r")
    @reader = Nokogiri::XML::Reader(@file, &parser_config)
  elsif @input.respond_to?(:read)
    @reader = Nokogiri::XML::Reader(@input, &parser_config)
  else
    raise ArgumentError, "Unable to read from file or IO stream"
  end
end
find_header() click to toggle source
# File lib/cacofonix/core/reader.rb, line 151
def find_header
  100.times do
    @reader.read
    if @reader.node_type == 1 &&  @reader.name == "Header"
      str = @reader.outer_xml
      if str.nil?
        return Cacofonix::Header.new
      else
        return Cacofonix::Header.from_xml(str)
      end
    end
  end
  return nil
end
find_release() click to toggle source
# File lib/cacofonix/core/reader.rb, line 134
def find_release
  2.times do
    @reader.read
    if @reader.node_type == 1 && @reader.name == "ONIXMessage"
      value = @reader.attributes["release"]
      if value
        return BigDecimal(value)
      else
        return nil
      end
    elsif @reader.node_type == 14
      return nil
    end
  end
  return nil
end