class Scruber::Core::Extensions::MongoOutput

Extension for writing results to mongo collections. It registers methods for writing documents:

mongo_out({..}) # writes a document to {prefix}_{scraper_name}_records
mongo_out_product({..}) # writes a document to {prefix}_{scraper_name}_product

Searching methods:

mongo_find({..}) # finds a document in {prefix}_{scraper_name}_records
mongo_find_product({..}) # finds a document in {prefix}_{scraper_name}_product

Direct access to mongo collections:

mongo_collection({..}) # Direct access to {prefix}_{scraper_name}_records
mongo_product_collection({..}) # Direct access to {prefix}_{scraper_name}_product

@example Writing product and company data

Scruber.run :simple do
  get_product 'http://example.com/product'
  get_company 'http://example.com/product'

  parse_product :html do |page,doc|
    # Parentheses are required: a bare `{` after the method name is parsed as a block
    id = mongo_out_product({title: doc.at('h1').text, price: doc.at('.price').text})
    record = mongo_find_product id
    record[:description] = doc.at('.desc').text
    mongo_out_product record
    log "Count: #{mongo_product_collection.count}"
  end

  parse_company :html do |page,doc|
    mongo_out_company({name: doc.at('h1').text, phone: doc.at('.phone').text})
  end
end

@author Ivan Goncharov

Attributes

default_suffix_name[W]

Default mongo collection suffix name

Public Class Methods

default_suffix_name()

Default mongo collection suffix name

@return [String] Default mongo collection suffix name

# File lib/scruber/core/extensions/mongo_output.rb, line 111
def default_suffix_name
  @default_suffix_name ||= 'records'
end

mongo_collection(scraper_name, suffix)

Direct access to the mongo collection for a given scraper name and suffix

@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name

@return [Mongo::Collection] instance of Mongo::Collection

# File lib/scruber/core/extensions/mongo_output.rb, line 160
def mongo_collection(scraper_name, suffix)
  Scruber::Mongo.client[out_collection_name(scraper_name, suffix)]
end

mongo_find(scraper_name, suffix, id)

Finds documents in a mongo collection, either by _id or by a filter hash

@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name
@param id [Object, Hash] id of document, or a Hash used as a query filter

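@return [Hash, Mongo::Collection::View] the matching document (or nil) when id is not a Hash; a query view over the matching documents when id is a Hash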

# File lib/scruber/core/extensions/mongo_output.rb, line 145
def mongo_find(scraper_name, suffix, id)
  if id.is_a?(Hash)
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].find(id)
  else
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].find({_id: id}).first
  end
end
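
The two lookup modes above return different shapes: a filter hash yields a set of matches, while any other value is treated as an _id and yields a single document. The following is a minimal sketch of that dispatch, assuming a plain Array of Hashes in place of a real collection. The helper name `find_in` and the sample data are hypothetical and are not part of Scruber or the Mongo driver.

```ruby
# Hypothetical helper, not Scruber's API: `docs` is a plain Array of Hashes
# standing in for a mongo collection.
def find_in(docs, id)
  if id.is_a?(Hash)
    # Hash argument: treat it as a filter and return every matching document.
    docs.select { |doc| id.all? { |key, value| doc[key] == value } }
  else
    # Any other argument: treat it as an _id and return one document or nil.
    docs.find { |doc| doc[:_id] == id }
  end
end

sample_docs = [
  { _id: 1, title: 'Lamp', price: '10' },
  { _id: 2, title: 'Desk', price: '10' }
]

by_id     = find_in(sample_docs, 2)                 # a single Hash
by_filter = find_in(sample_docs, { price: '10' })   # an Array of Hashes
```

Callers that pass a Hash should therefore expect to iterate over the result, or call `first` on it, rather than index into it as a single document.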

mongo_out(scraper_name, suffix, fields, options={})

Writes a document to a mongo collection: inserts it when _id is blank, otherwise updates (or upserts) the document with that _id

@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name
@param fields [Hash] Document to output
@param options [Hash] Options for updating the record (used when _id is set), see docs.mongodb.com/manual/reference/method/db.collection.findOneAndUpdate/

@return [Object] _id of the inserted or updated document

# File lib/scruber/core/extensions/mongo_output.rb, line 124
def mongo_out(scraper_name, suffix, fields, options={})
  fields = fields.with_indifferent_access
  if fields[:_id].blank?
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].insert_one(fields).inserted_id
  else
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].find_one_and_update(
      {"_id" => fields[:_id] },
      {'$set' => fields },
      {return_document: :after, upsert: true}.merge(options)
    )[:_id]
  end
end
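
The insert-or-update branching above can be tried without a database. This is a minimal sketch under stated assumptions: `MemoryStore` is a hypothetical in-memory stand-in for a collection, it generates ids with `SecureRandom` instead of BSON ObjectIds, and it only imitates `$set` with upsert by merging hashes. It is not how Scruber or the Mongo driver is implemented.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a mongo collection (not Scruber's API).
class MemoryStore
  def initialize
    @docs = {}
  end

  # Follows the same branching as mongo_out: insert when _id is blank,
  # otherwise merge the fields into the stored document, creating it if absent.
  # Returns the _id of the written document in both cases.
  def write(fields)
    fields = fields.transform_keys(&:to_sym)
    id = fields[:_id]
    if id.nil? || id.to_s.empty?
      id = SecureRandom.hex(12)
      @docs[id] = fields.merge(_id: id)
    else
      @docs[id] = (@docs[id] || {}).merge(fields)
    end
    id
  end

  def find(id)
    @docs[id]
  end

  def count
    @docs.size
  end
end

store = MemoryStore.new
id = store.write({ title: 'Lamp', price: '10' })

# Read the record back, extend it, and write it again: because _id is now set,
# the second write updates the existing document instead of inserting a new one.
record = store.find(id).merge(description: 'Reading lamp')
store.write(record)
```

This is the same round trip as the `parse_product` example at the top of the page: the first write returns an id, and the second write with that id leaves the collection with a single document.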

out_collection_name(scraper_name, suffix)

Collection name builder

@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name

@return [String] name of collection for given scraper_name and suffix

# File lib/scruber/core/extensions/mongo_output.rb, line 171
def out_collection_name(scraper_name, suffix)
  [Scruber::Mongo.configuration.options['collections_prefix'], scraper_name, suffix].select(&:present?).map(&:to_s).join('_')
end
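
To illustrate the naming scheme, here is a minimal plain-Ruby sketch of the same join logic. It assumes the prefix is passed in as an argument rather than read from `Scruber::Mongo.configuration`, and it approximates ActiveSupport's `present?` by dropping nil and blank-string parts only. The function name `build_collection_name` and the sample values are hypothetical.

```ruby
# Hypothetical standalone version of the name builder (no ActiveSupport needed).
# Blank parts are skipped, so a missing prefix does not leave a leading underscore.
def build_collection_name(prefix, scraper_name, suffix)
  [prefix, scraper_name, suffix]
    .reject { |part| part.nil? || part.to_s.strip.empty? }
    .map(&:to_s)
    .join('_')
end

with_prefix    = build_collection_name('scruber', :simple, 'records')
without_prefix = build_collection_name(nil, :simple, 'product')
puts with_prefix     # scruber_simple_records
puts without_prefix  # simple_product
```

With this scheme, `mongo_out` in a scraper named `:simple` targets the `records` suffix, and `mongo_out_product` targets the `product` suffix of the same prefix and scraper name.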