class Scruber::Core::Extensions::MongoOutput
Extension for writing results to mongo collections. It registers methods for writing documents:
```ruby
mongo_out({..})          # writes a document to {prefix}_{scraper_name}_records
mongo_out_product({..})  # writes a document to {prefix}_{scraper_name}_product
```
Searching methods:
```ruby
mongo_find({..})          # searches for a document in {prefix}_{scraper_name}_records
mongo_find_product({..})  # searches for a document in {prefix}_{scraper_name}_product
```
Accessing mongo collections:
```ruby
mongo_collection          # direct access to {prefix}_{scraper_name}_records
mongo_product_collection  # direct access to {prefix}_{scraper_name}_product
```
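This page lists the helper names but does not show how they are registered. As a hypothetical sketch (the `parse_mongo_helper` function and the regex-based decoding are assumptions, not Scruber's code), the naming convention maps each helper to an action and a collection suffix, falling back to the default `records` suffix when none is given:

```ruby
# Hypothetical sketch: decode a helper name into [action, suffix].
# Only the naming convention comes from the docs; the mechanism is assumed.
DEFAULT_SUFFIX = 'records' # mirrors default_suffix_name

# Returns [action, suffix], or nil when the name is not a mongo helper.
def parse_mongo_helper(name)
  if (m = name.to_s.match(/\Amongo_(out|find)(?:_(\w+))?\z/))
    # mongo_out / mongo_find, optionally followed by _<suffix>
    [m[1].to_sym, m[2] || DEFAULT_SUFFIX]
  elsif (m = name.to_s.match(/\Amongo_(?:(\w+)_)?collection\z/))
    # mongo_collection / mongo_<suffix>_collection
    [:collection, m[1] || DEFAULT_SUFFIX]
  end
end

p parse_mongo_helper(:mongo_out_product)        # [:out, "product"]
p parse_mongo_helper(:mongo_product_collection) # [:collection, "product"]
```

A `method_missing` hook could use such a parser to forward, for example, `mongo_out_product(fields)` to `mongo_out(scraper_name, 'product', fields)`.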
@example Writing product and company data
```ruby
Scruber.run :simple do
  get_product 'http://example.com/product'
  get_company 'http://example.com/company'

  parse_product :html do |page, doc|
    # Insert a new product; the generated _id is returned
    id = mongo_out_product(title: doc.at('h1').text, price: doc.at('.price').text)

    # Load it back, add a field and write it again (updated by _id)
    record = mongo_find_product id
    record[:description] = doc.at('.desc').text
    mongo_out_product record

    log "Count: #{mongo_product_collection.count}"
  end

  parse_company :html do |page, doc|
    mongo_out_company(name: doc.at('h1').text, phone: doc.at('.phone').text)
  end
end
```
@author Ivan Goncharov
Attributes
Default mongo collection suffix name
Public Class Methods
Default mongo collection suffix name
@return [String] Default mongo collection suffix name
```ruby
# File lib/scruber/core/extensions/mongo_output.rb, line 111
def default_suffix_name
  @default_suffix_name ||= 'records'
end
```
Returns the mongo collection for the given scraper name and suffix
@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name
@return [Mongo::Collection] instance of Mongo::Collection
```ruby
# File lib/scruber/core/extensions/mongo_output.rb, line 160
def mongo_collection(scraper_name, suffix)
  Scruber::Mongo.client[out_collection_name(scraper_name, suffix)]
end
```
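Searches for a document in mongo
@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name
@param id [Object, Hash] _id of the document, or a query Hash
@return [Hash, Mongo::Collection::View] the document with the given _id (or nil if none exists); when id is a Hash, a view over all documents matching that query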
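```ruby
# File lib/scruber/core/extensions/mongo_output.rb, line 145
def mongo_find(scraper_name, suffix, id)
  if id.is_a?(Hash)
    # A Hash is treated as a query: returns a view over all matching documents
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].find(id)
  else
    # Anything else is treated as an _id: returns the first match or nil
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].find({_id: id}).first
  end
end
```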
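To make the two branches of `mongo_find` concrete without a running MongoDB, here is a minimal in-memory sketch. `fake_mongo_find` and `matches?` are hypothetical helpers, an Array of Hashes stands in for the collection, and the matcher only supports plain equality, whereas a real query Hash may use operators such as `$gt`.

```ruby
# In-memory stand-in for one collection: an Array of Hashes with "_id" keys.
# This is NOT the Mongo driver; it only mirrors the branching of mongo_find.

# Naive matcher: every key/value pair in the query must equal the document's
# value (query operators such as $gt are not supported).
def matches?(doc, query)
  query.all? { |key, value| doc[key.to_s] == value }
end

# Hash argument: treated as a query, every matching document is returned.
# Anything else: treated as an _id, the first match (or nil) is returned.
def fake_mongo_find(docs, id)
  if id.is_a?(Hash)
    docs.select { |doc| matches?(doc, id) }
  else
    docs.find { |doc| doc['_id'] == id }
  end
end

docs = [
  { '_id' => 1, 'title' => 'A', 'price' => '10' },
  { '_id' => 2, 'title' => 'B', 'price' => '10' }
]
p fake_mongo_find(docs, 1)['title']            # "A"
p fake_mongo_find(docs, { price: '10' }).size  # 2
```

In the real method the Hash branch returns a lazy `Mongo::Collection::View` rather than an Array, so callers typically iterate it or call `.first` / `.to_a` on it.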
Writes a document to a mongo collection
@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name
@param fields [Hash] document to write; if it contains _id, the matching document is updated (or created if missing)
@param options [Hash] options for updating the record (used only when _id is set), see docs.mongodb.com/manual/reference/method/db.collection.findOneAndUpdate/
@return [Object] _id of the inserted or updated document
```ruby
# File lib/scruber/core/extensions/mongo_output.rb, line 124
def mongo_out(scraper_name, suffix, fields, options={})
  fields = fields.with_indifferent_access
  if fields[:_id].blank?
    # No _id: insert a new document and return the generated id
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].insert_one(fields).inserted_id
  else
    # _id given: $set the fields on that document, creating it if missing
    Scruber::Mongo.client[out_collection_name(scraper_name, suffix)].find_one_and_update(
      {"_id" => fields[:_id] },
      {'$set' => fields },
      {return_document: :after, upsert: true}.merge(options)
    )[:_id]
  end
end
```
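The update branch uses `$set`, so writing a document back only overwrites the fields it carries and keeps the others. The following in-memory sketch mimics that behaviour; `fake_mongo_out` is hypothetical, a Hash keyed by `_id` stands in for the collection, a random 24-character hex string stands in for the `BSON::ObjectId` MongoDB would generate, and the blank check only approximates ActiveSupport's `blank?`.

```ruby
require 'securerandom'

# In-memory stand-in for mongo_out's branching (not the Mongo driver).
def fake_mongo_out(store, fields)
  fields = fields.transform_keys(&:to_s) # rough stand-in for with_indifferent_access
  id = fields['_id']
  if id.nil? || id.to_s.strip.empty? # approximates ActiveSupport's blank?
    # No _id: "insert" a new document under a freshly generated id
    id = SecureRandom.hex(12)
    store[id] = fields.merge('_id' => id)
  else
    # _id given: $set semantics (listed fields overwrite, others are kept),
    # creating the document if it does not exist yet (upsert)
    store[id] = (store[id] || {}).merge(fields)
  end
  id
end

store = {}
id = fake_mongo_out(store, { title: 'Phone', price: '10' })
fake_mongo_out(store, { _id: id, description: 'Small' })
p store[id].keys # ["title", "price", "_id", "description"]
```

This follows the same insert, find, modify and write-back sequence as the class example at the top of the page.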
Collection name builder
@param scraper_name [String] name of scraper to build collection name
@param suffix [String] suffix to build collection name
@return [String] name of collection for given scraper_name and suffix
```ruby
# File lib/scruber/core/extensions/mongo_output.rb, line 171
def out_collection_name(scraper_name, suffix)
  [Scruber::Mongo.configuration.options['collections_prefix'], scraper_name, suffix]
    .select(&:present?)
    .map(&:to_s)
    .join('_')
end
```
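A dependency-free sketch of the same naming rule, assuming the prefix is passed in explicitly instead of being read from `Scruber::Mongo.configuration`, and approximating ActiveSupport's `present?` as "not nil and not a blank string". `collection_name` is a hypothetical name used only for illustration.

```ruby
# Plain-Ruby sketch of out_collection_name without ActiveSupport.
def collection_name(prefix, scraper_name, suffix)
  [prefix, scraper_name, suffix]
    .reject { |part| part.nil? || part.to_s.strip.empty? } # drop blank parts
    .map(&:to_s)
    .join('_')
end

p collection_name('scruber', :simple, 'product') # "scruber_simple_product"
p collection_name(nil, :simple, 'records')       # "simple_records"
```

Because blank parts are dropped before joining, an unset prefix yields names such as `simple_records` with no leading underscore.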