class Care::Cache
Stores cached pages of data from the given IO as strings. Pages are sized to be `page_size` or less (for the last page).
Public Class Methods
Initializes a new cache pages container with pages of given size
# File lib/care.rb, line 80 def initialize(page_size = DEFAULT_PAGE_SIZE) @page_size = page_size.to_i raise ArgumentError, 'The page size must be a positive Integer' unless @page_size > 0 @pages = {} @lowest_known_empty_page = nil end
Public Instance Methods
Returns the maximum possible byte string that can be recovered from the given `io` at the given offset. If the IO has been exhausted, `nil` will be returned instead. Will use the cached pages where available, or fetch pages where necessary
@param io[#seek, read] the IO to read data from @param at at which offset we have to read @param n_bytes how many bytes we want to read/cache @return [String, nil] the content read from the IO or `nil` if no data was available @raise ArgumentError
# File lib/care.rb, line 98 def byteslice(io, at, n_bytes) if n_bytes < 1 raise ArgumentError, "The number of bytes to fetch must be a positive Integer, but was #{n_bytes}" end if at < 0 raise ArgumentError, "Negative offsets are not supported (got #{at})" end first_page = at / @page_size last_page = (at + n_bytes) / @page_size relevant_pages = (first_page..last_page).map { |i| hydrate_page(io, i) } # Create one string combining all the pages which are relevant for # us - it is much easier to address that string instead of piecing # the output together page by page, and joining arrays of strings # is supposed to be optimized. slab = if relevant_pages.length > 1 # If our read overlaps multiple pages, we do have to join them, this is # the general case relevant_pages.join else # We only have one page # Optimize a little. If we only have one page that we need to read from # - which is likely going to be the case *often* we can avoid allocating # a new string for the joined pages and juse use the only page # directly as the slab. Since it might contain a `nil` and we do # not join (which casts nils to strings) we take care of that too relevant_pages.first || '' end offset_in_slab = at % @page_size slice = slab.byteslice(offset_in_slab, n_bytes) # Returning an empty string from read() is very confusing for the caller, # and no builtins do this - if we are at EOF we should return nil slice if slice && !slice.empty? end
Clears the page cache of all strings with data
@return void
# File lib/care.rb, line 139 def clear @pages.map { |maybe_page_str| maybe_page_str.clear if maybe_page_str.respond_to?(:clear) } @pages.clear end
Hydrates a page at the certain index or returns the contents of that page if it is already in the cache
@param io the IO to read from @param page_i which page (zero-based) to hydrate and return
# File lib/care.rb, line 149 def hydrate_page(io, page_i) # Avoid trying to read the page if we know there is no content to fill it # in the underlying IO return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page @pages[page_i] ||= read_page(io, page_i) end
We provide an overridden implementation of inspect
to avoid printing the actual contents of the cached pages
# File lib/care.rb, line 159 def inspect # Simulate the builtin object ID output https://stackoverflow.com/a/11765495/153886 oid_str = (object_id << 1).to_s(16).rjust(16, '0') ivars = instance_variables ivars.delete(:@pages) ivars_str = ivars.map do |ivar| "#{ivar}=#{instance_variable_get(ivar).inspect}" end.join(' ') synthetic_vars = 'num_hydrated_pages=%d' % @pages.length '#<%s:%s %s %s>' % [self.class, oid_str, synthetic_vars, ivars_str] end
Reads the requested page from the given IO
@param io the IO to read from @param page_i which page (zero-based) to read
# File lib/care.rb, line 176 def read_page(io, page_i) Measurometer.increment_counter('format_parser.parser.Care.page_reads_from_upsteam', 1) io.seek(page_i * @page_size) read_result = Measurometer.instrument('format_parser.Care.read_page') { io.read(@page_size) } if read_result.nil? # If the read went past the end of the IO the read result will be nil, # so we know our IO is exhausted here if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i @lowest_known_empty_page = page_i end elsif read_result.bytesize < @page_size # If we read less than we initially wanted we know there are no pages # to read following this one, so we can also optimize @lowest_known_empty_page = page_i + 1 end read_result end