class Care::Cache

Stores cached pages of data from the given IO as strings. Pages are sized to be `page_size` or less (for the last page).

Public Class Methods

new(page_size = DEFAULT_PAGE_SIZE)

Initializes a new page cache whose pages have the given size.

# File lib/care.rb, line 80
def initialize(page_size = DEFAULT_PAGE_SIZE)
  @page_size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless @page_size > 0
  @pages = {}
  @lowest_known_empty_page = nil
end
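As a rough illustration of the validation above, the following standalone sketch mirrors the `to_i` coercion and the positivity check. The helper name `normalize_page_size` is hypothetical and is not part of `Care::Cache`.

```ruby
# Hypothetical standalone helper (an assumption for illustration only)
# that mirrors the coercion and check performed by the constructor.
def normalize_page_size(page_size)
  size = page_size.to_i
  raise ArgumentError, 'The page size must be a positive Integer' unless size > 0
  size
end

from_string = normalize_page_size('4096') # numeric strings are coerced
from_float = normalize_page_size(512.9)   # floats are truncated

# nil.to_i is 0, so nil is rejected just like 0 or a negative number
rejected = begin
  normalize_page_size(nil)
rescue ArgumentError => e
  e.message
end
```

Because of the `to_i` call, non-numeric input such as `nil` or `'abc'` collapses to `0` and is rejected by the same check that rejects non-positive integers.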

Public Instance Methods

byteslice(io, at, n_bytes)

Returns the longest byte string, up to `n_bytes` long, that can be read from the given `io` starting at the given offset. If the IO has been exhausted, `nil` is returned instead. Cached pages are used where available, and missing pages are fetched as necessary.

@param io[#seek, #read] the IO to read data from
@param at the offset at which to start reading
@param n_bytes how many bytes we want to read/cache
@return [String, nil] the content read from the IO or `nil` if no data was available
@raise ArgumentError if `n_bytes` is less than 1 or `at` is negative

# File lib/care.rb, line 98
def byteslice(io, at, n_bytes)
  if n_bytes < 1
    raise ArgumentError, "The number of bytes to fetch must be a positive Integer, but was #{n_bytes}"
  end
  if at < 0
    raise ArgumentError, "Negative offsets are not supported (got #{at})"
  end

  first_page = at / @page_size
  last_page = (at + n_bytes) / @page_size

  relevant_pages = (first_page..last_page).map { |i| hydrate_page(io, i) }

  # Create one string combining all the pages which are relevant for
  # us - it is much easier to address that string instead of piecing
  # the output together page by page, and joining arrays of strings
  # is supposed to be optimized.
  slab = if relevant_pages.length > 1
    # If our read overlaps multiple pages, we do have to join them, this is
    # the general case
    relevant_pages.join
  else # We only have one page
    # Optimize a little. If we only have one page that we need to read from
    # - which is likely going to be the case *often* we can avoid allocating
    # a new string for the joined pages and just use the only page
    # directly as the slab. Since it might contain a `nil` and we do
    # not join (which casts nils to strings) we take care of that too
    relevant_pages.first || ''
  end

  offset_in_slab = at % @page_size
  slice = slab.byteslice(offset_in_slab, n_bytes)

  # Returning an empty string from read() is very confusing for the caller,
  # and no builtins do this - if we are at EOF we should return nil
  slice if slice && !slice.empty?
end
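To make the page arithmetic in `byteslice` concrete, here is a minimal standalone sketch that reproduces only the index calculations. The helper name `page_span` and the sample numbers are illustrative assumptions, not part of the library.

```ruby
# Hypothetical helper (not part of Care::Cache) that mirrors the index
# arithmetic used by byteslice: which pages are touched, and where the
# requested bytes start inside the joined pages.
def page_span(at, n_bytes, page_size)
  first_page = at / page_size
  last_page = (at + n_bytes) / page_size
  { pages: (first_page..last_page).to_a, offset_in_slab: at % page_size }
end

single = page_span(10, 4, 8)   # bytes 10..13 sit entirely inside page 1
crossing = page_span(6, 4, 8)  # bytes 6..9 straddle pages 0 and 1
boundary = page_span(4, 4, 8)  # bytes 4..7 end exactly on a page boundary
```

With this arithmetic, a read that ends exactly on a page boundary (the `boundary` case) also includes the following page in its range, since `(at + n_bytes) / page_size` already points at the next page.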

clear()

Clears the page cache of all strings with data

@return void

# File lib/care.rb, line 139
def clear
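  # @pages is a Hash, so iterate over its values to reach the page strings
  # (iterating with a single block parameter would yield [index, page] pairs)
  @pages.each_value { |maybe_page_str| maybe_page_str.clear if maybe_page_str.respond_to?(:clear) }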
  @pages.clear
end
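The following sketch shows what clearing is meant to achieve, under the assumption that pages are held in a Hash keyed by page index as in the constructor. Iterating a Hash with a single block parameter yields `[key, value]` pairs, so the sketch uses `each_value` to reach the strings themselves. The variable names are illustrative.

```ruby
# A stand-in for the page store: page index => page string (or nil)
pages = { 0 => +'abcd', 1 => nil }
page0 = pages[0] # keep a reference so the effect of String#clear is visible

# A plain Hash iteration hands the block [key, value] pairs
yielded_classes = pages.map { |entry| entry.class }

# Empty every page string in place, then drop the entries themselves
pages.each_value { |maybe_page_str| maybe_page_str.clear if maybe_page_str.respond_to?(:clear) }
pages.clear
```

After this runs, `page0` is an empty string and `pages` has no entries, so neither the container nor any lingering reference keeps the page data alive.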

hydrate_page(io, page_i)

Hydrates the page at the given index, or returns the contents of that page if it is already in the cache.

@param io the IO to read from
@param page_i which page (zero-based) to hydrate and return

# File lib/care.rb, line 149
def hydrate_page(io, page_i)
  # Avoid trying to read the page if we know there is no content to fill it
  # in the underlying IO
  return if @lowest_known_empty_page && page_i >= @lowest_known_empty_page

  @pages[page_i] ||= read_page(io, page_i)
end
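The `||=` memoization used by `hydrate_page` can be sketched in isolation as follows. The lambda `fetch` and the counter `reads` are hypothetical stand-ins for the real page read, used only to show that a page already in the Hash is not fetched again.

```ruby
pages = {}
reads = 0

# Hypothetical stand-in for hydrate_page: the right-hand side of ||= only
# runs when the Hash has no truthy value stored for that page index.
fetch = lambda do |page_i|
  pages[page_i] ||= begin
    reads += 1
    "page #{page_i}"
  end
end

fetch.call(0)
fetch.call(0) # served from the Hash, no second read
fetch.call(1)
```

Note that `||=` treats a stored `nil` the same as a missing key, which is why the method consults `@lowest_known_empty_page` first to avoid re-reading pages known to be empty.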

inspect()

We provide an overridden implementation of inspect to avoid printing the actual contents of the cached pages

# File lib/care.rb, line 159
def inspect
  # Simulate the builtin object ID output https://stackoverflow.com/a/11765495/153886
  oid_str = (object_id << 1).to_s(16).rjust(16, '0')

  ivars = instance_variables
  ivars.delete(:@pages)
  ivars_str = ivars.map do |ivar|
    "#{ivar}=#{instance_variable_get(ivar).inspect}"
  end.join(' ')
  synthetic_vars = 'num_hydrated_pages=%d' % @pages.length
  '#<%s:%s %s %s>' % [self.class, oid_str, synthetic_vars, ivars_str]
end
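The same idea, summarizing a large instance variable instead of printing it, can be sketched on a small standalone class. `BlobHolder` and its instance variables are hypothetical and exist only for this illustration.

```ruby
# Hypothetical class holding a large string that should never be dumped
# into logs or an IRB session.
class BlobHolder
  def initialize
    @blob = 'x' * 1_000_000
    @label = 'demo'
  end

  # Print every instance variable except @blob, and replace @blob with
  # a synthetic summary of its size.
  def inspect
    ivars = instance_variables - [:@blob]
    ivars_str = ivars.map { |ivar| "#{ivar}=#{instance_variable_get(ivar).inspect}" }.join(' ')
    format('#<%s blob_bytes=%d %s>', self.class, @blob.bytesize, ivars_str)
  end
end

summary = BlobHolder.new.inspect
# summary is '#<BlobHolder blob_bytes=1000000 @label="demo">'
```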

read_page(io, page_i)

Reads the requested page from the given IO

@param io the IO to read from
@param page_i which page (zero-based) to read

# File lib/care.rb, line 176
def read_page(io, page_i)
  Measurometer.increment_counter('format_parser.parser.Care.page_reads_from_upsteam', 1)

  io.seek(page_i * @page_size)
  read_result = Measurometer.instrument('format_parser.Care.read_page') { io.read(@page_size) }
  if read_result.nil?
    # If the read went past the end of the IO the read result will be nil,
    # so we know our IO is exhausted here
    if @lowest_known_empty_page.nil? || @lowest_known_empty_page > page_i
      @lowest_known_empty_page = page_i
    end
  elsif read_result.bytesize < @page_size
    # If we read less than we initially wanted we know there are no pages
    # to read following this one, so we can also optimize
    @lowest_known_empty_page = page_i + 1
  end

  read_result
end
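The end-of-data bookkeeping in `read_page` can be illustrated with a small `StringIO`. This is a sketch under assumed values (a 10-byte source and a 4-byte page size), with a local variable standing in for `@lowest_known_empty_page`.

```ruby
require 'stringio'

io = StringIO.new('abcdefghij') # 10 bytes of source data
page_size = 4
lowest_known_empty_page = nil

pages = (0..3).map do |page_i|
  io.seek(page_i * page_size)
  data = io.read(page_size)
  if data.nil?
    # Reading at or past the end of the IO yields nil
    if lowest_known_empty_page.nil? || lowest_known_empty_page > page_i
      lowest_known_empty_page = page_i
    end
  elsif data.bytesize < page_size
    # A short read means this was the last page with any data
    lowest_known_empty_page = page_i + 1
  end
  data
end
# pages is ["abcd", "efgh", "ij", nil] and lowest_known_empty_page is 3
```

The short read of page 2 already establishes that page 3 is empty, so a cache using this bookkeeping could skip the final read entirely.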