class NumerousClientInternals

Handles the details of talking to the numerousapp.com server: (basic) authentication, chunked APIs, JSON vs. multipart APIs, fixing up a few server response quirks, and so forth. It is not meant for use outside of Numerous and NumerousMetric.

Constants

BChars

compute a multipart boundary string; excessively paranoid

BCharsLen
MethMap
ThrottleDefault

The default throttle policy. Invoked after the response has been received and we are supposed to return true to force a retry or false to accept this response as-is.

The policy this implements:

* if "getting close" to the limit, arbitrarily delay ourselves.

* if we truly got spanked with "Too Many Requests"
  then delay the amount of time the server told us to delay.

The Voluntary delay policy works like this:

Given N API calls remaining and T seconds until fresh allotment, compute an N-per-T rate delay so the hard rate limit probably won’t be hit (there is no guarantee because multiple clients can be running).

Example: 20 APIs remaining and 5 seconds until fresh allocation. A delay of 250msec per API ensures we (approximately) don’t hit the limit. Always remember the point here is just to TRY to be NICE. It’s not important to be fussy about exactness.

In effect the concept is to “smear” an inevitable rate-limit delay over the tail end of the API rate allocation rather than hitting the hard limit and encountering a long (e.g., 30 second) hard delay.

When there are only a few APIs left and a lot of time, this could impose long delays. E.g., rateleft 2, but 40 seconds to go until fresh. Although this “shouldn’t” happen if you have a single thread using this smear algorithm, it can certainly happen with multiple threads or multiple processes all individually consuming APIs. In this scenario you’re going to inevitably hit the hard cap anyway. Therefore, the voluntary delay is arbitrarily capped to a parameter provided in the throttledata (set up during initialization).
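
The arithmetic of the voluntary delay can be sketched as follows. This is a minimal illustration, not the library's actual code: the helper name voluntary_delay and its parameters are hypothetical, and the :voluntary threshold check (which decides whether to delay at all) is left out.

```ruby
# Hypothetical helper illustrating the "smear" computation described above.
# rate_remaining : API calls left in the current allotment (N)
# rate_reset     : seconds until a fresh allotment (T)
# max_delay      : cap on the voluntary delay, in seconds
def voluntary_delay(rate_remaining, rate_reset, max_delay)
  # Nothing sensible to compute if either value is missing or exhausted
  # (assumption: the caller handles those cases separately).
  return 0.0 if rate_remaining <= 0 || rate_reset <= 0

  delay = rate_reset.to_f / rate_remaining   # T seconds spread over N calls
  [delay, max_delay.to_f].min                # never exceed the cap
end

puts voluntary_delay(20, 5, 5)   # 20 calls left, 5 seconds to go: 0.25
puts voluntary_delay(2, 40, 5)   # would be 20 seconds, capped to 5.0
```

The second call corresponds to the "rateleft 2, but 40 seconds to go" case: the uncapped delay would be 20 seconds per call, so the cap takes over.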

This has been stress-tested “in the wild” by running code doing a metric.read() in a loop; theoretically such code should run at 300 API calls per minute – and it does, either with this voluntary throttling or without it. If you are trying to run faster than 300 per minute, it’s all just a question of how you want to experience your (ultimately server-imposed) API throttling, not if (or how much).

Speed Limit: 300 API/minute. It’s The Law. :)

The arguments supplied to us are:

nr is the Numerous
tparams is a Hash containing:
    :attempt         : the attempt number. Zero on the very first try
    :rateRemaining   : X-Rate-Limit-Remaining reported by the server
    :rateReset       : time (in seconds) until fresh rate granted
    :resultCode      : HTTP code from server (e.g., 409, 200, etc)
    :resp            : the full-on response object
    :request         : information about the original request
    :statistics      : place to record informational stats
    :debug           : current debug level

td is the data you supplied as "throttleData" to Numerous.new()
up is a tuple useful for calling the original system throttle policy:
     up[0] is the Proc
     up[1] is the td for *that* function
     up[2] is the "up" for calling *that* function
  ... so after you do your own thing if you then want to defer to the
      built-in throttle policy you can
                up[0].call(nr, tparams, up[1], up[2])

It’s really (really really) important to understand the return value and the fact that we are invoked AFTER each request:

false : means "don't do more retries". It does not imply anything
        about the success or failure of the request; it simply means
        this most recent request (response) is the one to use as
        the final answer

true  : means that the response is, indeed, to be interpreted as some
        sort of rate-limit failure and should be discarded. The
        original request will be sent again. Obviously it's a very
        bad idea to return true in cases where the server might
        have done anything non-idempotent.
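
A custom policy that does its own check and then defers to the next policy in the chain might look like the sketch below. Everything here is illustrative: builtin is a stand-in Proc (not the real ThrottleDefault), and the "never retry anything but GET" rule is just one conservative way to honor the warning about non-idempotent requests.

```ruby
# Stand-in for the chained ("up") policy; hypothetical, retries only on 429.
builtin = proc { |_nr, tparams, _td, _up| tparams[:resultCode] == 429 }

# Custom policy: count invocations in td, refuse to retry non-GET requests,
# and otherwise defer to whatever policy sits above us in the chain.
custom = proc do |nr, tparams, td, up|
  td[:calls] += 1
  if tparams[:request][:httpMethod] != :GET
    false                                   # accept this response as final
  else
    up[0].call(nr, tparams, up[1], up[2])   # let the chained policy decide
  end
end

up = [builtin, nil, nil]   # same [Proc, data, up] shape as described above
td = { calls: 0 }

get429  = { resultCode: 429, request: { httpMethod: :GET } }
post429 = { resultCode: 429, request: { httpMethod: :POST } }
get200  = { resultCode: 200, request: { httpMethod: :GET } }

puts custom.call(nil, get429, td, up)    # true: retry
puts custom.call(nil, post429, td, up)   # false: do not resend a POST
puts custom.call(nil, get200, td, up)    # false: nothing to retry
```

In real use the Proc and its data would be passed as the throttle: and throttleData: arguments to Numerous.new, and the library would supply nr, tparams, and up.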

All of this seems overly general for what amounts to “sleep sometimes”

VersionString

Attributes

agentString[RW]
debugLevel[R]
serverName[R]
statistics[R]

Public Class Methods

new(apiKey=nil, server:'api.numerousapp.com', throttle:nil, throttleData:nil)

@param apiKey [String] API authentication key
@param server [String] Optional (keyword arg). Server name.
@param throttle [Proc] Optional throttle policy
@param throttleData [Any] Optional data for throttle

@!attribute agentString

@return [String] User agent string sent to the server.

@!attribute [r] serverName

@return [String] FQDN of the target NumerousApp server.

@!attribute [r] debugLevel

@return [Fixnum] Current debugging level; change via debug() method
# File lib/numerousapp.rb, line 126
def initialize(apiKey=nil, server:'api.numerousapp.com',
                       throttle:nil, throttleData:nil)

    # specifying apiKey=nil asks us to get key from various default places.
    if not apiKey
        apiKey = Numerous.numerousKey()
    end

    @auth = { user: apiKey, password: "" }
    u = URI.parse("https://"+server)
    @serverName = server
    @serverPort = u.port

    @agentString = "NW-Ruby-NumerousClass/" + VersionString +
                   " (Ruby #{RUBY_VERSION}) NumerousAPI/v2"

    @filterDuplicates = true     # see discussion elsewhere
    @need_restart = true         # port will be opened in simpleAPI

    # Throttling:
    #
    # arbitraryMaximum is just that: under no circumstances will we retry
    # any particular request more than that. Tough noogies.
    #
    # throttlePolicy "tuple" is:
    #     [ 0 ] - Proc
    #     [ 1 ] - specific data for Proc
    #     [ 2 ] - "up" tuple for chained policy
    #
    # and the default policy uses the "data" as a hash of parameters:
    #    :voluntary -- the threshold point for voluntary backoff
    #    :volmaxdelay -- arbitrary maximum *voluntary delay* time
    #
    @arbitraryMaximumTries = 10
    voluntary = { voluntary: 40, volmaxdelay: 5}
    # you can keep the dflt throttle but just alter the voluntary
    # parameters, this way:
    if throttleData and not throttle
        voluntary = voluntary.merge(throttleData)
    end
    @throttlePolicy = [ThrottleDefault, voluntary, nil]
    if throttle
        @throttlePolicy = [throttle, throttleData, @throttlePolicy]
    end

    @statistics = Hash.new { |h, k| h[k] = 0 }  # stats are "infotainment"
    @debugLevel = 0

end
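
The two ways of supplying throttleData lead to different policy tuples. The sketch below isolates that logic from the constructor above; build_policy and default_proc are hypothetical names used only for illustration.

```ruby
# Hypothetical extraction of the throttle setup shown in initialize.
# default_proc stands in for ThrottleDefault.
def build_policy(default_proc, throttle: nil, throttle_data: nil)
  voluntary = { voluntary: 40, volmaxdelay: 5 }

  # Data without a custom Proc: tweak the default policy's parameters.
  voluntary = voluntary.merge(throttle_data) if throttle_data && !throttle

  policy = [default_proc, voluntary, nil]

  # Custom Proc: it goes first, with the default policy as its "up" tuple.
  policy = [throttle, throttle_data, policy] if throttle
  policy
end

default_proc = proc { |_nr, _tparams, _td, _up| false }
my_proc      = proc { |_nr, _tparams, _td, _up| false }

tuned   = build_policy(default_proc, throttle_data: { volmaxdelay: 2 })
chained = build_policy(default_proc, throttle: my_proc, throttle_data: :anything)

puts tuned[1][:volmaxdelay]        # 2 (overridden)
puts tuned[1][:voluntary]          # 40 (default kept)
puts chained[0].equal?(my_proc)    # true: custom Proc is called first
puts chained[2][0].equal?(default_proc)   # true: default is reachable via "up"
```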

Public Instance Methods

debug(lvl=1)

Set the debug level

@param [Fixnum] lvl

The desired debugging level. Greater than zero turns on debugging.

@return [Fixnum] the previous debugging level.

# File lib/numerousapp.rb, line 193
def debug(lvl=1)
    prev = @debugLevel
    @debugLevel = lvl
    # need to make sure we have started an http session
    # (normally deferred until first API call)

    if @debugLevel > 0
        # this is hokey, but it is what it is... it's for debug anyway
        # we have to restart the session with debug on
        @http = Net::HTTP.new(@serverName, @serverPort)
        @http.use_ssl = true    # always required by NumerousApp
        @http.set_debug_output $stderr
        @http = @http.start()
        @need_restart = false
    else
        @need_restart = true   # will force a new http session
    end
    return prev
end

setBogusDupFilter(f)

This is primarily for testing; control filtering of bogus duplicates.

@note If you are calling this you are probably doing something wrong.

@param [Boolean] f

New value for duplicate filtering flag.

@return [Boolean] Previous value of duplicate filtering flag.

# File lib/numerousapp.rb, line 220
def setBogusDupFilter(f)
    prev = @filterDuplicates
    @filterDuplicates = f
    return prev
end

to_s()

String representation of Numerous

@return [String] Human-appropriate string representation.

# File lib/numerousapp.rb, line 183
def to_s()
    oid = (2 * self.object_id).to_s(16)  # XXX "2*" matches native to_s
    return "<Numerous {#{@serverName}} @ 0x#{oid}>"
end

Protected Instance Methods

chunkedIterator(info, subs={}, block)

generic iterator for chunked APIs

# File lib/numerousapp.rb, line 520
def chunkedIterator(info, subs={}, block)
    # if you didn't specify a block... there's no point in doing anything
    if not block; return nil; end

    api = makeAPIcontext(info, :GET, subs)
    list = []
    nextURL = api[:basePath]
    firstTime = true

    # see discussion about duplicate filtering below
    if @filterDuplicates and api[:dupFilter]
        filterInfo = { prev: {}, current: {} }
    else
        filterInfo = nil
    end

    while nextURL
        # get a chunk from the server

        # XXX in the python version we caught various exceptions and
        #     attempted to translate them into something meaningful
        #     (e.g., if a metric got deleted while you were iterating)
        #     But here we're just letting the whatever-exceptions filter up
        v = simpleAPI(api, url:nextURL)

        # statistics, helpful for testing/debugging. Algorithmically
        # we don't really care about first time or not, just for the stats
        if firstTime
            @statistics[:firstChunks] += 1
            firstTime = false
        else
            @statistics[:additionalChunks] += 1
        end

        if filterInfo
            filterInfo[:prev] = filterInfo[:current]
            filterInfo[:current] = {}
        end

        list = v[api[:list]]
        nextURL = v[api[:next]]

        # hand them out
        if list             # can be nil for a variety of reasons
            list.each do |i|

                # A note about duplicate filtering
                #
                # There is a bug in the NumerousApp server which can
                # cause collections to show duplicates of certain events
                # (or interactions/stream items). Explaining the bug in
                # great detail is beyond the scope here; suffice to say
                # it only happens for events that were recorded
                # nearly-simultaneously and happen to be getting reported
                # right at a chunking boundary.
                #
                # So we are filtering them out here. For a more involved
                # discussion of this, see the python implementation. This
                # filtering "works" because it knows pragmatically
                # how/where the bug can show up
                #
                # Turning off duplicate filtering is for testing (only).
                #
                # Not all API's need dupfiltering, hence the APIInfo test
                #
                if (not filterInfo)    # the easy case, not filtering
                    block.call i
                else
                    thisId = i[api[:dupFilter]]
                    if filterInfo[:prev].include? thisId
                        @statistics[:duplicatesFiltered] += 1
                    else
                        filterInfo[:current][thisId] = 1
                        block.call i
                    end
                end
            end
        end
    end
    return nil     # the subclasses return (should return) their own self
end
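
The duplicate filtering in the loop above only compares each item against the IDs seen in the immediately preceding chunk. A standalone sketch of that idea, with a hypothetical helper name and plain arrays standing in for server chunks:

```ruby
# Hypothetical standalone version of the chunk-boundary duplicate filter.
# chunks : array of chunks, each an array of hashes
# id_key : hash key holding the item's unique ID
def each_without_boundary_dups(chunks, id_key)
  prev = {}
  current = {}
  chunks.each do |chunk|
    prev = current        # IDs from the chunk we just finished
    current = {}          # IDs collected from the chunk we are starting
    chunk.each do |item|
      id = item[id_key]
      next if prev.include?(id)   # repeated across the chunk boundary: skip
      current[id] = true
      yield item
    end
  end
end

chunks = [
  [{ id: 1 }, { id: 2 }],
  [{ id: 2 }, { id: 3 }]   # id 2 is reported again at the chunk boundary
]

seen = []
each_without_boundary_dups(chunks, :id) { |item| seen << item[:id] }
puts seen.inspect   # [1, 2, 3]
```

Because only one previous chunk is remembered, memory use stays bounded by the chunk size; a duplicate that reappeared two or more chunks later would not be caught, which matches the "knows pragmatically how/where the bug can show up" remark in the source comment.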

getRedirect(url)

This is a special case … a bit of a hack … to determine the underlying (redirected-to) URL for metric photos. The issue is that sometimes we want to get at the no-auth-required actual image URL (vs the metric API endpoint for getting a photo)

This does that by issuing an authenticated GET against the given URL and returning the Location header from the response, i.e., the URL the server redirects to.

# File lib/numerousapp.rb, line 509
def getRedirect(url)
    rq = MethMap[:GET].new(url)
    rq.basic_auth(@auth[:user], @auth[:password])
    rq['user-agent'] = @agentString

    resp = @http.request(rq)
    return resp.header['Location']
end

makeAPIcontext(info, whichOp, kwargs={})

This gathers all the relevant information for a given API and fills in the variable fields in URLs. It returns an “api context” containing all the API-specific details needed by simpleAPI.

# File lib/numerousapp.rb, line 243
def makeAPIcontext(info, whichOp, kwargs={})
    rslt = {}
    rslt[:httpMethod] = whichOp

    # Build the substitutions from defaults (if any) and non-nil kwargs.
    # Note: we are carefully making copies of the underlying dictionaries
    #       so you get your own private context returned to you
    substitutions = (info[:defaults]||{}).clone

    # copy any supplied non-nil kwargs (nil ones defer to defaults)
    kwargs.each { |k, v| if v then substitutions[k] = v end }

    # this is the stuff specific to the operation, e.g.,
    # the 'next' and 'list' fields in a chunked GET
    # There can also be additional path info.
    # process the path appendage and copy everything else

    appendThis = ""
    path = info[:path]
    if info[whichOp]
        opi = info[whichOp]
        opi.each do |k, v|
            if k == :appendPath
                appendThis = v
            elsif k == :path
                path = v           # entire path overridden on this one
            else
                rslt[k] = v
            end
        end
    end
    rslt[:basePath] = (path + appendThis) % substitutions
    return rslt
end
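
The final step, (path + appendThis) % substitutions, relies on Ruby's String#% with a Hash argument, which fills in %{name} references. The sketch below shows that step together with the "non-nil kwargs override defaults" rule; the path template and field names are made up for illustration and are not taken from the real API table.

```ruby
# Hypothetical template and defaults; only the mechanism mirrors the source.
template = '/v2/metrics/%{metricId}' + '/events/%{eventId}'
defaults = { eventId: 'latest' }
kwargs   = { metricId: '123', eventId: nil }   # nil defers to the default

substitutions = defaults.clone                 # private copy, as in the source
kwargs.each { |k, v| substitutions[k] = v if v }

base_path = template % substitutions
puts base_path   # /v2/metrics/123/events/latest
```

A reference with no matching key raises KeyError, so a missing substitution fails loudly rather than producing a malformed URL.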

simpleAPI(api, jdict:nil, multipart:nil, url:nil)

ALL api exchanges with the Numerous server go through here except for getRedirect() which is a special case (hack) for photo URLs

Any single request/response uses this; chunked APIs use the iterator classes (which in turn come back and use this repeatedly)

The api parameter dictionary specifies:

basePath - the url we use (without the https://server.com part)
httpMethod - GET vs POST vs PUT etc
successCodes - what "OK" responses are (default 200)

The api parameter may also carry additional info used elsewhere. See, for example, how the iterators work on collections.

Sometimes you may have started with a basePath but then been given a “next” URL to use for subsequent requests. In those cases pass in a url and it will take precedence over the basePath if any is present

You can pass in a dictionary jdict which will be json-ified and sent as Content-Type: application/json. Or you can pass in a multipart dictionary … this is used for posting photos. You should not specify both jdict and multipart.

# File lib/numerousapp.rb, line 328
def simpleAPI(api, jdict:nil, multipart:nil, url:nil)

    @statistics[:simpleAPI] += 1

    # take the base url if you didn't give us an override
    url ||= api[:basePath]

    if url[0] == '/'                  # i.e. not "http..."
        path = url
    else
        # technically we should be able to reassign @http because it could
        # change if server redirected us. But don't want to if no change.
        # need to add logic. XXX TODO XXX
        path = URI.parse(url).request_uri
    end

    rq = MethMap[api[:httpMethod]].new(path)
    rq.basic_auth(@auth[:user], @auth[:password])
    rq['user-agent'] = @agentString
    if jdict
        rq['content-type'] = 'application/json'
        rq.body = JSON.generate(jdict)
    elsif multipart
        # the data in :f is either a raw string OR a readable file
        begin
            f = multipart[:f]
            img = f.read
        rescue NoMethodError
            img = f
        end
        boundary = makeboundary(img)

        rq["content-type"] = "multipart/form-data; boundary=#{boundary}"
        d = []
        d << "--#{boundary}\r\n"
        d << "Content-Disposition: form-data;"
        d << ' name="image";'
        d << ' filename="image.img";'
        d << "\r\n"
        d << "Content-Transfer-Encoding: binary\r\n"
        d << "Content-Type: #{multipart[:mimeType]}\r\n"
        d << "\r\n"
        d << img + "\r\n"
        d << "--#{boundary}--\r\n"
        rq.body = d.join
    end

    if @debugLevel > 0
        puts "Path: #{path}\n"
        puts "Request headers:\n"
        rq.each do | k, v |
            puts "k: " + k + " :: " + v + "\n"
        end
    end

    resp = nil   # ick, is there a better way to get this out of the block?
    @arbitraryMaximumTries.times do |attempt|

        @statistics[:serverRequests] += 1
        t0 = Time.now
        begin

            # see note immediately below re "need_restart"
            # this is where the very FIRST start() happens but
            # also where subsequent starts might be re-done after errors

            if @need_restart or not @http.started?()
                @http = Net::HTTP.new(@serverName, @serverPort)
                @http.use_ssl = true    # always required by NumerousApp
                @http = @http.start()
            end

            # A note on this need_restart true/false dance:
            #    I have to admit I'm not sure if this is necessary or if
            #    it is even a good/bad/irrelevant idea, but the concept
            #    here is that if an error is encountered we'll ditch
            #    the current start() session and make a new one next time

            @need_restart = true    # will redo session if raise out
            resp = @http.request(rq)
            @need_restart = false

        rescue StandardError => e
            # it's PDB (pretty bogus) that we have to rescue
            # StandardError but the underlying http library can just throw
            # too many exceptions to know what they all are; it really
            # should have encapsulated them into an HTTPNetError class...
            # so, we'll just assume any "standard error" is a network issue
            raise NumerousNetworkError.new(e)
        end
        et = Time.now - t0
        # We report the elapsed round-trip time, as a scalar (by default)
        # OR if you preset the :serverResponseTimes to be an array
        # of length N then we keep the last N response times, thusly:
        begin
            times = @statistics[:serverResponseTimes]
            times.insert(0, et)
            times.pop()
        rescue NoMethodError         # just a scalar
            @statistics[:serverResponseTimes] = et
        end

        if @debugLevel > 0
            puts "Response headers:\n"
            resp.each do | k, v |
                puts "k: " + k + " :: " + v + "\n"
            end
            puts "Code: " + resp.code + "/" + resp.code.class.to_s + "/\n"
        end

        # invoke the rate-limiting policy

        rateRemain = getElseM1(resp, 'x-rate-limit-remaining')
        rateReset = getElseM1(resp, 'x-rate-limit-reset')
        @statistics[:rateRemaining] = rateRemain
        @statistics[:rateReset] = rateReset

        tp = { :debug=> @debugLevel,
               :attempt=> attempt,
               :rateRemaining=> rateRemain,
               :rateReset=> rateReset,
               :resultCode=> resp.code.to_i,
               :resp=> resp,
               :statistics=> @statistics,
               :request=> { :httpMethod => api[:httpMethod],
                             :url => path,
                             :jdict => jdict }
             }

        td = @throttlePolicy[1]
        up = @throttlePolicy[2]
        if not @throttlePolicy[0].call(self, tp, td, up)
            break
        end
    end

    goodCodes = api[:successCodes] || [200]

    responseCode = resp.code.to_i

    if goodCodes.include? responseCode
        begin
            rj = JSON.parse(resp.body)
        rescue TypeError, JSON::ParserError => e
            # On some requests that return "nothing" the server
            # returns {} ... on others it literally returns nothing.
            if (not resp.body) or resp.body.length == 0
                rj = {}
            else
                # this isn't supposed to happen... server bug?
                raise e
            end
        end
    else
        rj = { errorType: "HTTPError" }
        rj[:code] = responseCode
        rj[:reason] = resp.message
        rj[:value] = "Server returned an HTTP error: #{resp.message}"
        rj[:id] = url
        if responseCode == 401     # XXX is there an HTTP constant for this?
            emeth = NumerousAuthError
        else
            emeth = NumerousError
        end

        raise emeth.new(rj[:value],responseCode, rj)

    end

    return rj
end
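
The :serverResponseTimes bookkeeping in simpleAPI has two modes: a single scalar by default, or a rolling window if the entry was preset to an array. The sketch below reproduces just that piece; record_time is a hypothetical helper, and the hash mimics the default-to-zero statistics hash created in the constructor.

```ruby
# Same default-to-zero behavior as the statistics hash in initialize.
stats = Hash.new { |h, k| h[k] = 0 }

# Hypothetical helper mirroring the begin/rescue block in simpleAPI.
def record_time(stats, elapsed)
  times = stats[:serverResponseTimes]
  times.insert(0, elapsed)   # newest first
  times.pop                  # drop the oldest, keeping the length fixed
rescue NoMethodError         # entry is a number, not an array
  stats[:serverResponseTimes] = elapsed
end

# Scalar mode: only the most recent time is kept.
record_time(stats, 0.12)
record_time(stats, 0.34)
puts stats[:serverResponseTimes]           # 0.34

# Window mode: preset an array of length N to keep the last N times.
stats[:serverResponseTimes] = [0, 0, 0]
[0.1, 0.2, 0.3, 0.4].each { |t| record_time(stats, t) }
puts stats[:serverResponseTimes].inspect   # [0.4, 0.3, 0.2]
```

Since statistics is exposed as a read-only attribute returning the hash itself, the preset can be done from calling code by assigning an array to the :serverResponseTimes key before making requests.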

Private Instance Methods

getElseM1(d, k)

helper function to extract header field integer or return -1

# File lib/numerousapp.rb, line 294
def getElseM1(d, k)
    if d.key? k
        return d[k].to_i
    else
        return -1
    end
end

makeboundary(s)
# File lib/numerousapp.rb, line 282
def makeboundary(s)
    # Just try something fixed, and if it is no good extend it with random.
    # For amusing porpoises make it this way so we don't also contain it.
    b = "RoLlErCaSeDbOuNdArY867".b + "5309".b
    while s.include? b
        b += BChars[rand(BCharsLen)]
    end
    return b
end
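
The boundary logic above depends on the BChars and BCharsLen constants, whose values are not shown on this page. A self-contained sketch of the same "extend until it no longer appears in the payload" technique, using an assumed alphanumeric alphabet and hypothetical names:

```ruby
# Assumed alphabet; the real BChars constant may differ.
BOUNDARY_CHARS = [*'a'..'z', *'A'..'Z', *'0'..'9'].freeze

# Hypothetical equivalent of makeboundary: start from a fixed string and
# append random characters while the payload still contains the candidate.
def make_boundary(payload, seed = 'RoLlErCaSeDbOuNdArY8675309')
  b = seed.b                                     # binary copy of the seed
  b += BOUNDARY_CHARS.sample while payload.include?(b)
  b
end

plain  = make_boundary('just some image bytes'.b)
tricky = make_boundary('xxRoLlErCaSeDbOuNdArY8675309yy'.b)

puts plain                   # the seed, unchanged
puts tricky.length > 26      # true: the seed had to be extended
```

The loop terminates because every extension makes the candidate longer while the payload has fixed length.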