class NumerousClientInternals
Handles details of talking to the numerousapp.com server, including (basic) authentication, chunked APIs, JSON vs. multipart APIs, fixing up a few server response quirks, and so forth. It is not meant for use outside of Numerous and NumerousMetric.
Constants
- BChars
Characters used when computing a multipart boundary string (see makeboundary); excessively paranoid.
- BCharsLen
- MethMap
- ThrottleDefault
The default throttle policy. Invoked after the response has been received and we are supposed to return true to force a retry or false to accept this response as-is.
The policy this implements:
* if "getting close" to the limit, arbitrarily delay ourselves.
* if we truly got spanked with "Too Many Requests" then delay the amount of time the server told us to delay.
The Voluntary delay policy works like this:
Given N API calls remaining and T seconds until a fresh allotment, compute an N-per-T rate delay so the hard rate limit probably won’t hit (there is no guarantee, because multiple clients can be running).
Example: 20 APIs remaining and 5 seconds until fresh allocation. A delay of 250msec per API ensures we (approximately) don’t hit the limit. Always remember the point here is just to TRY to be NICE. It’s not important to be fussy about exactness.
In effect the concept is to “smear” an inevitable rate-limit delay over the tail end of the API rate allocation rather than hitting the hard limit and encountering a long (e.g., 30 second) hard delay.
When there are only a few APIs left and a lot of time, this could impose long delays. E.g., rateleft 2, but 40 seconds to go until fresh. Although this “shouldn’t” happen if you have a single thread using this smear algorithm, it can certainly happen with multiple threads or multiple processes all individually consuming APIs. In this scenario you’re going to inevitably hit the hard cap anyway. Therefore the voluntary delay is arbitrarily capped at a parameter provided in the throttle data (:volmaxdelay, set up during initialization).
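The smear-and-cap arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: voluntary_delay is a hypothetical helper, not the body of ThrottleDefault, and the real policy only starts delaying once the remaining allotment drops below its :voluntary threshold.

```ruby
# Hypothetical helper (not part of the library): spread the remaining
# calls evenly over the time left in the rate window, then cap the result.
#   rate_remaining : API calls left in the current window (N)
#   rate_reset     : seconds until a fresh allotment (T)
#   max_delay      : the cap, playing the role of :volmaxdelay
def voluntary_delay(rate_remaining, rate_reset, max_delay)
  # With nothing left the hard limit is coming anyway; wait the cap.
  return max_delay.to_f if rate_remaining <= 0

  # T / N seconds per call, never more than the cap.
  [rate_reset.to_f / rate_remaining, max_delay.to_f].min
end

voluntary_delay(20, 5, 5)   # 20 calls left, 5 seconds to go: 0.25 s per call
voluntary_delay(2, 40, 5)   # 2 calls left, 40 seconds to go: capped at 5.0 s
```

The second call shows why the cap exists: without it the uncapped delay would be 20 seconds per call.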
This has been stress-tested “in the wild” by running code doing a metric.read() in a loop; theoretically such code should run at 300 API calls per minute – and it does, either with this voluntary throttling or without it. If you are trying to run faster than 300 per minute, it’s all just a question of how you want to experience your (ultimately server-imposed) API throttling, not if (or how much).
Speed Limit: 300 API/minute. It’s The Law. :)
The arguments supplied to us are:
nr is the Numerous

tparams is a Hash containing:
  :attempt : the attempt number. Zero on the very first try
  :rateRemaining : X-Rate-Limit-Remaining reported by the server
  :rateReset : time (in seconds) until fresh rate granted
  :resultCode : HTTP code from server (e.g., 409, 200, etc)
  :resp : the full-on response object
  :request : information about the original request
  :statistics : place to record informational stats
  :debug : current debug level

td is the data you supplied as "throttleData" to Numerous.new()

up is a tuple useful for calling the original system throttle policy:
  up[0] is the Proc
  up[1] is the td for *that* function
  up[2] is the "up" for calling *that* function

... so after you do your own thing if you then want to defer to the built-in throttle policy you can
  up[0].call(nr, tparams, up[1], up[2])
It’s really (really really) important to understand the return value and the fact that we are invoked AFTER each request:
false : means "don't do more retries". It does not imply anything about the success or failure of the request; it simply means this most recent request (response) is the one to use as the final answer.
true : means that the response is, indeed, to be interpreted as some sort of rate-limit failure and should be discarded. The original request will be sent again. Obviously it's a very bad idea to return true in cases where the server might have done anything non-idempotent.
All of this seems overly general for what amounts to “sleep sometimes”.
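As an illustration of the calling convention and return value described above, here is a minimal sketch of a custom policy that gives up after its own attempt budget and otherwise defers to the chained policy. The names give_up_early, stub_default and the :maxAttempts key are hypothetical; stub_default merely stands in for the built-in policy so the sketch runs on its own.

```ruby
# A custom policy using the documented (nr, tparams, td, up) signature.
# Assumption: td is a Hash such as { maxAttempts: 3 } (hypothetical key).
give_up_early = lambda do |nr, tparams, td, up|
  # false means "accept this response as final; no more retries".
  return false if tparams[:attempt] >= td[:maxAttempts]

  # Otherwise let the chained policy decide, passing it its own td and up.
  up[0].call(nr, tparams, up[1], up[2])
end

# Stand-in for the built-in policy: retry only on 429 Too Many Requests.
stub_default = lambda { |_nr, tparams, _td, _up| tparams[:resultCode] == 429 }
up = [stub_default, {}, nil]

give_up_early.call(nil, { attempt: 0, resultCode: 429 }, { maxAttempts: 3 }, up)  # defers: retry
give_up_early.call(nil, { attempt: 3, resultCode: 429 }, { maxAttempts: 3 }, up)  # budget spent: stop
```

A lambda is used so that the early return leaves only the policy itself. In real use such a policy would be supplied through the throttle and throttleData keyword arguments of the constructor.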
- VersionString
Attributes
Public Class Methods
@param apiKey [String] API authentication key
@param server [String] Optional (keyword arg). Server name.
@param throttle [Proc] Optional throttle policy
@param throttleData [Any] Optional data for throttle
@!attribute agentString
@return [String] User agent string sent to the server.
@!attribute [r] serverName
@return [String] FQDN of the target NumerousApp server.
@!attribute [r] debugLevel
@return [Fixnum] Current debugging level; change via debug() method
# File lib/numerousapp.rb, line 126
def initialize(apiKey=nil, server:'api.numerousapp.com',
                       throttle:nil, throttleData:nil)

  # specifying apiKey=nil asks us to get key from various default places.
  if not apiKey
    apiKey = Numerous.numerousKey()
  end

  @auth = { user: apiKey, password: "" }
  u = URI.parse("https://"+server)
  @serverName = server
  @serverPort = u.port

  @agentString = "NW-Ruby-NumerousClass/" + VersionString +
                 " (Ruby #{RUBY_VERSION}) NumerousAPI/v2"

  @filterDuplicates = true     # see discussion elsewhere
  @need_restart = true         # port will be opened in simpleAPI

  # Throttling:
  #
  # arbitraryMaximum is just that: under no circumstances will we retry
  # any particular request more than that. Tough noogies.
  #
  # throttlePolicy "tuple" is:
  #     [ 0 ] - Proc
  #     [ 1 ] - specific data for Proc
  #     [ 2 ] - "up" tuple for chained policy
  #
  # and the default policy uses the "data" as a hash of parameters:
  #    :voluntary -- the threshold point for voluntary backoff
  #    :volmaxdelay -- arbitrary maximum *voluntary delay* time
  #
  @arbitraryMaximumTries = 10
  voluntary = { voluntary: 40, volmaxdelay: 5}

  # you can keep the dflt throttle but just alter the voluntary
  # parameters, this way:
  if throttleData and not throttle
    voluntary = voluntary.merge(throttleData)
  end

  @throttlePolicy = [ThrottleDefault, voluntary, nil]
  if throttle
    @throttlePolicy = [throttle, throttleData, @throttlePolicy]
  end

  @statistics = Hash.new { |h, k| h[k] = 0 }  # stats are "infotainment"
  @debugLevel = 0

end
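The constructor lets a caller keep the default policy and override only some of its parameters by passing throttleData without throttle. The merge it performs can be sketched in isolation; the variable names here are illustrative only.

```ruby
# Defaults as they appear in initialize.
defaults = { voluntary: 40, volmaxdelay: 5 }

# Caller-supplied throttleData overrides only the keys it names.
throttle_data = { volmaxdelay: 2 }

# Hash#merge returns a new Hash; the defaults themselves are untouched.
effective = defaults.merge(throttle_data)
# effective now has voluntary: 40 and volmaxdelay: 2
```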
Public Instance Methods
Set the debug level
@param [Fixnum] lvl
The desired debugging level. Greater than zero turns on debugging.
@return [Fixnum] the previous debugging level.
# File lib/numerousapp.rb, line 193
def debug(lvl=1)
  prev = @debugLevel
  @debugLevel = lvl
  # need to make sure we have started an http session
  # (normally deferred until first API call)
  if @debugLevel > 0
    # this is hokey, but it is what it is... it's for debug anyway
    # we have to restart the session with debug on
    @http = Net::HTTP.new(@serverName, @serverPort)
    @http.use_ssl = true    # always required by NumerousApp
    @http.set_debug_output $stderr
    @http = @http.start()
    @need_restart = false
  else
    @need_restart = true    # will force a new http session
  end
  return prev
end
This is primarily for testing; control filtering of bogus duplicates.
@note If you are calling this you are probably doing something wrong.
@param [Boolean] f
New value for duplicate filtering flag.
@return [Boolean] Previous value of duplicate filtering flag.
# File lib/numerousapp.rb, line 220
def setBogusDupFilter(f)
  prev = @filterDuplicates
  @filterDuplicates = f
  return prev
end
String representation of Numerous
@return [String] Human-appropriate string representation.
# File lib/numerousapp.rb, line 183
def to_s()
  oid = (2 * self.object_id).to_s(16)  # XXX "2*" matches native to_s
  return "<Numerous {#{@serverName}} @ 0x#{oid}>"
end
Protected Instance Methods
generic iterator for chunked APIs
# File lib/numerousapp.rb, line 520
def chunkedIterator(info, subs={}, block)
  # if you didn't specify a block... there's no point in doing anything
  if not block; return nil; end

  api = makeAPIcontext(info, :GET, subs)
  list = []
  nextURL = api[:basePath]
  firstTime = true

  # see discussion about duplicate filtering below
  if @filterDuplicates and api[:dupFilter]
    filterInfo = { prev: {}, current: {} }
  else
    filterInfo = nil
  end

  while nextURL
    # get a chunk from the server

    # XXX in the python version we caught various exceptions and
    #     attempted to translate them into something meaningful
    #     (e.g., if a metric got deleted while you were iterating)
    #     But here we're just letting the whatever-exceptions filter up
    v = simpleAPI(api, url:nextURL)

    # statistics, helpful for testing/debugging. Algorithmically
    # we don't really care about first time or not, just for the stats
    if firstTime
      @statistics[:firstChunks] += 1
      firstTime = false
    else
      @statistics[:additionalChunks] += 1
    end

    if filterInfo
      filterInfo[:prev] = filterInfo[:current]
      filterInfo[:current] = {}
    end

    list = v[api[:list]]
    nextURL = v[api[:next]]

    # hand them out
    if list             # can be nil for a variety of reasons
      list.each do |i|

        # A note about duplicate filtering
        #
        # There is a bug in the NumerousApp server which can
        # cause collections to show duplicates of certain events
        # (or interactions/stream items). Explaining the bug in
        # great detail is beyond the scope here; suffice to say
        # it only happens for events that were recorded
        # nearly-simultaneously and happen to be getting reported
        # right at a chunking boundary.
        #
        # So we are filtering them out here. For a more involved
        # discussion of this, see the python implementation. This
        # filtering "works" because it knows pragmatically
        # how/where the bug can show up
        #
        # Turning off duplicate filtering is for testing (only).
        #
        # Not all API's need dupfiltering, hence the APIInfo test
        #
        if (not filterInfo)    # the easy case, not filtering
          block.call i
        else
          thisId = i[api[:dupFilter]]
          if filterInfo[:prev].include? thisId
            @statistics[:duplicatesFiltered] += 1
          else
            filterInfo[:current][thisId] = 1
            block.call i
          end
        end
      end
    end
  end
  return nil     # the subclasses return (should return) their own self
end
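The prev/current bookkeeping used for duplicate filtering can be exercised without a server. The following is a simplified sketch: each_without_boundary_dups is a hypothetical helper, and chunks is an in-memory Array of Arrays standing in for successive server responses.

```ruby
# Yields every item except those whose id was already handed out from the
# immediately preceding chunk (the only place the server bug shows up).
#   chunks : Array of Arrays of Hashes, one inner Array per "chunk"
#   id_key : the Hash key used to recognise duplicates
def each_without_boundary_dups(chunks, id_key)
  prev = {}
  current = {}
  chunks.each do |chunk|
    # Only the previous chunk is remembered, never the whole history.
    prev = current
    current = {}
    chunk.each do |item|
      id = item[id_key]
      next if prev.include?(id)   # duplicate straddling a chunk boundary
      current[id] = 1
      yield item
    end
  end
  nil
end

seen = []
chunks = [
  [{ 'id' => 'a' }, { 'id' => 'b' }],
  [{ 'id' => 'b' }, { 'id' => 'c' }]   # 'b' repeated at the boundary
]
each_without_boundary_dups(chunks, 'id') { |item| seen << item['id'] }
# seen is now ["a", "b", "c"]
```

Keeping only one chunk of history bounds memory use, at the cost of missing duplicates that are more than one chunk apart; per the comments in the library that trade-off matches where the bug actually occurs.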
This is a special case … a bit of a hack … to determine the underlying (redirected-to) URL for metric photos. The issue is that sometimes we want to get at the no-auth-required actual image URL (vs the metric API endpoint for getting a photo).
This does that by issuing an authenticated GET against the given URL and returning the Location header of the response, i.e., the URL the server redirects to. It uses the current HTTP session directly rather than going through simpleAPI.
# File lib/numerousapp.rb, line 509
def getRedirect(url)
  rq = MethMap[:GET].new(url)
  rq.basic_auth(@auth[:user], @auth[:password])
  rq['user-agent'] = @agentString

  resp = @http.request(rq)
  return resp.header['Location']
end
This gathers all the relevant information for a given API and fills in the variable fields in URLs. It returns an “api context” containing all the API-specific details needed by simpleAPI.
# File lib/numerousapp.rb, line 243
def makeAPIcontext(info, whichOp, kwargs={})
  rslt = {}
  rslt[:httpMethod] = whichOp

  # Build the substitutions from defaults (if any) and non-nil kwargs.
  # Note: we are carefully making copies of the underlying dictionaries
  #       so you get your own private context returned to you
  substitutions = (info[:defaults]||{}).clone

  # copy any supplied non-nil kwargs (nil ones defer to defaults)
  kwargs.each { |k, v| if v then substitutions[k] = v end }

  # this is the stuff specific to the operation, e.g.,
  # the 'next' and 'list' fields in a chunked GET
  # There can also be additional path info.
  # process the path appendage and copy everything else
  appendThis = ""
  path = info[:path]
  if info[whichOp]
    opi = info[whichOp]
    opi.each do |k, v|
      if k == :appendPath
        appendThis = v
      elsif k == :path
        path = v           # entire path overridden on this one
      else
        rslt[k] = v
      end
    end
  end
  rslt[:basePath] = (path + appendThis) % substitutions
  return rslt
end
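The URL fill-in step relies on Ruby's String#% with a Hash argument, which replaces %{name} references by the matching Hash values. A small sketch of that step follows; the info table and its path are made up for illustration and are not the library's real API tables.

```ruby
# Hypothetical API description in the shape makeAPIcontext expects.
info = {
  path: '/v2/metrics/%{metricId}/events',
  defaults: { metricId: 'none' }
}

# Start from a private copy of the defaults, then overlay non-nil arguments.
substitutions = info[:defaults].clone
{ metricId: '12345', eventId: nil }.each do |key, value|
  substitutions[key] = value if value   # nil values defer to the defaults
end

# Hash-style format references are resolved by String#%.
base_path = info[:path] % substitutions
# base_path is "/v2/metrics/12345/events"
```

Cloning the defaults first means repeated calls never mutate the shared table, which is the point of the "private context" remark in the source.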
ALL API exchanges with the Numerous server go through here, except for getRedirect(), which is a special case (hack) for photo URLs.
Any single request/response uses this; chunked APIs use the iterator classes (which in turn come back and use this repeatedly).
The api parameter dictionary specifies:
basePath - the url we use (without the https://server.com part)
httpMethod - GET vs POST vs PUT etc
successCodes - what "OK" responses are (default 200)
The api parameter may also carry additional info used elsewhere. See, for example, how the iterators work on collections.
Sometimes you may have started with a basePath but then been given a “next” URL to use for subsequent requests. In those cases pass in a url and it will take precedence over the basePath, if any is present.
You can pass in a dictionary jdict, which will be JSON-ified and sent as Content-Type: application/json. Or you can pass in a multipart dictionary … this is used for posting photos. You should not specify both jdict and multipart.
# File lib/numerousapp.rb, line 328
def simpleAPI(api, jdict:nil, multipart:nil, url:nil)

  @statistics[:simpleAPI] += 1

  # take the base url if you didn't give us an override
  url ||= api[:basePath]

  if url[0] == '/'                  # i.e. not "http..."
    path = url
  else
    # technically we should be able to reassign @http because it could
    # change if server redirected us. But don't want to if no change.
    # need to add logic. XXX TODO XXX
    path = URI.parse(url).request_uri
  end

  rq = MethMap[api[:httpMethod]].new(path)
  rq.basic_auth(@auth[:user], @auth[:password])
  rq['user-agent'] = @agentString
  if jdict
    rq['content-type'] = 'application/json'
    rq.body = JSON.generate(jdict)
  elsif multipart
    # the data in :f is either a raw string OR a readable file
    begin
      f = multipart[:f]
      img = f.read
    rescue NoMethodError
      img = f
    end
    boundary = makeboundary(img)

    rq["content-type"] = "multipart/form-data; boundary=#{boundary}"
    d = []
    d << "--#{boundary}\r\n"
    d << "Content-Disposition: form-data;"
    d << ' name="image";'
    d << ' filename="image.img";'
    d << "\r\n"
    d << "Content-Transfer-Encoding: binary\r\n"
    d << "Content-Type: #{multipart[:mimeType]}\r\n"
    d << "\r\n"
    d << img + "\r\n"
    d << "--#{boundary}--\r\n"
    rq.body = d.join
  end

  if @debugLevel > 0
    puts "Path: #{path}\n"
    puts "Request headers:\n"
    rq.each do | k, v |
      puts "k: " + k + " :: " + v + "\n"
    end
  end

  resp = nil   # ick, is there a better way to get this out of the block?
  @arbitraryMaximumTries.times do |attempt|

    @statistics[:serverRequests] += 1
    t0 = Time.now

    begin
      # see note immediately below re "need_restart"
      # this is where the very FIRST start() happens but
      # also where subsequent starts might be re-done after errors
      if @need_restart or not @http.started?()
        @http = Net::HTTP.new(@serverName, @serverPort)
        @http.use_ssl = true    # always required by NumerousApp
        @http = @http.start()
      end

      # A note on this need_restart true/false dance:
      # I have to admit I'm not sure if this is necessary or if
      # it is even a good/bad/irrelevant idea, but the concept
      # here is that if an error is encountered we'll ditch
      # the current start() session and make a new one next time
      @need_restart = true   # will redo session if raise out
      resp = @http.request(rq)
      @need_restart = false
    rescue StandardError => e
      # it's PDB (pretty bogus) that we have to rescue
      # StandardError but the underlying http library can just throw
      # too many exceptions to know what they all are; it really
      # should have encapsulated them into an HTTPNetError class...
      # so, we'll just assume any "standard error" is a network issue
      raise NumerousNetworkError.new(e)
    end
    et = Time.now - t0

    # We report the elapsed round-trip time, as a scalar (by default)
    # OR if you preset the :serverResponseTimes to be an array
    # of length N then we keep the last N response times, thusly:
    begin
      times = @statistics[:serverResponseTimes]
      times.insert(0, et)
      times.pop()
    rescue NoMethodError     # just a scalar
      @statistics[:serverResponseTimes] = et
    end

    if @debugLevel > 0
      puts "Response headers:\n"
      resp.each do | k, v |
        puts "k: " + k + " :: " + v + "\n"
      end
      puts "Code: " + resp.code + "/" + resp.code.class.to_s + "/\n"
    end

    # invoke the rate-limiting policy
    rateRemain = getElseM1(resp, 'x-rate-limit-remaining')
    rateReset = getElseM1(resp, 'x-rate-limit-reset')
    @statistics[:rateRemaining] = rateRemain
    @statistics[:rateReset] = rateReset

    tp = { :debug=> @debugLevel,
           :attempt=> attempt,
           :rateRemaining=> rateRemain,
           :rateReset=> rateReset,
           :resultCode=> resp.code.to_i,
           :resp=> resp,
           :statistics=> @statistics,
           :request=> { :httpMethod => api[:httpMethod],
                        :url => path,
                        :jdict => jdict }
         }

    td = @throttlePolicy[1]
    up = @throttlePolicy[2]
    if not @throttlePolicy[0].call(self, tp, td, up)
      break
    end
  end

  goodCodes = api[:successCodes] || [200]

  responseCode = resp.code.to_i

  if goodCodes.include? responseCode
    begin
      rj = JSON.parse(resp.body)
    rescue TypeError, JSON::ParserError => e
      # On some requests that return "nothing" the server
      # returns {} ... on others it literally returns nothing.
      if (not resp.body) or resp.body.length == 0
        rj = {}
      else
        # this isn't supposed to happen... server bug?
        raise e
      end
    end
  else
    rj = { errorType: "HTTPError" }
    rj[:code] = responseCode
    rj[:reason] = resp.message
    rj[:value] = "Server returned an HTTP error: #{resp.message}"
    rj[:id] = url
    if responseCode == 401     # XXX is there an HTTP constant for this?
      emeth = NumerousAuthError
    else
      emeth = NumerousError
    end

    raise emeth.new(rj[:value],responseCode, rj)

  end

  return rj
end
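The response-time statistic in simpleAPI is either a single scalar or, if the caller presets an Array of length N, a rolling window of the last N round-trip times. A standalone sketch of that behaviour follows. record_time is a hypothetical helper, and it tests the type explicitly with is_a?(Array) rather than rescuing NoMethodError as the library does.

```ruby
# Records one elapsed time (in seconds) into a statistics Hash.
# If :serverResponseTimes holds an Array, keep it at a fixed length with
# the newest value first; otherwise just overwrite the scalar.
def record_time(stats, elapsed)
  times = stats[:serverResponseTimes]
  if times.is_a?(Array)
    times.insert(0, elapsed)   # newest at the front
    times.pop                  # drop the oldest, so the length stays N
  else
    stats[:serverResponseTimes] = elapsed
  end
end

# Preset an Array of length 3 to keep the last three response times.
windowed = { serverResponseTimes: [0, 0, 0] }
[0.1, 0.2, 0.3, 0.4].each { |t| record_time(windowed, t) }
# windowed[:serverResponseTimes] is now [0.4, 0.3, 0.2]

# Without a preset only the most recent time is kept.
scalar = { serverResponseTimes: 0 }
[0.1, 0.2].each { |t| record_time(scalar, t) }
# scalar[:serverResponseTimes] is now 0.2
```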
Private Instance Methods
helper function to extract header field integer or return -1
# File lib/numerousapp.rb, line 294
def getElseM1(d, k)
  if d.key? k
    return d[k].to_i
  else
    return -1
  end
end
# File lib/numerousapp.rb, line 282
def makeboundary(s)
  # Just try something fixed, and if it is no good extend it with random.
  # For amusing porpoises make it this way so we don't also contain it.
  b = "RoLlErCaSeDbOuNdArY867".b + "5309".b
  while s.include? b
    b += BChars[rand(BCharsLen)]
  end
  return b
end
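The idea behind makeboundary, together with the multipart body that simpleAPI assembles around it, can be sketched as a standalone example. Everything here is illustrative: pick_boundary and multipart_body are hypothetical helpers, the seed string is arbitrary, and the alphabet of letters and digits is an assumption standing in for BChars, whose contents are not shown on this page.

```ruby
# Start from a fixed seed and extend it with random characters until the
# payload no longer contains it, so the boundary cannot appear in the data.
def pick_boundary(payload, seed = 'ExampleBoundary')
  alphabet = [*'a'..'z', *'A'..'Z', *'0'..'9']   # assumed boundary-safe set
  boundary = seed.b
  boundary += alphabet.sample while payload.include?(boundary)
  boundary
end

# Wrap a single binary payload as one multipart/form-data part.
def multipart_body(payload, mime_type, boundary)
  [
    "--#{boundary}\r\n",
    "Content-Disposition: form-data; name=\"image\"; filename=\"image.img\"\r\n",
    "Content-Transfer-Encoding: binary\r\n",
    "Content-Type: #{mime_type}\r\n",
    "\r\n",
    payload, "\r\n",
    "--#{boundary}--\r\n"
  ].join
end

payload = 'xxExampleBoundaryxx'.b   # deliberately contains the seed
boundary = pick_boundary(payload)
body = multipart_body(payload, 'image/png', boundary)
```

The matching request header would be Content-Type: multipart/form-data; boundary= followed by the chosen boundary, as in simpleAPI.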