class Statlysis::HotestItems
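A typical "recently hot" list simply counts visits in a single field, but this is vulnerable to problems such as artificially inflated view counts.
The solution is to analyze user behavior as a whole. The procedure is: extract the item_id from the URI, extract deduplicated IPs and user_ids from the access log, and fetch deeper user behavior from the like, fav, and comment tables; the first two are then added together in a certain ratio to produce the ranking. Finally, time cooling is applied to avoid the Matthew effect, and the ratio can be raised dynamically so that items that have recently become somewhat popular replace items that were previously very popular.
The computation is linear, so it is fast.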
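As a rough illustration of the "added together in a certain ratio" step, the sketch below combines a visit count with a count of deeper actions. The method name `raw_score` and both weights are hypothetical; the source does not state the actual ratio or how the class's caller computes the score.

```ruby
# Hypothetical weights: the source only says the signals are added
# "in a certain ratio" and does not give the values.
VISIT_WEIGHT  = 1.0
ACTION_WEIGHT = 3.0

# visits:  number of deduplicated IPs / user_ids that viewed the item
# actions: number of like / fav / comment rows for the item
def raw_score(visits, actions)
  VISIT_WEIGHT * visits + ACTION_WEIGHT * actions
end

# Because visits are deduplicated before counting, reloading a page many
# times from one IP adds at most 1 to `visits`.
raw_score(10, 2)
```

A score computed this way would be one half of the `[score, time]` pair that `id_to_score_and_time_hash_proc` is expected to return for each id.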
Attributes
id_to_score_and_time_hash_proc[RW]
key[RW]
limit[RW]
Public Class Methods
new(key, id_to_score_and_time_hash_proc)
Calls superclass method
Statlysis::SingleKv::new
# File lib/statlysis/cron/top/hotest_items.rb, line 16
def initialize key, id_to_score_and_time_hash_proc
  cron.key = key
  cron.id_to_score_and_time_hash_proc = id_to_score_and_time_hash_proc
  cron.limit = 20
  super
  cron
end
Public Instance Methods
output()
# File lib/statlysis/cron/top/hotest_items.rb, line 24
def output
  # Unwrap nested Procs until an id => [score, time] hash is returned.
  t = cron.id_to_score_and_time_hash_proc
  while t.is_a?(Proc) do t = t.call end
  @id_to_score_and_time_hash = t

  # Age of each item in whole days, starting at 1.
  @id_to_day_hash = @id_to_score_and_time_hash.inject({}) {|h, ab| h[ab[0]] = (((Time.now - ab[1][1]) / (3600*24)).round + 1); h }
  # Time-cooled score: score / sqrt(age in days).
  @id_to_timecooldown_hash = @id_to_score_and_time_hash.inject({}) {|h, kv| h[kv[0]] = (kv[1][0] / Math.sqrt(@id_to_day_hash[kv[0]])); h }
  # Ids sorted by cooled score, highest first.
  array = @id_to_timecooldown_hash.sort {|a, b| b[1] <=> a[1] }.map(&:first)
  {cron.key => array}
end
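The time-cooling step in `output` can be tried in isolation. The following is a minimal standalone sketch, not part of Statlysis: the method name `rank_by_cooled_score`, the `now` parameter, and the sample data are assumptions added for illustration. It applies the same formula as the listing above, score / sqrt(age in days), with the age starting at 1.

```ruby
# Rank ids by time-cooled score. `scores` maps id => [score, time],
# the shape that #output reads from id_to_score_and_time_hash_proc.
def rank_by_cooled_score(scores, now = Time.now)
  cooled = scores.map do |id, (score, time)|
    days = ((now - time) / (3600 * 24)).round + 1 # age in whole days, at least 1
    [id, score / Math.sqrt(days)]
  end
  cooled.sort_by { |_, s| -s }.map(&:first)
end

now = Time.at(1_700_000_000)
day = 3600 * 24
sample = {
  :old_hit => [100, now - 99 * day], # 100 / sqrt(100) = 10.0
  :fresh   => [30,  now],            # 30  / sqrt(1)   = 30.0
  :medium  => [40,  now - 3 * day]   # 40  / sqrt(4)   = 20.0
}
rank_by_cooled_score(sample, now) # => [:fresh, :medium, :old_hit]
```

With the sample data, the item with the highest raw score ranks last because it is 99 days old, which is the anti-Matthew-effect behavior the class description refers to.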
write()
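# File lib/statlysis/cron/top/hotest_items.rb, line 37
def write
  cron.output.each do |key, array|
    # Keep at most the first 141 ids (indices 0..140) and serialize them.
    json = array[0..140].to_json
    # Current result, stored under the plain key.
    StSingleKv.find_or_create(:pattern => key).update :result => json
    # Daily snapshot, stored under "<key>_YYYYMMDD".
    StSingleKvHistory.find_or_create(:pattern => "#{key}_#{Time.now.strftime('%Y%m%d')}").update :result => json
  end
end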