module Distance
The Distance module holds utility functions for similarity computations of TLSH hashes.
Constants
- BIT_PAIRS_DIFF_TABLE
BIT_PAIRS_DIFF_TABLE is a pre-calculated table that represents an approximation to the Hamming distance. It is generated using Jonathan Oliver's algorithm. The original implementation and the generation algorithm can be found at the following URLs:
github.com/trendmicro/tlsh/blob/master/src/tlsh_util.cpp#L84-L4694
github.com/trendmicro/tlsh/blob/master/src/gen_arr2.cpp#L1-L91
The source of the data can be found at: github.com/glaslos/tlsh
Details about the distance score can also be found in the Trend Micro TLSH paper: github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf
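The table generation can be sketched in Ruby as follows. This is a minimal sketch, assuming the rule used in the referenced gen_arr2.cpp: each byte is split into four 2-bit pairs, and each pair contributes its absolute difference, except that a difference of 3 is scored as 6. The names pair_diff and build_bit_pairs_diff_table are hypothetical and are not part of this module.

```ruby
# Score for a single pair of 2-bit values (each in 0..3).
# Assumption: a difference of 3 is scored as 6, as in gen_arr2.cpp.
def pair_diff(p, q)
  d = (p - q).abs
  d == 3 ? 6 : d
end

# Builds a 256x256 table where table[x][y] is the summed score of the
# four 2-bit pairs of bytes x and y.
def build_bit_pairs_diff_table
  Array.new(256) do |x|
    Array.new(256) do |y|
      (0...4).sum do |i|
        shift = 2 * i
        pair_diff((x >> shift) & 0b11, (y >> shift) & 0b11)
      end
    end
  end
end

table = build_bit_pairs_diff_table
puts table[0][1]    # 1: one pair differs by 1
puts table[0][3]    # 6: one pair differs by 3
puts table[0][255]  # 24: all four pairs differ by 3
```

Precomputing the table trades 64K entries of memory for a single lookup per byte pair when comparing digests.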
Public Class Methods
diff_total
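Calculates the diff between two Tlsh hashes a and b, covering both the hash header and the body. Returns -1 unless both hashes are comparable. The length difference is included in the score only when is_len_diff is true.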
# File lib/tlsh/distance/distance.rb, line 5
def diff_total(a, b, is_len_diff)
  return -1 unless a.comparable? && b.comparable?

  compute_diff(a, b, is_len_diff)
end
Private Class Methods
compute_diff
# File lib/tlsh/distance/distance.rb, line 12
def compute_diff(a, b, is_len_diff)
  diff = 0
  if is_len_diff
    len_diff = mod_diff(a.l_value, b.l_value, 256)
    diff += length_diff(len_diff)
  end
  diff += q_diff(a.q1_ratio, b.q1_ratio)
  diff += q_diff(a.q2_ratio, b.q2_ratio)
  diff += 1 if a.checksum != b.checksum
  diff + digest_distance(a.body, b.body)
end
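To make the header part of the score concrete, here is a small worked sketch. It assumes a hypothetical Header struct standing in for a Tlsh hash (only the fields read by compute_diff are modelled) and a hypothetical header_diff helper that leaves out the digest body distance. The helper methods mirror the private methods documented in this module, with mod_diff written in an equivalent shorter form.

```ruby
# Hypothetical stand-in for a Tlsh hash; not the library's own class.
Header = Struct.new(:l_value, :q1_ratio, :q2_ratio, :checksum)

# Shortest distance between x and y on a circle of size r.
def mod_diff(x, y, r)
  dl = (x - y).abs
  [dl, r - dl].min
end

def length_diff(len_diff)
  return len_diff if len_diff < 1

  len_diff * 12
end

def q_diff(a_ratio, b_ratio)
  diff = mod_diff(a_ratio, b_ratio, 16)
  diff <= 1 ? diff : (diff - 1) * 12
end

# Header contribution only; the digest body distance is omitted here.
def header_diff(a, b, is_len_diff)
  diff = 0
  diff += length_diff(mod_diff(a.l_value, b.l_value, 256)) if is_len_diff
  diff += q_diff(a.q1_ratio, b.q1_ratio)
  diff += q_diff(a.q2_ratio, b.q2_ratio)
  diff += 1 if a.checksum != b.checksum
  diff
end

a = Header.new(10, 3, 15, 0x2a)
b = Header.new(12, 4, 1, 0x2b)
puts header_diff(a, b, true)   # 38 = 24 (length) + 1 (q1) + 12 (q2) + 1 (checksum)
puts header_diff(a, b, false)  # 14 = 1 (q1) + 12 (q2) + 1 (checksum)
```

Note that the q2 ratios 15 and 1 are only 2 steps apart because the ratio range wraps around at 16.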
digest_distance
Calculates the distance between two hash digests.
# File lib/tlsh/distance/distance.rb, line 53
def digest_distance(x, y)
  diff = 0
  x.zip(y).each do |a, b|
    diff += BIT_PAIRS_DIFF_TABLE[a][b]
  end
  diff
end
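A short sketch of the digest comparison on toy data follows. It assumes both bodies are arrays of byte values of equal length, and it uses a hypothetical SKETCH_TABLE built under the assumption that each 2-bit pair contributes its absolute difference, with a difference of 3 scored as 6. The real BIT_PAIRS_DIFF_TABLE constant should be used in practice.

```ruby
# Assumed stand-in for BIT_PAIRS_DIFF_TABLE, built from the 2-bit pair rule.
SKETCH_TABLE = Array.new(256) do |x|
  Array.new(256) do |y|
    (0...4).sum do |i|
      d = (((x >> (2 * i)) & 3) - ((y >> (2 * i)) & 3)).abs
      d == 3 ? 6 : d
    end
  end
end

# Sums the table score of each aligned byte pair.
def digest_distance(x, y)
  x.zip(y).sum { |a, b| SKETCH_TABLE[a][b] }
end

body_a = [0x00, 0xff, 0x1b]
body_b = [0x00, 0xfc, 0x1b]
puts digest_distance(body_a, body_b)  # 6: only the lowest pair of the middle byte differs (3 vs 0)
```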
length_diff
# File lib/tlsh/distance/distance.rb, line 26
def length_diff(len_diff)
  return len_diff if len_diff < 1

  len_diff * 12
end
mod_diff
Calculates the minimum number of steps from value x to value y in a circular queue of size r, taking the shorter of the two directions.
# File lib/tlsh/distance/distance.rb, line 41
def mod_diff(x, y, r)
  if y > x
    dl = y - x
    dr = x + r - y
  else
    dl = x - y
    dr = y + r - x
  end
  dl > dr ? dr : dl
end
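A few worked values may help illustrate the wrap-around behaviour. The method body below is copied from the source above so the snippet runs on its own. The sample inputs are illustrative only.

```ruby
def mod_diff(x, y, r)
  if y > x
    dl = y - x
    dr = x + r - y
  else
    dl = x - y
    dr = y + r - x
  end
  dl > dr ? dr : dl
end

puts mod_diff(3, 7, 16)     # 4: the direct path is the shortest
puts mod_diff(2, 14, 16)    # 4: wrapping (14 -> 15 -> 0 -> 1 -> 2) beats the direct 12 steps
puts mod_diff(250, 5, 256)  # 11: wraps past 255 back to 0
```

The result is symmetric in x and y and never exceeds r / 2.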
q_diff
# File lib/tlsh/distance/distance.rb, line 31
def q_diff(a_ratio, b_ratio)
  diff = mod_diff(a_ratio, b_ratio, 16)
  if diff <= 1
    diff
  else
    (diff - 1) * 12
  end
end