module Distance

The Distance module holds utility functions for computing the similarity of TLSH hashes.

Constants

BIT_PAIRS_DIFF_TABLE

BIT_PAIRS_DIFF_TABLE is a pre-calculated table that represents an approximation of the Hamming distance. It is generated using Jonathan Oliver's algorithm.

The original implementation and the generation algorithm can be found at the following URLs:
github.com/trendmicro/tlsh/blob/master/src/tlsh_util.cpp#L84-L4694
github.com/trendmicro/tlsh/blob/master/src/gen_arr2.cpp#L1-L91

The source of the data can be found at: github.com/glaslos/tlsh

Details about the distance score can also be found in the Trend Micro TLSH paper: github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf
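To make the shape of the table concrete, here is a minimal sketch of how such a 256x256 table could be generated. It assumes the scheme used by the referenced gen_arr2.cpp: each byte is split into four 2-bit values, and a pair of values that differs by 3 is penalised with 6 while smaller differences count as is. The names bit_pairs_diff and SKETCH_DIFF_TABLE are hypothetical and are not part of this module; verify the output against the shipped table before relying on it.

```ruby
# Difference between two bytes, compared as four 2-bit values.
# Assumption: a 2-bit difference of 3 scores 6, anything else scores itself.
def bit_pairs_diff(x, y)
  4.times.sum do |i|
    d = (((x >> (2 * i)) & 3) - ((y >> (2 * i)) & 3)).abs
    d == 3 ? 6 : d
  end
end

# Hypothetical constant name, chosen to avoid clashing with the real table.
SKETCH_DIFF_TABLE = Array.new(256) do |x|
  Array.new(256) { |y| bit_pairs_diff(x, y) }
end

SKETCH_DIFF_TABLE[0][3]   # => 6  (one 2-bit pair differs by 3)
SKETCH_DIFF_TABLE[0][255] # => 24 (all four pairs differ by 3)
```

Precomputing the table trades 64K small integers of memory for a single lookup per byte pair at comparison time.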

Public Class Methods

diff_total(a, b, is_len_diff)

diff_total calculates the difference between two TLSH hashes a and b across the hash header and body. It returns -1 unless both hashes are comparable.

# File lib/tlsh/distance/distance.rb, line 5
def diff_total(a, b, is_len_diff)
  return -1 unless a.comparable? && b.comparable?
  compute_diff(a, b, is_len_diff)
end

Private Class Methods

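compute_diff(a, b, is_len_diff)

compute_diff adds up the header differences (the length difference when is_len_diff is true, both Q ratio differences, and 1 for a checksum mismatch) and the digest distance of the two bodies.
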
# File lib/tlsh/distance/distance.rb, line 12
def compute_diff(a, b, is_len_diff)
  diff = 0

  if is_len_diff
    len_diff = mod_diff(a.l_value, b.l_value, 256)
    diff += length_diff(len_diff)
  end

  diff += q_diff(a.q1_ratio, b.q1_ratio)
  diff += q_diff(a.q2_ratio, b.q2_ratio)
  diff += 1 if a.checksum != b.checksum
  diff + digest_distance(a.body, b.body)
end
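The header part of this sum can be traced by hand. The sketch below re-implements the helpers in compact form and leaves out the body distance; FakeHash and header_diff are hypothetical names introduced only for this illustration, and the field values are made up.

```ruby
# Stand-in for a TLSH hash object. The field names mirror those read by
# compute_diff, but this Struct is hypothetical.
FakeHash = Struct.new(:l_value, :q1_ratio, :q2_ratio, :checksum)

# Compact equivalents of the module's helpers.
def mod_diff(x, y, r)
  dl = (x - y).abs
  [dl, r - dl].min
end

def length_diff(len_diff)
  len_diff < 1 ? len_diff : len_diff * 12
end

def q_diff(a_ratio, b_ratio)
  diff = mod_diff(a_ratio, b_ratio, 16)
  diff <= 1 ? diff : (diff - 1) * 12
end

# Header-only portion of compute_diff (no digest_distance term).
def header_diff(a, b, is_len_diff)
  diff = 0
  diff += length_diff(mod_diff(a.l_value, b.l_value, 256)) if is_len_diff
  diff += q_diff(a.q1_ratio, b.q1_ratio)
  diff += q_diff(a.q2_ratio, b.q2_ratio)
  diff += 1 if a.checksum != b.checksum
  diff
end

a = FakeHash.new(10, 3, 15, 0x2a)
b = FakeHash.new(13, 4, 1, 0x2b)

header_diff(a, b, true)  # => 50 (length 36 + q1 1 + q2 12 + checksum 1)
header_diff(a, b, false) # => 14 (length term skipped)
```

Note that q2 wraps around the ring of 16: the ratios 15 and 1 are two steps apart, not fourteen.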

digest_distance(x, y)

digest_distance calculates the distance between two hash digests by summing the BIT_PAIRS_DIFF_TABLE entries for each pair of corresponding bytes.

# File lib/tlsh/distance/distance.rb, line 53
def digest_distance(x, y)
  diff = 0
  x.zip(y).each do |a, b|
    diff += BIT_PAIRS_DIFF_TABLE[a][b]
  end
  diff
end
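The same accumulation can be written with Enumerable#sum. The sketch below uses a toy 4x4 table so the arithmetic is easy to follow; TOY_TABLE and toy_digest_distance are hypothetical names, and the real BIT_PAIRS_DIFF_TABLE is indexed by full byte values.

```ruby
# Toy lookup table for illustration only (values 0..3 instead of 0..255).
TOY_TABLE = [
  [0, 1, 2, 6],
  [1, 0, 1, 2],
  [2, 1, 0, 1],
  [6, 2, 1, 0]
].freeze

# Sum the table entry for each pair of corresponding elements.
# Assumes x and y have the same length.
def toy_digest_distance(x, y, table)
  x.zip(y).sum { |a, b| table[a][b] }
end

toy_digest_distance([0, 1, 3], [3, 1, 2], TOY_TABLE) # => 7 (6 + 0 + 1)
```

Because Array#zip pads the shorter argument with nil, a y that is shorter than x would raise on the table lookup, so both digests are expected to have equal length.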

length_diff(len_diff)

length_diff returns len_diff unchanged when it is less than 1, and len_diff multiplied by 12 otherwise.

# File lib/tlsh/distance/distance.rb, line 26
def length_diff(len_diff)
  return len_diff if len_diff < 1
  len_diff * 12
end

mod_diff(x, y, r)

mod_diff calculates the minimum number of steps between values x and y in a circular queue of size r, counting in whichever direction is shorter.

# File lib/tlsh/distance/distance.rb, line 41
def mod_diff(x, y, r)
  if y > x
    dl = y - x
    dr = x + r - y
  else
    dl = x - y
    dr = y + r - x
  end
  dl > dr ? dr : dl
end
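A few values make the wrap-around behaviour visible. The sketch below is a compact equivalent under the assumption that x and y are integers in 0...r; ring_distance is a hypothetical name, not a method of this module.

```ruby
# Shortest distance between x and y on a ring of size r.
# Assumes 0 <= x < r and 0 <= y < r.
def ring_distance(x, y, r)
  direct = (x - y).abs
  [direct, r - direct].min
end

ring_distance(3, 4, 16)    # => 1
ring_distance(15, 1, 16)   # => 2  (wraps past 0)
ring_distance(0, 8, 16)    # => 8  (the maximum for r = 16)
ring_distance(250, 5, 256) # => 11 (wraps past 255)
```

The result never exceeds r / 2, which bounds the inputs that q_diff and length_diff receive.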
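
q_diff(a_ratio, b_ratio)

q_diff scores the circular difference between two Q ratios on a ring of size 16. A difference of 0 or 1 is returned as is; a larger difference is scored as (diff - 1) * 12.
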
# File lib/tlsh/distance/distance.rb, line 31
def q_diff(a_ratio, b_ratio)
  diff = mod_diff(a_ratio, b_ratio, 16)
  if diff <= 1
    diff
  else
    (diff - 1) * 12
  end
end
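Tabulating the score for every possible circular difference shows how sharply it grows once two ratios are more than one step apart. The sketch below is a self-contained equivalent; q_diff_sketch is a hypothetical name.

```ruby
# Self-contained equivalent of q_diff, assuming ratios in 0..15.
def q_diff_sketch(a_ratio, b_ratio)
  d = (a_ratio - b_ratio).abs
  d = [d, 16 - d].min          # circular distance on a ring of 16
  d <= 1 ? d : (d - 1) * 12    # small differences are cheap, larger ones scaled
end

# Scores for circular differences 0 through 8 (8 is the largest possible).
(0..8).map { |d| q_diff_sketch(0, d) }
# => [0, 1, 12, 24, 36, 48, 60, 72, 84]
```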