class UrlPrivacy

Usage:

UrlPrivacy.clean(url)

Constants

TRACKING_PARAMS

Query parameters to remove from URLs. The list is taken from Neat URL and ClearURLs, plus some others found manually.

@see {github.com/Smile4ever/Neat-URL}
@see {gitlab.com/anti-tracking/ClearURLs/rules/-/blob/master/data.json}
@see {github.com/Smile4ever/Neat-URL/issues/235}
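The entries are matched as globs with File.fnmatch, so a single entry can cover a whole family of parameter names. A minimal sketch of that matching, assuming a tiny hypothetical list (SAMPLE_PATTERNS and tracked? are illustrative names, not part of UrlPrivacy):

```ruby
# Hypothetical entries; the real TRACKING_PARAMS list is much longer.
SAMPLE_PATTERNS = %w[fbclid utm_*].freeze

# True when the parameter name matches any glob pattern in the list.
def tracked?(name, patterns = SAMPLE_PATTERNS)
  patterns.any? { |pattern| File.fnmatch(pattern, name) }
end

tracked?('utm_source') # => true, matched by the utm_* glob
tracked?('fbclid')     # => true, exact name
tracked?('page')       # => false
```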

Public Class Methods

clean(url)

Clean the given URL. If the URL can't be parsed, returns the URL unmodified.

Results are cached, so a URL that appears more than once is only cleaned the first time.

@param url [String]
@return [String]
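The behaviour described above (strip tracking parameters, cache the result, hand back unparseable input untouched) can be sketched as follows. This is a simplified stand-in rather than the library's code: CleanSketch and its two-entry PARAMS list are hypothetical, and it only does exact-name matching.

```ruby
require 'uri'

# Hypothetical, simplified stand-in for UrlPrivacy.clean.
module CleanSketch
  PARAMS = %w[utm_source fbclid].freeze

  def self.clean(url)
    @cache ||= {}
    @cache[url] ||= begin
      uri = URI(url)

      if uri.query
        kept = URI.decode_www_form(uri.query).reject do |name, _|
          PARAMS.include?(name)
        end
        # A nil query drops the "?" when no parameters survive.
        uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
      end

      uri.to_s
    end
  rescue URI::Error
    # Unparseable input is cached and returned as given.
    @cache[url] ||= url
  end
end

CleanSketch.clean('https://example.com/a?utm_source=x&id=1')
# => "https://example.com/a?id=1"
CleanSketch.clean('http://exa mple.com/?fbclid=1')
# => "http://exa mple.com/?fbclid=1" (invalid URI, returned unmodified)
```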

    # File lib/url_privacy.rb
    def clean(url)
      @cleaned_urls ||= {}
      @cleaned_urls[url] ||= begin
        uri = URI(url)

        if uri.query
          # Relative URLs have no hostname, so fall back to an empty string
          hostname = uri.hostname.to_s.sub(/\Awww\./, '')
          params = URI.decode_www_form(uri.query).to_h

          # Remove params by name first
          params.reject! do |param, _|
            TRACKING_PARAMS.include? param
          end

          # Remove params with globs
          params.reject! do |param, _|
            simple_tracking_params.any? do |pattern_param|
              File.fnmatch(pattern_param, param)
            end
          end

          # Remove params matching by hostname and then param
          params.reject! do |param, _|
            complex_tracking_params.any? do |pattern_hostname, pattern_params|
              next false unless File.fnmatch(pattern_hostname, hostname)

              pattern_params.any? do |pattern_param|
                File.fnmatch(pattern_param, param)
              end
            end
          end

          # Drop the query entirely when nothing is left, to avoid a dangling "?"
          uri.query = params.empty? ? nil : URI.encode_www_form(params)
        end

        uri.to_s
      end
    rescue URI::Error
      # Unparseable URLs are cached and returned unmodified
      @cleaned_urls[url] ||= url
    end

Private Class Methods

complex_tracking_params()

Lets entries be copied and pasted directly from the Neat URL source code. It converts the param@hostname entries into a hash of hostname => [ params ], where both hostnames and params can be glob-matched.

@return [Hash]

    # File lib/url_privacy.rb
    def complex_tracking_params
      @complex_tracking_params ||= TRACKING_PARAMS.map do |param|
        next unless param.include? '@'

        Hash[*param.split('@', 2).reverse]
      end.compact.reduce({}) do |hash, pairs|
        pairs.each do |key, value|
          (hash[key] ||= []) << value
        end

        hash
      end
    end
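To make the shape of that hash concrete, here is a small sketch of the same grouping written with each_with_object, followed by the two-level glob match that clean performs against it. The entries in SAMPLE_ENTRIES and the helper names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical entries in the param@hostname form.
SAMPLE_ENTRIES = ['utm_*', 'ref@amazon.*', 'tag@amazon.*', 'si@open.spotify.com'].freeze

# Groups host-scoped entries into { hostname_pattern => [param_patterns] }.
# Entries without an "@" are skipped.
def group_by_hostname(entries)
  entries.each_with_object({}) do |entry, hash|
    next unless entry.include?('@')

    param, hostname = entry.split('@', 2)
    (hash[hostname] ||= []) << param
  end
end

# True when the hostname matches a pattern and the param matches one of
# the patterns registered for that hostname.
def host_scoped?(param, hostname, rules)
  rules.any? do |host_pattern, param_patterns|
    File.fnmatch(host_pattern, hostname) &&
      param_patterns.any? { |pattern| File.fnmatch(pattern, param) }
  end
end

SAMPLE_RULES = group_by_hostname(SAMPLE_ENTRIES)
# => { "amazon.*" => ["ref", "tag"], "open.spotify.com" => ["si"] }

host_scoped?('ref', 'amazon.co.uk', SAMPLE_RULES) # => true
host_scoped?('ref', 'example.com', SAMPLE_RULES)  # => false
```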

simple_tracking_params()

    # File lib/url_privacy.rb
    def simple_tracking_params
      @simple_tracking_params ||= TRACKING_PARAMS.select do |param|
        !param.include?('@')
      end
    end