Name | Total Lines | Lines of Code | Total Coverage | Code Coverage |
---|---|---|---|---|
rcov/ruby/1.8/gems/diff-lcs-1.1.2/lib/diff/lcs.rb | 1105 | 553 | 46.33%
|
6.51%
|
Code reported as executed by Ruby looks like this...and this: this line is also marked as covered.Lines considered as run by rcov, but not reported by Ruby, look like this,and this: these lines were inferred by rcov (using simple heuristics).Finally, here's a line marked as not executed.
1 #! /usr/env/bin ruby |
2 #-- |
3 # Copyright 2004 Austin Ziegler <diff-lcs@halostatue.ca> |
4 # adapted from: |
5 # Algorithm::Diff (Perl) by Ned Konz <perl@bike-nomad.com> |
6 # Smalltalk by Mario I. Wolczko <mario@wolczko.com> |
7 # implements McIlroy-Hunt diff algorithm |
8 # |
9 # This program is free software. It may be redistributed and/or modified |
10 # under the terms of the GPL version 2 (or later), the Perl Artistic |
11 # licence, or the Ruby licence. |
12 # |
13 # $Id: lcs.rb,v 1.9 2004/10/17 20:31:10 austin Exp $ |
14 #++ |
15 |
16 module Diff |
17 # = Diff::LCS 1.1.2 |
18 # Computes "intelligent" differences between two sequenced Enumerables. |
19 # This is an implementation of the McIlroy-Hunt "diff" algorithm for |
20 # Enumerable objects that include Diffable. |
21 # |
22 # Based on Mario I. Wolczko's <mario@wolczko.com> Smalltalk version |
23 # (1.2, 1993) and Ned Konz's <perl@bike-nomad.com> Perl version |
24 # (Algorithm::Diff). |
25 # |
26 # == Synopsis |
27 # require 'diff/lcs' |
28 # |
29 # seq1 = %w(a b c e h j l m n p) |
30 # seq2 = %w(b c d e f j k l m r s t) |
31 # |
32 # lcs = Diff::LCS.LCS(seq1, seq2) |
33 # diffs = Diff::LCS.diff(seq1, seq2) |
34 # sdiff = Diff::LCS.sdiff(seq1, seq2) |
35 # seq = Diff::LCS.traverse_sequences(seq1, seq2, callback_obj) |
36 # bal = Diff::LCS.traverse_balanced(seq1, seq2, callback_obj) |
37 # seq2 == Diff::LCS.patch(seq1, diffs) |
38 # seq2 == Diff::LCS.patch!(seq1, diffs) |
39 # seq1 == Diff::LCS.unpatch(seq2, diffs) |
40 # seq1 == Diff::LCS.unpatch!(seq2, diffs) |
41 # seq2 == Diff::LCS.patch(seq1, sdiff) |
42 # seq2 == Diff::LCS.patch!(seq1, sdiff) |
43 # seq1 == Diff::LCS.unpatch(seq2, sdiff) |
44 # seq1 == Diff::LCS.unpatch!(seq2, sdiff) |
45 # |
46 # Alternatively, objects can be extended with Diff::LCS: |
47 # |
48 # seq1.extend(Diff::LCS) |
49 # lcs = seq1.lcs(seq2) |
50 # diffs = seq1.diff(seq2) |
51 # sdiff = seq1.sdiff(seq2) |
52 # seq = seq1.traverse_sequences(seq2, callback_obj) |
53 # bal = seq1.traverse_balanced(seq2, callback_obj) |
54 # seq2 == seq1.patch(diffs) |
55 # seq2 == seq1.patch!(diffs) |
56 # seq1 == seq2.unpatch(diffs) |
57 # seq1 == seq2.unpatch!(diffs) |
58 # seq2 == seq1.patch(sdiff) |
59 # seq2 == seq1.patch!(sdiff) |
60 # seq1 == seq2.unpatch(sdiff) |
61 # seq1 == seq2.unpatch!(sdiff) |
62 # |
63 # Default extensions are provided for Array and String objects through |
64 # the use of 'diff/lcs/array' and 'diff/lcs/string'. |
65 # |
66 # == Introduction (by Mark-Jason Dominus) |
67 # |
68 # <em>The following text is from the Perl documentation. The only |
69 # changes have been to make the text appear better in Rdoc</em>. |
70 # |
71 # I once read an article written by the authors of +diff+; they said |
72 # that they hard worked very hard on the algorithm until they found the |
73 # right one. |
74 # |
75 # I think what they ended up using (and I hope someone will correct me, |
76 # because I am not very confident about this) was the `longest common |
77 # subsequence' method. In the LCS problem, you have two sequences of |
78 # items: |
79 # |
80 # a b c d f g h j q z |
81 # a b c d e f g i j k r x y z |
82 # |
83 # and you want to find the longest sequence of items that is present in |
84 # both original sequences in the same order. That is, you want to find a |
85 # new sequence *S* which can be obtained from the first sequence by |
86 # deleting some items, and from the second sequence by deleting other |
87 # items. You also want *S* to be as long as possible. In this case *S* |
88 # is: |
89 # |
90 # a b c d f g j z |
91 # |
92 # From there it's only a small step to get diff-like output: |
93 # |
94 # e h i k q r x y |
95 # + - + + - + + + |
96 # |
97 # This module solves the LCS problem. It also includes a canned function |
98 # to generate +diff+-like output. |
99 # |
100 # It might seem from the example above that the LCS of two sequences is |
101 # always pretty obvious, but that's not always the case, especially when |
102 # the two sequences have many repeated elements. For example, consider |
103 # |
104 # a x b y c z p d q |
105 # a b c a x b y c z |
106 # |
107 # A naive approach might start by matching up the +a+ and +b+ that |
108 # appear at the beginning of each sequence, like this: |
109 # |
110 # a x b y c z p d q |
111 # a b c a b y c z |
112 # |
113 # This finds the common subsequence +a b c z+. But actually, the LCS is |
114 # +a x b y c z+: |
115 # |
116 # a x b y c z p d q |
117 # a b c a x b y c z |
118 # |
119 # == Author |
120 # This version is by Austin Ziegler <diff-lcs@halostatue.ca>. |
121 # |
122 # It is based on the Perl Algorithm::Diff by Ned Konz |
123 # <perl@bike-nomad.com>, copyright © 2000 - 2002 and the Smalltalk |
124 # diff version by Mario I. Wolczko <mario@wolczko.com>, copyright © |
125 # 1993. Documentation includes work by Mark-Jason Dominus. |
126 # |
127 # == Licence |
128 # Copyright © 2004 Austin Ziegler |
129 # This program is free software; you can redistribute it and/or modify it |
130 # under the same terms as Ruby, or alternatively under the Perl Artistic |
131 # licence. |
132 # |
133 # == Credits |
134 # Much of the documentation is taken directly from the Perl |
135 # Algorithm::Diff implementation and was written originally by Mark-Jason |
136 # Dominus <mjd-perl-diff@plover.com> and later by Ned Konz. The basic Ruby |
137 # implementation was re-ported from the Smalltalk implementation, available |
138 # at ftp://st.cs.uiuc.edu/pub/Smalltalk/MANCHESTER/manchester/4.0/diff.st |
139 # |
140 # #sdiff and #traverse_balanced were written for the Perl version by Mike |
141 # Schilli <m@perlmeister.com>. |
142 # |
143 # "The algorithm is described in <em>A Fast Algorithm for Computing Longest |
144 # Common Subsequences</em>, CACM, vol.20, no.5, pp.350-353, May 1977, with |
145 # a few minor improvements to improve the speed." |
146 module LCS |
147 VERSION = '1.1.2' |
148 end |
149 end |
150 |
151 require 'diff/lcs/callbacks' |
152 |
153 module Diff::LCS |
154 # Returns an Array containing the longest common subsequence(s) between |
155 # +self+ and +other+. See Diff::LCS#LCS. |
156 # |
157 # lcs = seq1.lcs(seq2) |
158 def lcs(other, &block) #:yields self[ii] if there are matched subsequences: |
159 Diff::LCS.LCS(self, other, &block) |
160 end |
161 |
162 # Returns the difference set between +self+ and +other+. See |
163 # Diff::LCS#diff. |
164 def diff(other, callbacks = nil, &block) |
165 Diff::LCS::diff(self, other, callbacks, &block) |
166 end |
167 |
168 # Returns the balanced ("side-by-side") difference set between +self+ and |
169 # +other+. See Diff::LCS#sdiff. |
170 def sdiff(other, callbacks = nil, &block) |
171 Diff::LCS::sdiff(self, other, callbacks, &block) |
172 end |
173 |
174 # Traverses the discovered longest common subsequences between +self+ and |
175 # +other+. See Diff::LCS#traverse_sequences. |
176 def traverse_sequences(other, callbacks = nil, &block) |
177 traverse_sequences(self, other, callbacks || Diff::LCS::YieldingCallbacks, |
178 &block) |
179 end |
180 |
181 # Traverses the discovered longest common subsequences between +self+ and |
182 # +other+ using the alternate, balanced algorithm. See |
183 # Diff::LCS#traverse_balanced. |
184 def traverse_balanced(other, callbacks = nil, &block) |
185 traverse_balanced(self, other, callbacks || Diff::LCS::YieldingCallbacks, |
186 &block) |
187 end |
188 |
189 # Attempts to patch a copy of +self+ with the provided +patchset+. See |
190 # Diff::LCS#patch. |
191 def patch(patchset) |
192 Diff::LCS::patch(self.dup, patchset) |
193 end |
194 |
195 # Attempts to unpatch a copy of +self+ with the provided +patchset+. |
196 # See Diff::LCS#patch. |
197 def unpatch(patchset) |
198 Diff::LCS::unpatch(self.dup, patchset) |
199 end |
200 |
201 # Attempts to patch +self+ with the provided +patchset+. See |
202 # Diff::LCS#patch!. Does no autodiscovery. |
203 def patch!(patchset) |
204 Diff::LCS::patch!(self, patchset) |
205 end |
206 |
207 # Attempts to unpatch +self+ with the provided +patchset+. See |
208 # Diff::LCS#unpatch. Does no autodiscovery. |
209 def unpatch!(patchset) |
210 Diff::LCS::unpatch!(self, patchset) |
211 end |
212 end |
213 |
214 module Diff::LCS |
215 class << self |
216 # Given two sequenced Enumerables, LCS returns an Array containing their |
217 # longest common subsequences. |
218 # |
219 # lcs = Diff::LCS.LCS(seq1, seq2) |
220 # |
221 # This array whose contents is such that: |
222 # |
223 # lcs.each_with_index do |ee, ii| |
224 # assert(ee.nil? || (seq1[ii] == seq2[ee])) |
225 # end |
226 # |
227 # If a block is provided, the matching subsequences will be yielded from |
228 # +seq1+ in turn and may be modified before they are placed into the |
229 # returned Array of subsequences. |
230 def LCS(seq1, seq2, &block) #:yields seq1[ii] for each matched: |
231 matches = Diff::LCS.__lcs(seq1, seq2) |
232 ret = [] |
233 matches.each_with_index do |ee, ii| |
234 unless matches[ii].nil? |
235 if block_given? |
236 ret << (yield seq1[ii]) |
237 else |
238 ret << seq1[ii] |
239 end |
240 end |
241 end |
242 ret |
243 end |
244 |
245 # Diff::LCS.diff computes the smallest set of additions and deletions |
246 # necessary to turn the first sequence into the second, and returns a |
247 # description of these changes. |
248 # |
249 # See Diff::LCS::DiffCallbacks for the default behaviour. An alternate |
250 # behaviour may be implemented with Diff::LCS::ContextDiffCallbacks. |
251 # If a Class argument is provided for +callbacks+, #diff will attempt |
252 # to initialise it. If the +callbacks+ object (possibly initialised) |
253 # responds to #finish, it will be called. |
254 def diff(seq1, seq2, callbacks = nil, &block) # :yields diff changes: |
255 callbacks ||= Diff::LCS::DiffCallbacks |
256 if callbacks.kind_of?(Class) |
257 cb = callbacks.new rescue callbacks |
258 callbacks = cb |
259 end |
260 traverse_sequences(seq1, seq2, callbacks) |
261 callbacks.finish if callbacks.respond_to?(:finish) |
262 |
263 if block_given? |
264 res = callbacks.diffs.map do |hunk| |
265 if hunk.kind_of?(Array) |
266 hunk = hunk.map { |block| yield block } |
267 else |
268 yield hunk |
269 end |
270 end |
271 res |
272 else |
273 callbacks.diffs |
274 end |
275 end |
276 |
277 # Diff::LCS.sdiff computes all necessary components to show two sequences |
278 # and their minimized differences side by side, just like the Unix |
279 # utility <em>sdiff</em> does: |
280 # |
281 # old < - |
282 # same same |
283 # before | after |
284 # - > new |
285 # |
286 # See Diff::LCS::SDiffCallbacks for the default behaviour. An alternate |
287 # behaviour may be implemented with Diff::LCS::ContextDiffCallbacks. If |
288 # a Class argument is provided for +callbacks+, #diff will attempt to |
289 # initialise it. If the +callbacks+ object (possibly initialised) |
290 # responds to #finish, it will be called. |
291 def sdiff(seq1, seq2, callbacks = nil, &block) #:yields diff changes: |
292 callbacks ||= Diff::LCS::SDiffCallbacks |
293 if callbacks.kind_of?(Class) |
294 cb = callbacks.new rescue callbacks |
295 callbacks = cb |
296 end |
297 traverse_balanced(seq1, seq2, callbacks) |
298 callbacks.finish if callbacks.respond_to?(:finish) |
299 |
300 if block_given? |
301 res = callbacks.diffs.map do |hunk| |
302 if hunk.kind_of?(Array) |
303 hunk = hunk.map { |block| yield block } |
304 else |
305 yield hunk |
306 end |
307 end |
308 res |
309 else |
310 callbacks.diffs |
311 end |
312 end |
313 |
314 # Diff::LCS.traverse_sequences is the most general facility provided by this |
315 # module; +diff+ and +LCS+ are implemented as calls to it. |
316 # |
317 # The arguments to #traverse_sequences are the two sequences to |
318 # traverse, and a callback object, like this: |
319 # |
320 # traverse_sequences(seq1, seq2, Diff::LCS::ContextDiffCallbacks.new) |
321 # |
322 # #diff is implemented with #traverse_sequences. |
323 # |
324 # == Callback Methods |
325 # Optional callback methods are <em>emphasized</em>. |
326 # |
327 # callbacks#match:: Called when +a+ and +b+ are pointing |
328 # to common elements in +A+ and +B+. |
329 # callbacks#discard_a:: Called when +a+ is pointing to an |
330 # element not in +B+. |
331 # callbacks#discard_b:: Called when +b+ is pointing to an |
332 # element not in +A+. |
333 # <em>callbacks#finished_a</em>:: Called when +a+ has reached the end of |
334 # sequence +A+. |
335 # <em>callbacks#finished_b</em>:: Called when +b+ has reached the end of |
336 # sequence +B+. |
337 # |
338 # == Algorithm |
339 # a---+ |
340 # v |
341 # A = a b c e h j l m n p |
342 # B = b c d e f j k l m r s t |
343 # ^ |
344 # b---+ |
345 # |
346 # If there are two arrows (+a+ and +b+) pointing to elements of |
347 # sequences +A+ and +B+, the arrows will initially point to the first |
348 # elements of their respective sequences. #traverse_sequences will |
349 # advance the arrows through the sequences one element at a time, |
350 # calling a method on the user-specified callback object before each |
351 # advance. It will advance the arrows in such a way that if there are |
352 # elements <tt>A[ii]</tt> and <tt>B[jj]</tt> which are both equal and |
353 # part of the longest common subsequence, there will be some moment |
354 # during the execution of #traverse_sequences when arrow +a+ is pointing |
355 # to <tt>A[ii]</tt> and arrow +b+ is pointing to <tt>B[jj]</tt>. When |
356 # this happens, #traverse_sequences will call <tt>callbacks#match</tt> |
357 # and then it will advance both arrows. |
358 # |
359 # Otherwise, one of the arrows is pointing to an element of its sequence |
360 # that is not part of the longest common subsequence. |
361 # #traverse_sequences will advance that arrow and will call |
362 # <tt>callbacks#discard_a</tt> or <tt>callbacks#discard_b</tt>, depending |
363 # on which arrow it advanced. If both arrows point to elements that are |
364 # not part of the longest common subsequence, then #traverse_sequences |
365 # will advance one of them and call the appropriate callback, but it is |
366 # not specified which it will call. |
367 # |
368 # The methods for <tt>callbacks#match</tt>, <tt>callbacks#discard_a</tt>, |
369 # and <tt>callbacks#discard_b</tt> are invoked with an event comprising |
370 # the action ("=", "+", or "-", respectively), the indicies +ii+ and |
371 # +jj+, and the elements <tt>A[ii]</tt> and <tt>B[jj]</tt>. Return |
372 # values are discarded by #traverse_sequences. |
373 # |
374 # === End of Sequences |
375 # If arrow +a+ reaches the end of its sequence before arrow +b+ does, |
376 # #traverse_sequence try to call <tt>callbacks#finished_a</tt> with the |
377 # last index and element of +A+ (<tt>A[-1]</tt>) and the current index |
378 # and element of +B+ (<tt>B[jj]</tt>). If <tt>callbacks#finished_a</tt> |
379 # does not exist, then <tt>callbacks#discard_b</tt> will be called on |
380 # each element of +B+ until the end of the sequence is reached (the call |
381 # will be done with <tt>A[-1]</tt> and <tt>B[jj]</tt> for each element). |
382 # |
383 # If +b+ reaches the end of +B+ before +a+ reaches the end of +A+, |
384 # <tt>callbacks#finished_b</tt> will be called with the current index |
385 # and element of +A+ (<tt>A[ii]</tt>) and the last index and element of |
386 # +B+ (<tt>A[-1]</tt>). Again, if <tt>callbacks#finished_b</tt> does not |
387 # exist on the callback object, then <tt>callbacks#discard_a</tt> will |
388 # be called on each element of +A+ until the end of the sequence is |
389 # reached (<tt>A[ii]</tt> and <tt>B[-1]</tt>). |
390 # |
391 # There is a chance that one additional <tt>callbacks#discard_a</tt> or |
392 # <tt>callbacks#discard_b</tt> will be called after the end of the |
393 # sequence is reached, if +a+ has not yet reached the end of +A+ or +b+ |
394 # has not yet reached the end of +B+. |
395 def traverse_sequences(seq1, seq2, callbacks = Diff::LCS::SequenceCallbacks, &block) #:yields change events: |
396 matches = Diff::LCS.__lcs(seq1, seq2) |
397 |
398 run_finished_a = run_finished_b = false |
399 string = seq1.kind_of?(String) |
400 |
401 a_size = seq1.size |
402 b_size = seq2.size |
403 ai = bj = 0 |
404 |
405 (0 .. matches.size).each do |ii| |
406 b_line = matches[ii] |
407 |
408 ax = string ? seq1[ii, 1] : seq1[ii] |
409 bx = string ? seq2[bj, 1] : seq2[bj] |
410 |
411 if b_line.nil? |
412 unless ax.nil? |
413 event = Diff::LCS::ContextChange.new('-', ii, ax, bj, bx) |
414 event = yield event if block_given? |
415 callbacks.discard_a(event) |
416 end |
417 else |
418 loop do |
419 break unless bj < b_line |
420 bx = string ? seq2[bj, 1] : seq2[bj] |
421 event = Diff::LCS::ContextChange.new('+', ii, ax, bj, bx) |
422 event = yield event if block_given? |
423 callbacks.discard_b(event) |
424 bj += 1 |
425 end |
426 bx = string ? seq2[bj, 1] : seq2[bj] |
427 event = Diff::LCS::ContextChange.new('=', ii, ax, bj, bx) |
428 event = yield event if block_given? |
429 callbacks.match(event) |
430 bj += 1 |
431 end |
432 ai = ii |
433 end |
434 ai += 1 |
435 |
436 # The last entry (if any) processed was a match. +ai+ and +bj+ point |
437 # just past the last matching lines in their sequences. |
438 while (ai < a_size) or (bj < b_size) |
439 # last A? |
440 if ai == a_size and bj < b_size |
441 if callbacks.respond_to?(:finished_a) and not run_finished_a |
442 ax = string ? seq1[-1, 1] : seq1[-1] |
443 bx = string ? seq2[bj, 1] : seq2[bj] |
444 event = Diff::LCS::ContextChange.new('>', (a_size - 1), ax, bj, bx) |
445 event = yield event if block_given? |
446 callbacks.finished_a(event) |
447 run_finished_a = true |
448 else |
449 ax = string ? seq1[ai, 1] : seq1[ai] |
450 loop do |
451 bx = string ? seq2[bj, 1] : seq2[bj] |
452 event = Diff::LCS::ContextChange.new('+', ai, ax, bj, bx) |
453 event = yield event if block_given? |
454 callbacks.discard_b(event) |
455 bj += 1 |
456 break unless bj < b_size |
457 end |
458 end |
459 end |
460 |
461 # last B? |
462 if bj == b_size and ai < a_size |
463 if callbacks.respond_to?(:finished_b) and not run_finished_b |
464 ax = string ? seq1[ai, 1] : seq1[ai] |
465 bx = string ? seq2[-1, 1] : seq2[-1] |
466 event = Diff::LCS::ContextChange.new('<', ai, ax, (b_size - 1), bx) |
467 event = yield event if block_given? |
468 callbacks.finished_b(event) |
469 run_finished_b = true |
470 else |
471 bx = string ? seq2[bj, 1] : seq2[bj] |
472 loop do |
473 ax = string ? seq1[ai, 1] : seq1[ai] |
474 event = Diff::LCS::ContextChange.new('-', ai, ax, bj, bx) |
475 event = yield event if block_given? |
476 callbacks.discard_a(event) |
477 ai += 1 |
478 break unless bj < b_size |
479 end |
480 end |
481 end |
482 |
483 if ai < a_size |
484 ax = string ? seq1[ai, 1] : seq1[ai] |
485 bx = string ? seq2[bj, 1] : seq2[bj] |
486 event = Diff::LCS::ContextChange.new('-', ai, ax, bj, bx) |
487 event = yield event if block_given? |
488 callbacks.discard_a(event) |
489 ai += 1 |
490 end |
491 |
492 if bj < b_size |
493 ax = string ? seq1[ai, 1] : seq1[ai] |
494 bx = string ? seq2[bj, 1] : seq2[bj] |
495 event = Diff::LCS::ContextChange.new('+', ai, ax, bj, bx) |
496 event = yield event if block_given? |
497 callbacks.discard_b(event) |
498 bj += 1 |
499 end |
500 end |
501 end |
502 |
503 # #traverse_balanced is an alternative to #traverse_sequences. It |
504 # uses a different algorithm to iterate through the entries in the |
505 # computed longest common subsequence. Instead of viewing the changes as |
506 # insertions or deletions from one of the sequences, #traverse_balanced |
507 # will report <em>changes</em> between the sequences. To represent a |
508 # |
509 # The arguments to #traverse_balanced are the two sequences to traverse |
510 # and a callback object, like this: |
511 # |
512 # traverse_balanced(seq1, seq2, Diff::LCS::ContextDiffCallbacks.new) |
513 # |
514 # #sdiff is implemented with #traverse_balanced. |
515 # |
516 # == Callback Methods |
517 # Optional callback methods are <em>emphasized</em>. |
518 # |
519 # callbacks#match:: Called when +a+ and +b+ are pointing |
520 # to common elements in +A+ and +B+. |
521 # callbacks#discard_a:: Called when +a+ is pointing to an |
522 # element not in +B+. |
523 # callbacks#discard_b:: Called when +b+ is pointing to an |
524 # element not in +A+. |
525 # <em>callbacks#change</em>:: Called when +a+ and +b+ are pointing |
526 # to the same relative position, but |
527 # <tt>A[a]</tt> and <tt>B[b]</tt> are |
528 # not the same; a <em>change</em> has |
529 # occurred. |
530 # |
531 # #traverse_balanced might be a bit slower than #traverse_sequences, |
532 # noticable only while processing huge amounts of data. |
533 # |
534 # The +sdiff+ function of this module is implemented as call to |
535 # #traverse_balanced. |
536 # |
537 # == Algorithm |
538 # a---+ |
539 # v |
540 # A = a b c e h j l m n p |
541 # B = b c d e f j k l m r s t |
542 # ^ |
543 # b---+ |
544 # |
545 # === Matches |
546 # If there are two arrows (+a+ and +b+) pointing to elements of |
547 # sequences +A+ and +B+, the arrows will initially point to the first |
548 # elements of their respective sequences. #traverse_sequences will |
549 # advance the arrows through the sequences one element at a time, |
550 # calling a method on the user-specified callback object before each |
551 # advance. It will advance the arrows in such a way that if there are |
552 # elements <tt>A[ii]</tt> and <tt>B[jj]</tt> which are both equal and |
553 # part of the longest common subsequence, there will be some moment |
554 # during the execution of #traverse_sequences when arrow +a+ is pointing |
555 # to <tt>A[ii]</tt> and arrow +b+ is pointing to <tt>B[jj]</tt>. When |
556 # this happens, #traverse_sequences will call <tt>callbacks#match</tt> |
557 # and then it will advance both arrows. |
558 # |
559 # === Discards |
560 # Otherwise, one of the arrows is pointing to an element of its sequence |
561 # that is not part of the longest common subsequence. |
562 # #traverse_sequences will advance that arrow and will call |
563 # <tt>callbacks#discard_a</tt> or <tt>callbacks#discard_b</tt>, |
564 # depending on which arrow it advanced. |
565 # |
566 # === Changes |
567 # If both +a+ and +b+ point to elements that are not part of the longest |
568 # common subsequence, then #traverse_sequences will try to call |
569 # <tt>callbacks#change</tt> and advance both arrows. If |
570 # <tt>callbacks#change</tt> is not implemented, then |
571 # <tt>callbacks#discard_a</tt> and <tt>callbacks#discard_b</tt> will be |
572 # called in turn. |
573 # |
574 # The methods for <tt>callbacks#match</tt>, <tt>callbacks#discard_a</tt>, |
575 # <tt>callbacks#discard_b</tt>, and <tt>callbacks#change</tt> are |
576 # invoked with an event comprising the action ("=", "+", "-", or "!", |
577 # respectively), the indicies +ii+ and +jj+, and the elements |
578 # <tt>A[ii]</tt> and <tt>B[jj]</tt>. Return values are discarded by |
579 # #traverse_balanced. |
580 # |
581 # === Context |
582 # Note that +ii+ and +jj+ may not be the same index position, even if |
583 # +a+ and +b+ are considered to be pointing to matching or changed |
584 # elements. |
585 def traverse_balanced(seq1, seq2, callbacks = Diff::LCS::BalancedCallbacks) |
586 matches = Diff::LCS.__lcs(seq1, seq2) |
587 a_size = seq1.size |
588 b_size = seq2.size |
589 ai = bj = mb = 0 |
590 ma = -1 |
591 string = seq1.kind_of?(String) |
592 |
593 # Process all the lines in the match vector. |
594 loop do |
595 # Find next match indices +ma+ and +mb+ |
596 loop do |
597 ma += 1 |
598 break unless ma < matches.size and matches[ma].nil? |
599 end |
600 |
601 break if ma >= matches.size # end of matches? |
602 mb = matches[ma] |
603 |
604 # Change(seq2) |
605 while (ai < ma) or (bj < mb) |
606 ax = string ? seq1[ai, 1] : seq1[ai] |
607 bx = string ? seq2[bj, 1] : seq2[bj] |
608 |
609 case [(ai < ma), (bj < mb)] |
610 when [true, true] |
611 if callbacks.respond_to?(:change) |
612 event = Diff::LCS::ContextChange.new('!', ai, ax, bj, bx) |
613 event = yield event if block_given? |
614 callbacks.change(event) |
615 ai += 1 |
616 bj += 1 |
617 else |
618 event = Diff::LCS::ContextChange.new('-', ai, ax, bj, bx) |
619 event = yield event if block_given? |
620 callbacks.discard_a(event) |
621 ai += 1 |
622 ax = string ? seq1[ai, 1] : seq1[ai] |
623 event = Diff::LCS::ContextChange.new('+', ai, ax, bj, bx) |
624 event = yield event if block_given? |
625 callbacks.discard_b(event) |
626 bj += 1 |
627 end |
628 when [true, false] |
629 event = Diff::LCS::ContextChange.new('-', ai, ax, bj, bx) |
630 event = yield event if block_given? |
631 callbacks.discard_a(event) |
632 ai += 1 |
633 when [false, true] |
634 event = Diff::LCS::ContextChange.new('+', ai, ax, bj, bx) |
635 event = yield event if block_given? |
636 callbacks.discard_b(event) |
637 bj += 1 |
638 end |
639 end |
640 |
641 # Match |
642 ax = string ? seq1[ai, 1] : seq1[ai] |
643 bx = string ? seq2[bj, 1] : seq2[bj] |
644 event = Diff::LCS::ContextChange.new('=', ai, ax, bj, bx) |
645 event = yield event if block_given? |
646 callbacks.match(event) |
647 ai += 1 |
648 bj += 1 |
649 end |
650 |
651 while (ai < a_size) or (bj < b_size) |
652 ax = string ? seq1[ai, 1] : seq1[ai] |
653 bx = string ? seq2[bj, 1] : seq2[bj] |
654 |
655 case [(ai < a_size), (bj < b_size)] |
656 when [true, true] |
657 if callbacks.respond_to?(:change) |
658 event = Diff::LCS::ContextChange.new('!', ai, ax, bj, bx) |
659 event = yield event if block_given? |
660 callbacks.change(event) |
661 ai += 1 |
662 bj += 1 |
663 else |
664 event = Diff::LCS::ContextChange.new('-', ai, ax, bj, bx) |
665 event = yield event if block_given? |
666 callbacks.discard_a(event) |
667 ai += 1 |
668 ax = string ? seq1[ai, 1] : seq1[ai] |
669 event = Diff::LCS::ContextChange.new('+', ai, ax, bj, bx) |
670 event = yield event if block_given? |
671 callbacks.discard_b(event) |
672 bj += 1 |
673 end |
674 when [true, false] |
675 event = Diff::LCS::ContextChange.new('-', ai, ax, bj, bx) |
676 event = yield event if block_given? |
677 callbacks.discard_a(event) |
678 ai += 1 |
679 when [false, true] |
680 event = Diff::LCS::ContextChange.new('+', ai, ax, bj, bx) |
681 event = yield event if block_given? |
682 callbacks.discard_b(event) |
683 bj += 1 |
684 end |
685 end |
686 end |
687 |
688 PATCH_MAP = { #:nodoc: |
689 :patch => { '+' => '+', '-' => '-', '!' => '!', '=' => '=' }, |
690 :unpatch => { '+' => '-', '-' => '+', '!' => '!', '=' => '=' } |
691 } |
692 |
693 # Given a patchset, convert the current version to the new |
694 # version. If +direction+ is not specified (must be |
695 # <tt>:patch</tt> or <tt>:unpatch</tt>), then discovery of the |
696 # direction of the patch will be attempted. |
697 def patch(src, patchset, direction = nil) |
698 string = src.kind_of?(String) |
699 # Start with a new empty type of the source's class |
700 res = src.class.new |
701 |
702 # Normalize the patchset. |
703 patchset = __normalize_patchset(patchset) |
704 |
705 direction ||= Diff::LCS.__diff_direction(src, patchset) |
706 direction ||= :patch |
707 |
708 ai = bj = 0 |
709 |
710 patchset.each do |change| |
711 # Both Change and ContextChange support #action |
712 action = PATCH_MAP[direction][change.action] |
713 |
714 case change |
715 when Diff::LCS::ContextChange |
716 case direction |
717 when :patch |
718 el = change.new_element |
719 op = change.old_position |
720 np = change.new_position |
721 when :unpatch |
722 el = change.old_element |
723 op = change.new_position |
724 np = change.old_position |
725 end |
726 |
727 case action |
728 when '-' # Remove details from the old string |
729 while ai < op |
730 res << (string ? src[ai, 1] : src[ai]) |
731 ai += 1 |
732 bj += 1 |
733 end |
734 ai += 1 |
735 when '+' |
736 while bj < np |
737 res << (string ? src[ai, 1] : src[ai]) |
738 ai += 1 |
739 bj += 1 |
740 end |
741 |
742 res << el |
743 bj += 1 |
744 when '=' |
745 # This only appears in sdiff output with the SDiff callback. |
746 # Therefore, we only need to worry about dealing with a single |
747 # element. |
748 res << el |
749 |
750 ai += 1 |
751 bj += 1 |
752 when '!' |
753 while ai < op |
754 res << (string ? src[ai, 1] : src[ai]) |
755 ai += 1 |
756 bj += 1 |
757 end |
758 |
759 bj += 1 |
760 ai += 1 |
761 |
762 res << el |
763 end |
764 when Diff::LCS::Change |
765 case action |
766 when '-' |
767 while ai < change.position |
768 res << (string ? src[ai, 1] : src[ai]) |
769 ai += 1 |
770 bj += 1 |
771 end |
772 ai += 1 |
773 when '+' |
774 while bj < change.position |
775 res << (string ? src[ai, 1] : src[ai]) |
776 ai += 1 |
777 bj += 1 |
778 end |
779 |
780 bj += 1 |
781 |
782 res << change.element |
783 end |
784 end |
785 end |
786 |
787 while ai < src.size |
788 res << (string ? src[ai, 1] : src[ai]) |
789 ai += 1 |
790 bj += 1 |
791 end |
792 |
793 res |
794 end |
795 |
796 # Given a set of patchset, convert the current version to the prior |
797 # version. Does no auto-discovery. |
798 def unpatch!(src, patchset) |
799 Diff::LCS.patch(src, patchset, :unpatch) |
800 end |
801 |
802 # Given a set of patchset, convert the current version to the next |
803 # version. Does no auto-discovery. |
804 def patch!(src, patchset) |
805 Diff::LCS.patch(src, patchset, :patch) |
806 end |
807 |
808 # private |
809 # Compute the longest common subsequence between the sequenced Enumerables |
810 # +a+ and +b+. The result is an array whose contents is such that |
811 # |
812 # result = Diff::LCS.__lcs(a, b) |
813 # result.each_with_index do |e, ii| |
814 # assert_equal(a[ii], b[e]) unless e.nil? |
815 # end |
816 def __lcs(a, b) |
817 a_start = b_start = 0 |
818 a_finish = a.size - 1 |
819 b_finish = b.size - 1 |
820 vector = [] |
821 |
822 # Prune off any common elements at the beginning... |
823 while (a_start <= a_finish) and |
824 (b_start <= b_finish) and |
825 (a[a_start] == b[b_start]) |
826 vector[a_start] = b_start |
827 a_start += 1 |
828 b_start += 1 |
829 end |
830 |
831 # Now the end... |
832 while (a_start <= a_finish) and |
833 (b_start <= b_finish) and |
834 (a[a_finish] == b[b_finish]) |
835 vector[a_finish] = b_finish |
836 a_finish -= 1 |
837 b_finish -= 1 |
838 end |
839 |
840 # Now, compute the equivalence classes of positions of elements. |
841 b_matches = Diff::LCS.__position_hash(b, b_start .. b_finish) |
842 |
843 thresh = [] |
844 links = [] |
845 |
846 (a_start .. a_finish).each do |ii| |
847 ai = a.kind_of?(String) ? a[ii, 1] : a[ii] |
848 bm = b_matches[ai] |
849 kk = nil |
850 bm.reverse_each do |jj| |
851 if kk and (thresh[kk] > jj) and (thresh[kk - 1] < jj) |
852 thresh[kk] = jj |
853 else |
854 kk = Diff::LCS.__replace_next_larger(thresh, jj, kk) |
855 end |
856 links[kk] = [ (kk > 0) ? links[kk - 1] : nil, ii, jj ] unless kk.nil? |
857 end |
858 end |
859 |
860 unless thresh.empty? |
861 link = links[thresh.size - 1] |
862 while not link.nil? |
863 vector[link[1]] = link[2] |
864 link = link[0] |
865 end |
866 end |
867 |
868 vector |
869 end |
870 |
871 # Find the place at which +value+ would normally be inserted into the |
872 # Enumerable. If that place is already occupied by +value+, do nothing |
873 # and return +nil+. If the place does not exist (i.e., it is off the end |
874 # of the Enumerable), add it to the end. Otherwise, replace the element |
875 # at that point with +value+. It is assumed that the Enumerable's values |
876 # are numeric. |
877 # |
878 # This operation preserves the sort order. |
879 def __replace_next_larger(enum, value, last_index = nil) |
880 # Off the end? |
881 if enum.empty? or (value > enum[-1]) |
882 enum << value |
883 return enum.size - 1 |
884 end |
885 |
886 # Binary search for the insertion point |
887 last_index ||= enum.size |
888 first_index = 0 |
889 while (first_index <= last_index) |
890 ii = (first_index + last_index) >> 1 |
891 |
892 found = enum[ii] |
893 |
894 if value == found |
895 return nil |
896 elsif value > found |
897 first_index = ii + 1 |
898 else |
899 last_index = ii - 1 |
900 end |
901 end |
902 |
903 # The insertion point is in first_index; overwrite the next larger |
904 # value. |
905 enum[first_index] = value |
906 return first_index |
907 end |
908 |
909 # If +vector+ maps the matching elements of another collection onto this |
910 # Enumerable, compute the inverse +vector+ that maps this Enumerable |
911 # onto the collection. (Currently unused.) |
912 def __inverse_vector(a, vector) |
913 inverse = a.dup |
914 (0 ... vector.size).each do |ii| |
915 inverse[vector[ii]] = ii unless vector[ii].nil? |
916 end |
917 inverse |
918 end |
919 |
920 # Returns a hash mapping each element of an Enumerable to the set of |
921 # positions it occupies in the Enumerable, optionally restricted to the |
922 # elements specified in the range of indexes specified by +interval+. |
923 def __position_hash(enum, interval = 0 .. -1) |
924 hash = Hash.new { |hh, kk| hh[kk] = [] } |
925 interval.each do |ii| |
926 kk = enum.kind_of?(String) ? enum[ii, 1] : enum[ii] |
927 hash[kk] << ii |
928 end |
929 hash |
930 end |
931 |
932 # Examine the patchset and the source to see in which direction the |
933 # patch should be applied. |
934 # |
935 # WARNING: By default, this examines the whole patch, so this could take |
936 # some time. This also works better with Diff::LCS::ContextChange or |
937 # Diff::LCS::Change as its source, as an array will cause the creation |
938 # of one of the above. |
939 def __diff_direction(src, patchset, limit = nil) |
940 count = left = left_miss = right = right_miss = 0 |
941 string = src.kind_of?(String) |
942 |
943 patchset.each do |change| |
944 count += 1 |
945 |
946 case change |
947 when Diff::LCS::Change |
948 # With a simplistic change, we can't tell the difference between |
949 # the left and right on '!' actions, so we ignore those. On '=' |
950 # actions, if there's a miss, we miss both left and right. |
951 element = string ? src[change.position, 1] : src[change.position] |
952 |
953 case change.action |
954 when '-' |
955 if element == change.element |
956 left += 1 |
957 else |
958 left_miss += 1 |
959 end |
960 when '+' |
961 if element == change.element |
962 right += 1 |
963 else |
964 right_miss += 1 |
965 end |
966 when '=' |
967 if element != change.element |
968 left_miss += 1 |
969 right_miss += 1 |
970 end |
971 end |
972 when Diff::LCS::ContextChange |
973 case change.action |
974 when '-' # Remove details from the old string |
975 element = string ? src[change.old_position, 1] : src[change.old_position] |
976 if element == change.old_element |
977 left += 1 |
978 else |
979 left_miss += 1 |
980 end |
981 when '+' |
982 element = string ? src[change.new_position, 1] : src[change.new_position] |
983 if element == change.new_element |
984 right += 1 |
985 else |
986 right_miss += 1 |
987 end |
988 when '=' |
989 le = string ? src[change.old_position, 1] : src[change.old_position] |
990 re = string ? src[change.new_position, 1] : src[change.new_position] |
991 |
992 left_miss += 1 if le != change.old_element |
993 right_miss += 1 if re != change.new_element |
994 when '!' |
995 element = string ? src[change.old_position, 1] : src[change.old_position] |
996 if element == change.old_element |
997 left += 1 |
998 else |
999 element = string ? src[change.new_position, 1] : src[change.new_position] |
1000 if element == change.new_element |
1001 right += 1 |
1002 else |
1003 left_miss += 1 |
1004 right_miss += 1 |
1005 end |
1006 end |
1007 end |
1008 end |
1009 |
1010 break if not limit.nil? and count > limit |
1011 end |
1012 |
1013 no_left = (left == 0) and (left_miss >= 0) |
1014 no_right = (right == 0) and (right_miss >= 0) |
1015 |
1016 case [no_left, no_right] |
1017 when [false, true] |
1018 return :patch |
1019 when [true, false] |
1020 return :unpatch |
1021 else |
1022 raise "The provided patchset does not appear to apply to the provided value as either source or destination value." |
1023 end |
1024 end |
1025 |
1026 # Normalize the patchset. A patchset is always a sequence of changes, but |
1027 # how those changes are represented may vary, depending on how they were |
1028 # generated. In all cases we support, we also support the array |
1029 # representation of the changes. The formats are: |
1030 # |
1031 # [ # patchset <- Diff::LCS.diff(a, b) |
1032 # [ # one or more hunks |
1033 # Diff::LCS::Change # one or more changes |
1034 # ] ] |
1035 # |
1036 # [ # patchset, equivalent to the above |
1037 # [ # one or more hunks |
1038 # [ action, line, value ] # one or more changes |
1039 # ] ] |
1040 # |
1041 # [ # patchset <- Diff::LCS.diff(a, b, Diff::LCS::ContextDiffCallbacks) |
1042 # # OR <- Diff::LCS.sdiff(a, b, Diff::LCS::ContextDiffCallbacks) |
1043 # [ # one or more hunks |
1044 # Diff::LCS::ContextChange # one or more changes |
1045 # ] ] |
1046 # |
1047 # [ # patchset, equivalent to the above |
1048 # [ # one or more hunks |
1049 # [ action, [ old line, old value ], [ new line, new value ] ] |
1050 # # one or more changes |
1051 # ] ] |
1052 # |
1053 # [ # patchset <- Diff::LCS.sdiff(a, b) |
1054 # # OR <- Diff::LCS.diff(a, b, Diff::LCS::SDiffCallbacks) |
1055 # Diff::LCS::ContextChange # one or more changes |
1056 # ] |
1057 # |
1058 # [ # patchset, equivalent to the above |
1059 # [ action, [ old line, old value ], [ new line, new value ] ] |
1060 # # one or more changes |
1061 # ] |
1062 # |
1063 # The result of this will be either of the following. |
1064 # |
1065 # [ # patchset |
1066 # Diff::LCS::ContextChange # one or more changes |
1067 # ] |
1068 # |
1069 # [ # patchset |
1070 # Diff::LCS::Change # one or more changes |
1071 # ] |
1072 # |
1073 # If either of the above is provided, it will be returned as such. |
1074 # |
1075 def __normalize_patchset(patchset) |
1076 patchset.map do |hunk| |
1077 case hunk |
1078 when Diff::LCS::ContextChange, Diff::LCS::Change |
1079 hunk |
1080 when Array |
1081 if (not hunk[0].kind_of?(Array)) and hunk[1].kind_of?(Array) and hunk[2].kind_of?(Array) |
1082 Diff::LCS::ContextChange.from_a(hunk) |
1083 else |
1084 hunk.map do |change| |
1085 case change |
1086 when Diff::LCS::ContextChange, Diff::LCS::Change |
1087 change |
1088 when Array |
1089 # change[1] will ONLY be an array in a ContextChange#to_a call. |
1090 # In Change#to_a, it represents the line (singular). |
1091 if change[1].kind_of?(Array) |
1092 Diff::LCS::ContextChange.from_a(change) |
1093 else |
1094 Diff::LCS::Change.from_a(change) |
1095 end |
1096 end |
1097 end |
1098 end |
1099 else |
1100 raise ArgumentError, "Cannot normalise a hunk of class #{hunk.class}." |
1101 end |
1102 end.flatten |
1103 end |
1104 end |
1105 end |
Generated on Fri Apr 22 17:22:41 -0700 2011 with rcov 0.9.8