Re: [PATCH take 3 0/4] color-words improvements

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Thu, 15 Jan 2009 20:25:49 +0100 (CET)

Hi,

On Thu, 15 Jan 2009, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> 
> I didn't see the semantics of color-words documented in the original
> either,

Yeah, my bad.  Will try to fix it with this round of patches.

Actually, I'll give a quick outline right here:

Idea: word diff shows the differences at the word level instead of the
line level.  To make it easier for humans (albeit we studiously exclude
color-blind people with our defaults), we do not show "+" and "-" as the
standard diff does, but use colors to designate whether words were removed
or added.

Now, the thing is that the inter-word parts _can_ differ.  The idea here
is to show the inter-word parts of the postimage and to silently drop
those of the preimage.

Method: We use libxdiff as the real workhorse.  First, we let it generate 
a line diff.

Then we reconstruct the preimage and postimage for each hunk, process both 
into new images that have at most one word (in the new code exactly one 
word) per line, and feed the new preimage/postimage pair to libxdiff.
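
To illustrate the splitting step, here is a minimal Python sketch (not the
C code from the patches).  The function names and the default `\S+` word
pattern are assumptions made for this illustration; the point is only that
the offsets into the original image are recorded separately from the
one-word-per-line image.

```python
import re


def split_words(text, word_regex=r"\S+"):
    """Return (words, spans) for every non-empty match of word_regex.

    spans[i] is the (start, end) offset of words[i] in the original text.
    """
    words, spans = [], []
    for m in re.finditer(word_regex, text):
        if m.end() > m.start():  # ignore empty matches
            words.append(m.group())
            spans.append((m.start(), m.end()))
    return words, spans


def one_word_per_line(words):
    """Build the processed image: exactly one word per line."""
    return "".join(w + "\n" for w in words)


words, spans = split_words("foo  bar\nbaz")
image = one_word_per_line(words)
```

Line N of the processed image is word N of the hunk, so a line number
reported by the diff can be mapped back to the original text via `spans`.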

From the output of libxdiff, we reconstruct which words were actually
removed and which were added.  Then -- like the line based diff -- we 
combine the runs of common words, removed words and added words, and show 
them.
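
As a rough illustration of this last step, the sketch below combines
word-level differences into runs.  It uses Python's
difflib.SequenceMatcher as a stand-in for libxdiff (an assumption made
only so the example is self-contained; it may pick different matches than
libxdiff would), and the name `word_runs` is hypothetical.

```python
from difflib import SequenceMatcher


def word_runs(pre_words, post_words):
    """Group two word lists into runs of common, removed and added words."""
    runs = []
    sm = SequenceMatcher(a=pre_words, b=post_words, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            runs.append(("common", post_words[j1:j2]))
            continue
        # "replace", "delete" and "insert" all reduce to an optional
        # removed run followed by an optional added run.
        if i2 > i1:
            runs.append(("removed", pre_words[i1:i2]))
        if j2 > j1:
            runs.append(("added", post_words[j1:j2]))
    return runs


runs = word_runs(["the", "quick", "fox"], ["the", "slow", "fox"])
```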

The algorithm I implemented in the new patch series is actually much 
cleaner than the old one:

- it feeds images to libxdiff which contain _exactly_ one word per line, 
  decoupling the word offsets in the original image from the offsets in 
  the processed image,

- this decoupling allows for arbitrary word boundaries, even 0-character 
  ones,

- it parses the hunk headers of the libxdiff output instead of the "-", 
  "+" and " " lines, and therefore does not have to play tricks with the 
  newline character in the middle of a run of removed words.
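
To make the last point concrete, here is a small Python sketch of hunk
header parsing.  It assumes the usual unified diff header form
"@@ -first[,len] +first[,len] @@", where an omitted length means 1 and a
zero length means "after line first".  The helper names are made up for
this illustration and are not taken from the patches.

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (minus_first, minus_len, plus_first, plus_len)."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError("not a hunk header: %r" % line)
    minus_first = int(m.group(1))
    minus_len = 1 if m.group(2) is None else int(m.group(2))
    plus_first = int(m.group(3))
    plus_len = 1 if m.group(4) is None else int(m.group(4))
    return minus_first, minus_len, plus_first, plus_len


def word_range(first, length):
    """Map a 1-based (first, length) pair to a 0-based [begin, end) range.

    With one word per line, line numbers are word numbers.  A zero length
    yields an empty range positioned right after word number `first`.
    """
    if length == 0:
        return first, first
    return first - 1, first - 1 + length
```

Since each line holds exactly one word, the two ranges alone say which
words were removed and which were added; the "-", "+" and " " lines that
follow the header need not be looked at.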

> What happens if a portion of background is only in the preimage?

If it is in a run of words that were removed, i.e. that are only in the 
preimage, then it is shown in that part.  Otherwise, the background of the 
preimage is never shown.

> E.g. when these two are compared:
> 
>   bbb aaa bb aa b
>   ccc aaa cc
> 
> what should happen?  We would want to say "aa" was removed by showing it
> in red, but on what background should it be displayed?  cc <red>aa</red>
> b?

If we are only ever interested in the 'a's, I'd say that the output should 
only reflect that.  In other words, what the current code does (ccc 
aaa<red>aa</red> cc) is okay IMHO.  After all, we said we're interested in 
the 'a's, so we should not complain that it did not show us the removal of 
'b's.
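
For what it is worth, the behaviour described above can be reproduced with
a short Python sketch.  It is only an approximation under stated
assumptions: the word pattern is taken to be `a+`,
difflib.SequenceMatcher stands in for libxdiff, and the
<red>/<green> markers stand in for the colors (the <green> marker and
all function names are made up for this example).

```python
import re
from difflib import SequenceMatcher


def find_words(text, word_regex):
    """Return the (start, end) offsets of all non-empty matches."""
    return [(m.start(), m.end())
            for m in re.finditer(word_regex, text) if m.end() > m.start()]


def render_word_diff(pre, post, word_regex):
    pre_spans = find_words(pre, word_regex)
    post_spans = find_words(post, word_regex)
    pre_words = [pre[s:e] for s, e in pre_spans]
    post_words = [post[s:e] for s, e in post_spans]

    out = []
    pos = 0  # how much of the postimage has been emitted so far
    sm = SequenceMatcher(a=pre_words, b=post_words, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue  # common words are emitted with the postimage text
        # Where the changed run sits in the postimage: at its first added
        # word, or right after the preceding word for a pure removal.
        if j2 > j1:
            start = post_spans[j1][0]
        elif j1 > 0:
            start = post_spans[j1 - 1][1]
        else:
            start = 0
        out.append(post[pos:start])
        pos = start
        if i2 > i1:
            # Removed run, including the preimage text between its words.
            removed = pre[pre_spans[i1][0]:pre_spans[i2 - 1][1]]
            out.append("<red>" + removed + "</red>")
        if j2 > j1:
            end = post_spans[j2 - 1][1]
            out.append("<green>" + post[start:end] + "</green>")
            pos = end
    out.append(post[pos:])  # the rest of the postimage, shown as-is
    return "".join(out)


result = render_word_diff("bbb aaa bb aa b", "ccc aaa cc", r"a+")
# result == "ccc aaa<red>aa</red> cc"
```

The 'b's and 'c's never reach the diff at all; only the postimage's
inter-word text is copied through, which is why "bb" and "b" vanish
without a trace.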

Ciao,
Dscho
