Hi,

On Thu, 15 Jan 2009, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>
> I didn't see the semantics of color-words documented in the original
> either,

Yeah, my bad.  Will try to fix it with this round of patches.

Actually, I'll give a quick outline right here:

Idea: word diff shows the differences on a word level instead of a line
level.  To make it easier for humans (although our default colors
studiously exclude the color-blind), we do not show "+" and "-" as the
standard diff does, but use colors to designate whether words were
removed or added.

Now, the thing is that the inter-word parts _can_ differ.  The idea here
is to show the inter-word part of the postimage and to silently drop that
of the preimage.

Method: We use libxdiff as the real workhorse.  First, we let it generate
a line diff.  Then we reconstruct the preimage and postimage for each
hunk, process both into new images that have at most one word (in the new
code exactly one word) per line, and feed the new preimage/postimage pair
to libxdiff.

From the output of libxdiff, we reconstruct which words were actually
removed and which were added.  Then, like the line-based diff, we combine
the runs of common words, removed words and added words, and show them.

The algorithm I implemented in the new patch series is actually much
cleaner than the old one:

- it feeds images to libxdiff which contain _exactly_ one word per line,
  decoupling the word offsets in the original image from the offsets in
  the processed image,

- this decoupling allows for arbitrary word boundaries, even 0-character
  ones,

- it parses the hunk headers of the libxdiff output instead of the "-",
  "+" and " " lines, and therefore does not have to play tricks with the
  newline character in the middle of a run of removed words.

> What happens if a portion of background is only in the preimage?

If it is in a run of words that were removed, i.e. that are only in the
preimage, then it is shown in that part.
Otherwise, the background of the preimage is never shown.

> E.g. when these two are compared:
>
>     bbb aaa bb aa b
>     ccc aaa cc
>
> what should happen?  We would want to say "aa" was removed by showing it
> in red, but on what background should it be displayed?  cc <red>aa</red>
> b?

If we are only ever interested in the 'a's, I'd say that the output
should only reflect that.  In other words, what the current code does
(ccc aaa<red>aa</red> cc) is okay IMHO.

After all, we said we're interested in the 'a's, so we should not
complain that it did not show us the removal of 'b's.

Ciao,
Dscho
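
P.S. For illustration only, here is a rough Python sketch of the method
outlined above.  It is not the code from the patch series: Python's
difflib.SequenceMatcher stands in for libxdiff, and its opcodes (index
ranges into both word lists) play the role that the hunk headers play in
the real thing.  The names word_spans and word_diff, the word_re
parameter and the <red>/<green> markers are all made up for this sketch.

```python
import difflib
import re


def word_spans(text, word_re=r"\S+"):
    """Return (word, start, end) for every word; offsets index into text."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(word_re, text)]


def word_diff(pre, post, word_re=r"\S+"):
    """Word-level diff; inter-word parts are taken from the postimage."""
    a = word_spans(pre, word_re)
    b = word_spans(post, word_re)
    # Diff only the word lists (conceptually "one word per line"); the
    # offsets into the original images are kept separately in a and b.
    matcher = difflib.SequenceMatcher(
        None, [w for w, _, _ in a], [w for w, _, _ in b], autojunk=False
    )
    out = []
    pos = 0  # how much of the postimage has been emitted so far
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Common run: copy the postimage, background included.
            end = b[j2 - 1][2]
            out.append(post[pos:end])
            pos = end
            continue
        if j2 > j1:
            # Background in front of the added run comes from the postimage.
            start = b[j1][1]
            out.append(post[pos:start])
            pos = start
        if i2 > i1:
            # Removed run: shown with its own (preimage) background.
            out.append("<red>" + pre[a[i1][1]:a[i2 - 1][2]] + "</red>")
        if j2 > j1:
            end = b[j2 - 1][2]
            out.append("<green>" + post[pos:end] + "</green>")
            pos = end
    out.append(post[pos:])  # trailing background of the postimage
    return "".join(out)


if __name__ == "__main__":
    # The example from the quoted question, treating only runs of 'a' as words.
    print(word_diff("bbb aaa bb aa b", "ccc aaa cc", word_re=r"a+"))
```

With runs of 'a' as the only words, the sketch gives the same result as
quoted above for that example; with the default word_re it splits on
whitespace.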