Trouble using --word-diff results

Hi,

I've been working with the output of "git diff --word-diff" and am seeing some unexpected results which at first I thought might be a bug, but now I am beginning to wonder if --word-diff is actually useful at all for the purposes of scripting.

I noticed this specifically with the --word-diff=porcelain format, which I thought would be helpful for producing colorized diff output like that which you can see here:

  http://github.com/git/git/commit/c2e0940b44ded03f0af02be95c35b231fea633c1
  http://qt.gitorious.org/+qt-developers/qt/staging/commit/45851a64ead74748d6b5045066545ee2c95d83f6

i.e. normal unified diff style, but with darker background coloring within each added or removed line to draw attention to the specific parts of the line that were modified.

So, I tried to achieve this using --word-diff=porcelain format and got some unexpected results. But the problem can be demonstrated using the standard --word-diff=plain format as well, so I'll use that here seeing as it is a little easier to read.

Given a normal diff like this:

  -a, b, c, d
  +b, a, c, d

With --word-diff we get:

  [-a,-]b, {+a,+} c, d

Note how there is no whitespace between the removed "a" and the "b", and in the "pre" version of the file (where there is no "a" between the "b" and the "c") there will effectively be too much whitespace. Reconstructing the above in order to give us a colorized diff will yield:

  -a,b,  c, d
  +b, a, c, d

Using HTML-like tags to show where the color tags would be inserted:

  -<red>a,</red>b,  c, d
  +b, <green>a,</green> c, d

So we get the desired coloring, but our whitespace is wrong in the "pre" line and right in the "post" line. The problem is that information about the whitespace is lost and can't be reconstructed from the output. This kind of whitespace damage doesn't happen in all cases, but it does occur in this particular example of moving something within a line.
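
To make the reconstruction step concrete, here is a minimal sketch in Python (chosen only for illustration). The helper name reconstruct is hypothetical, and it assumes the [- -] and {+ +} markers never appear literally in the diffed text. It rebuilds both sides from a single line of --word-diff=plain output:

```python
import re

# Matches the plain-format markers: [-removed-] and {+added+}.
# Assumption: the markers never occur literally in the diffed text.
TOKEN = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}", re.S)

def reconstruct(word_diff_line):
    """Return (pre, post) rebuilt from one line of --word-diff=plain output."""
    pre, post = [], []
    pos = 0
    for m in TOKEN.finditer(word_diff_line):
        # Text between markers is common to both sides.
        common = word_diff_line[pos:m.start()]
        pre.append(common)
        post.append(common)
        if m.group(1) is not None:
            pre.append(m.group(1))    # removed text belongs to "pre" only
        else:
            post.append(m.group(2))   # added text belongs to "post" only
        pos = m.end()
    tail = word_diff_line[pos:]
    pre.append(tail)
    post.append(tail)
    return "".join(pre), "".join(post)

pre, post = reconstruct("[-a,-]b, {+a,+} c, d")
print(repr(pre))   # 'a,b,  c, d'
print(repr(post))  # 'b, a, c, d'
```

The "pre" side comes out with the space after "a," missing and a doubled space before "c", which is the damage described above.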

Like I said, at first I thought this was a bug, but on reading the "git diff" man page I see:

  "Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences. You may want to append |[^[:space:]] to your regular expression to make sure that it matches all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline."

It seems from this that some whitespace information loss is expected. So now I'm wondering if --word-diff, and particularly --word-diff=porcelain, is actually useful for consumption by a script.
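
As a rough illustration of why the loss happens, the following Python sketch mimics the word-splitting behaviour the man page describes. It is not git's implementation: the function name words is hypothetical, and \S+ is used as a stand-in for a word regex that matches runs of non-whitespace characters.

```python
import re

def words(line, word_regex=r"\S+"):
    """Return the regex matches ("words"); whatever lies between them is dropped."""
    return re.findall(word_regex, line)

print(words("a, b, c, d"))      # ['a,', 'b,', 'c,', 'd']
# Two lines that differ only in spacing yield the same word list,
# so a comparison of word lists cannot see the whitespace at all.
print(words("a, b, c, d") == words("a,  b,   c, d"))  # True
```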

If the output were:

  [-a, -]b, {+a, +}c, d

Then I could colorize like this with no whitespace damage:

  -<red>a, </red>b, c, d
  +b, <green>a, </green>c, d

And I could optionally add a post-processing pass in my script to massage the exact positioning of those color tags to not highlight those trailing spaces:

  -<red>a,</red> b, c, d
  +b, <green>a,</green> c, d
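
A sketch of what such a script could look like, in Python for illustration only. It takes as input the hypothetical output proposed above (not what git currently emits), and the names colorize and tighten are made up for this example:

```python
import re

# Assumption: the [- -] and {+ +} markers never occur literally in the text.
TOKEN = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}", re.S)

def colorize(word_diff_line):
    """Build tagged "-" and "+" lines from one line of plain word-diff output."""
    pre, post = [], []
    pos = 0
    for m in TOKEN.finditer(word_diff_line):
        common = word_diff_line[pos:m.start()]
        pre.append(common)
        post.append(common)
        if m.group(1) is not None:
            pre.append("<red>" + m.group(1) + "</red>")
        else:
            post.append("<green>" + m.group(2) + "</green>")
        pos = m.end()
    tail = word_diff_line[pos:]
    return "-" + "".join(pre) + tail, "+" + "".join(post) + tail

def tighten(line):
    """Post-processing pass: move whitespace just inside a closing tag outside it."""
    return re.sub(r"(\s+)(</(?:red|green)>)", r"\2\1", line)

minus, plus = colorize("[-a, -]b, {+a, +}c, d")
print(tighten(minus))  # -<red>a,</red> b, c, d
print(tighten(plus))   # +b, <green>a,</green> c, d
```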

Is what I'm talking about here possible using "--word-diff=porcelain"? For now I am working with normal diffs and rolling my own intra-line colorization from scratch.
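
For reference, the roll-your-own approach can be sketched roughly as follows. This is an illustrative Python version built on the standard difflib module, not the script I am actually using; the function name intraline and the choice to treat whitespace runs as tokens are assumptions made for the example. Because every character of both lines ends up in some token, the whitespace survives intact.

```python
import difflib
import re

def intraline(old, new):
    """Tag the changed parts of a removed/added line pair, keeping all whitespace."""
    # Split into alternating runs of whitespace and non-whitespace,
    # so joining the tokens gives back the original line exactly.
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)
    pre, post = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pre.append("".join(a[i1:i2]))
            post.append("".join(b[j1:j2]))
            continue
        if i1 < i2:  # tokens present only in the old line
            pre.append("<red>" + "".join(a[i1:i2]) + "</red>")
        if j1 < j2:  # tokens present only in the new line
            post.append("<green>" + "".join(b[j1:j2]) + "</green>")
    return "-" + "".join(pre), "+" + "".join(post)

minus, plus = intraline("a, b, c, d", "b, a, c, d")
print(minus)
print(plus)
```

Stripping the tags from either result gives back the original line unchanged, which is the property the word-diff output cannot guarantee.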

Cheers,
Wincent