Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

Jeff King <peff@xxxxxxxx> · Fri, 19 Jun 2015 01:32:23 -0400

On Thu, Jun 18, 2015 at 09:49:19PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > ... I think you could also argue that
> > because of whitespace-highlighting, colorized diffs are fundamentally
> > going to have colors intermingled with the content and should not be
> > parsed this way.
> 
> Painting of whitespace breakages is asymmetric [*1*].  If you change
> something on a badly indented line without fixing the indentation,
> e.g.
> 
> 	-<SP><TAB>hello-world
> 	+<SP><TAB>hello-people
> 
> only the space-before-tab on the latter line is painted.
> 
> For the purpose of your diff highlighting, however, you would want
> to treat the leading "<SP><TAB>hello-" on preimage and postimage
> lines as unchanged.

I do strip it off, so it is OK for it to be different in both the
pre-image and post-image. But what I can't tolerate is the
intermingling with actual data:

  +\t\t\x1b[32m;foo
  +\t\x1b[32m;bar

Those are both post-image lines. I can strip off the "+" from each side
to compare their inner parts to the pre-image. But the intermingled
color gets in my way. I can simply strip all colors from the whole line,
but then information is lost; how do I know where to put them back
again? It is not just "add back the color at the beginning" (which is
what I do with the prefix).

I think the answer is that I must strip them out, retaining the colors
found in each line along with their offset into the line, and then as I
write out the lines, re-add them at the appropriate spots (taking care
to use the original offsets, not the ones with the highlighting added
in).
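
As a rough sketch of that strip-and-re-add idea (Python for brevity
rather than the Perl of diff-highlight; strip_colors() and reinsert()
are made-up names, and the reverse-video codes are just illustrative
highlight markers):

```python
import re

# Matches SGR color sequences such as "\x1b[32m" or "\x1b[m".
COLOR_RE = re.compile(r"\x1b\[[0-9;]*m")

def strip_colors(line):
    """Return (plain, colors). colors is a list of (offset, code) pairs,
    where offset indexes into the color-free text, not the original."""
    plain, colors = [], []
    pos = 0     # position in the original line
    offset = 0  # position in the stripped text
    for m in COLOR_RE.finditer(line):
        chunk = line[pos:m.start()]
        plain.append(chunk)
        offset += len(chunk)
        colors.append((offset, m.group()))
        pos = m.end()
    plain.append(line[pos:])
    return "".join(plain), colors

def reinsert(plain, colors, highlights=()):
    """Rebuild the line from the stripped text. Both colors and highlights
    use offsets into the stripped text, so adding highlight codes never
    shifts where the original colors go."""
    marks = sorted(
        [(off, 0, code) for off, code in colors]
        + [(off, 1, code) for off, code in highlights],
        key=lambda m: (m[0], m[1]),  # at equal offsets, original colors first
    )
    out, pos = [], 0
    for off, _, code in marks:
        out.append(plain[pos:off])
        out.append(code)
        pos = off
    out.append(plain[pos:])
    return "".join(out)

line = "+\t\t\x1b[32m;foo"
plain, colors = strip_colors(line)
# Highlight "foo" using offsets into the stripped text.
highlighted = reinsert(plain, colors, [(4, "\x1b[7m"), (7, "\x1b[27m")])
```

With no highlights, reinsert(plain, colors) gives back the input line
unchanged, so lines that are not highlighted pass through untouched.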

> > All the more reason to try to move this inside diff.c, I guess.
> 
> That too, probably.

Hmm, I thought that would solve all my problems by operating on the
pre-color version without much more work. But...

> If we were to do this, I think it may make sense to separate the
> logic to compute which spans of the string need to be painted in what
> color and the routine to actually emit the colored output.  That is,
> instead of letting ws-check-emit to decide which part should be in
> what color _and_ emitting the result, we should have a helper that
> reads <line, len>, and gives us an array of spans to flag as
> whitespace violation.  Then an optional diff-highlight code would
> scan the same <line, len> (or a collection of <line, len>) without
> getting confused by the whitespace errors and annotate the spans to
> be highlighted.  After all that happens, the output routine would
> coalesce the spans and produce colored output.
> 
> Or something like that ;-)

I think this "array of spans" is the only way to go. Otherwise whichever
markup scheme processes the hunk first ruins the data for the next
processor.
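
To make the "array of spans" shape concrete, here is a hedged sketch in
Python (not how diff.c is structured; every name below is hypothetical,
the whitespace rule is a simplified space-before-tab check, and the
highlighter is a plain common-prefix/suffix comparison). Both passes
look at the same uncolored text and only return spans; a single output
routine merges them at the end:

```python
# Independent on/off pairs, so overlapping spans do not cancel each other.
CODES = {
    "ws": ("\x1b[41m", "\x1b[49m"),  # red background on / default background
    "hl": ("\x1b[7m", "\x1b[27m"),   # reverse video on / off
}

def ws_error_spans(text):
    """Flag runs of spaces that are followed by a tab inside the indent."""
    indent_end = len(text) - len(text.lstrip(" \t"))
    spans, i = [], 0
    while i < indent_end:
        if text[i] == " ":
            j = i
            while j < indent_end and text[j] == " ":
                j += 1
            if j < indent_end and text[j] == "\t":
                spans.append((i, j, "ws"))
            i = j
        else:
            i += 1
    return spans

def changed_span(old, new):
    """Span of `new` left over after dropping the common prefix and suffix."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return [(p, len(new) - s, "hl")] if p < len(new) - s else []

def emit(text, spans):
    """Turn (start, end, tag) spans into one colored string."""
    events = []
    for start, end, tag in spans:
        on, off = CODES[tag]
        events.append((start, 1, on))
        events.append((end, 0, off))  # close before opening at the same offset
    events.sort(key=lambda e: (e[0], e[1]))
    out, pos = [], 0
    for offset, _, code in events:
        out.append(text[pos:offset])
        out.append(code)
        pos = offset
    out.append(text[pos:])
    return "".join(out)

old, new = " \thello-world", " \thello-people"
spans = ws_error_spans(new) + changed_span(old, new)
colored = emit(new, spans)
```

Neither pass ever sees the other's escape codes, which is the point:
the order in which the two run no longer matters.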

So it is a lot more work to make the two work together. The --word-diff
code would have the same issue, except that I imagine it just skips
whitespace-highlighting altogether.

The least-work thing may actually be teaching the separate
diff-highlight script to strip out the colorizing and re-add it by
offset.

-Peff