On Thu, Jun 18, 2015 at 05:23:56PM -0400, Jeff King wrote: > +# Return the individual diff-able items of the hunk, but with any > +# diff or color prefix/suffix for each line split out (we assume that the > +# prefix/suffix for each line will be the same). > +sub split_hunk { > + my ($prefix, $suffix, @r); > + foreach my $line (@_) { > + $line =~ /^($COLOR*[+-]$COLOR*)(.*)($COLOR*)/ > + or die "eh, this is supposed to match everything!"; This isn't quite right. We'll never match the suffix, because it gets sucked up by the greedy (.*). But making that non-greedy matches nothing, unless we also add "$" at the end. _But_, there is still something else weird going on. I replaced this split: > + push @r, split(//, $2), "\n"; with: push @r, split(/([[:space:][:punct:]])/, $2), "\n"; which behaves much better. With that, 99a2cfb shows: - <if> (!peel_ref(path, peeled)) { - <is>_annotated = !!<hashcmp>(<sha1>, peeled); + <if> (!peel_ref(path, peeled<.hash>)) { + <is>_annotated = !!<oidcmp>(<oid>, <&>peeled); The latter half of both lines looks perfect. But what's that weird highlighting of the initial "if" and "is" on those lines? It turns out that the colored output we produce is quite odd: $ git show --color 99a2cfb | grep peel_ref | cat -A ^[[31m-^Iif (!peel_ref(path, peeled)) {^[[m$ ^[[32m+^[[m^I^[[32mif (!peel_ref(path, peeled.hash)) {^[[m$ For the pre-image, we print the color, the "-", and then the line data. Makes sense. For the post-image, we print the color, "+", a reset, then the initial whitespace, then the color again! So of course the diff algorithm says "hey, there's this weird color in here". The original implementation of diff-highlight didn't care, because it skipped leading whitespace and colors as "boring". But this one cannot do that. It pulls the strict prefix out of each line (and it must, because it must get the same prefix for each line of a hunk). I think git is actually wrong here; it's mingling the ANSI colors and the actual content. Nobody ever noticed because it looks OK to a human, and who would be foolish enough to try machine-parsing colorized diff output? The fix is: diff --git a/diff.c b/diff.c index 87b16d5..a80b5b4 100644 --- a/diff.c +++ b/diff.c @@ -501,9 +501,9 @@ static void emit_line_checked(const char *reset, emit_line_0(ecbdata->opt, ws, reset, sign, line, len); else { /* Emit just the prefix, then the rest. */ - emit_line_0(ecbdata->opt, set, reset, sign, "", 0); + emit_line_0(ecbdata->opt, set, "", sign, "", 0); ws_check_emit(line, len, ecbdata->ws_rule, - ecbdata->opt->file, set, reset, ws); + ecbdata->opt->file, "", reset, ws); } } But I'm a little worried it may interfere with the way the whitespace-checker emits colors (e.g., it may emit its own colors for the leading spaces, and we would need to re-assert our color before showing the rest of the line). So I think you could also argue that because of whitespace-highlighting, colorized diffs are fundamentally going to have colors intermingled with the content and should not be parsed this way. All the more reason to try to move this inside diff.c, I guess. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html