On Fri, Jun 19, 2015 at 03:34:55AM -0400, Jeff King wrote:

> And here's some more bad news. If you look at the diff for this
> patch itself, it's terribly unreadable (the regular diff already is
> pretty bad, but the highlights make it much worse). There are big chunks
> where we take away 5 or 10 lines from the old code, and replace them
> with totally unrelated lines. We end up highlighting almost the entire
> thing, except for spaces and punctuation.
>
> We might be able to solve this with a percentage heuristic similar to
> the one Patrick proposed. It's not really interesting to highlight
> unless we're doing it on probably 20% or less of the diff (where 20% is
> a number I just made up).

That turned out to be pretty easy; patch is below (on top of what I sent
earlier). I set the percentage at 50% based on eyeballing "git log -p"
in git.git, and it seems to give good results.

So I think the big remaining issue is improved tokenizing. Maybe Patrick
will want to take a stab at it.

---
diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 1525ccc..9454446 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -114,12 +114,32 @@ sub show_hunk {
 			if $bits & 2;
 	}
 
+	my $highlighted = count_highlight(@highlight_a) +
+			  count_highlight(@highlight_b);
+	my $total = length($a) + length($b);
+	my $pct = $highlighted / $total;
+
+	if ($pct > 0.5) {
+		@highlight_a = ();
+		@highlight_b = ();
+	}
+
 	# And now show the output both with the original stripped annotations,
 	# as well as our new highlights.
 	show_image($a, [merge_annotations(\@stripped_a, \@highlight_a)]);
 	show_image($b, [merge_annotations(\@stripped_b, \@highlight_b)]);
 }
 
+sub count_highlight {
+	my $total = 0;
+	while (@_) {
+		my $from = shift;
+		my $to = shift;
+		$total += $to->[0] - $from->[0];
+	}
+	return $total;
+}
+
 # Strip out any diff syntax (i.e., leading +/-), along with any ANSI color
 # codes from the pre- or post-image of a hunk. The result is a string of text
 # suitable for diffing against the other side of the hunk.
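
For anyone who wants to play with the threshold outside of the script,
here is a rough standalone sketch of the same heuristic. It is in Python
rather than Perl purely so it can be run on its own, and it makes a few
assumptions that are not in the patch: highlights are modeled as
(start, end) character-offset pairs instead of the from/to annotation
entries the script uses, the function name filter_highlights and the
threshold parameter are made up for illustration, and the guard for an
empty hunk is an addition of the sketch.

```python
# Illustrative sketch only; the real logic is the Perl patch above.
# Assumption: each highlight is a (start, end) pair of character offsets.


def count_highlight(spans):
    """Return the total number of characters covered by the spans."""
    return sum(end - start for start, end in spans)


def filter_highlights(old, new, spans_old, spans_new, threshold=0.5):
    """Drop every highlight when they cover more than `threshold` of the hunk.

    `old` and `new` are the stripped pre- and post-image text of one hunk.
    Returns the (possibly emptied) highlight lists for both sides.
    """
    total = len(old) + len(new)
    if total == 0:
        # Not in the patch: avoid dividing by zero on an empty hunk.
        return [], []

    highlighted = count_highlight(spans_old) + count_highlight(spans_new)
    if highlighted / total > threshold:
        # Too much of the hunk changed; highlighting would only add noise.
        return [], []
    return spans_old, spans_new


if __name__ == "__main__":
    # One character differs ("bar" vs "baz"): the highlights are kept.
    print(filter_highlights("return foo(bar);\n", "return foo(baz);\n",
                            [(13, 14)], [(13, 14)]))
    # Nearly everything differs: the highlights are dropped.
    print(filter_highlights("int x = 1;\n", "char *name;\n",
                            [(0, 10)], [(0, 11)]))
```

As in the patch, the comparison is strict, so a hunk that is exactly half
highlighted still keeps its highlights.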