Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

On Fri, Jun 19, 2015 at 03:34:55AM -0400, Jeff King wrote:

> And here's some more bad news. If you look at the diff for this
> patch itself, it's terribly unreadable (the regular diff already is
> pretty bad, but the highlights make it much worse). There are big chunks
> where we take away 5 or 10 lines from the old code, and replace them
> with totally unrelated lines. We end up highlighting almost the entire
> thing, except for spaces and punctuation.
> 
> We might be able to solve this with a percentage heuristic similar to
> the one Patrick proposed. It's not really interesting to highlight
> unless we're doing it on probably 20% or less of the diff (where 20% is
> a number I just made up).

That turned out to be pretty easy; patch is below (on top of what I sent
earlier). I set the percentage at 50% based on eyeballing "git log -p"
in git.git, and it seems to give good results.
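The cutoff can be sketched as a small standalone Perl program. This is an illustration, not the code in the patch. `keep_highlights` and `$MAX_HIGHLIGHT_RATIO` are hypothetical names. Annotations are assumed to arrive as start/end pairs of array refs whose first element is a character offset, which is the shape `count_highlight` reads. Unlike the patch, the sketch also returns early when both lines are empty, to avoid dividing by zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical name for the cutoff; the patch hard-codes 0.5.
my $MAX_HIGHLIGHT_RATIO = 0.5;

# Same logic as count_highlight in the patch. Annotations come in
# (start, end) pairs, and each is an array ref whose first element
# is assumed to be a character offset into the line.
sub count_highlight {
	my $total = 0;
	while (@_) {
		my $from = shift;
		my $to = shift;
		$total += $to->[0] - $from->[0];
	}
	return $total;
}

# Hypothetical helper: true if the highlighted characters cover no
# more than the cutoff fraction of the old and new lines combined.
sub keep_highlights {
	my ($old, $new, $hl_old, $hl_new) = @_;
	my $total = length($old) + length($new);
	return 0 if $total == 0;	# nothing to highlight; avoid 0/0
	my $highlighted = count_highlight(@$hl_old) +
			  count_highlight(@$hl_new);
	return $highlighted / $total <= $MAX_HIGHLIGHT_RATIO;
}

# Only "baz" -> "qux" differs: 6 of 26 characters, so keep.
print keep_highlights('foo(bar, baz)', 'foo(bar, qux)',
		      [[9], [12]], [[9], [12]]) ? "keep\n" : "drop\n";

# Entirely different lines: 12 of 12 characters, so drop.
print keep_highlights('abcdef', 'uvwxyz',
		      [[0], [6]], [[0], [6]]) ? "keep\n" : "drop\n";
```

The first call keeps its highlights (about 23% of the text is marked). The second drops them, which matches the "replace 5 or 10 lines with totally unrelated lines" case quoted above.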

So I think the big remaining issue is improved tokenizing. Maybe Patrick
will want to take a stab at it.

---
diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 1525ccc..9454446 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -114,12 +114,32 @@ sub show_hunk {
 			if $bits & 2;
 	}
 
+	my $highlighted = count_highlight(@highlight_a) +
+			  count_highlight(@highlight_b);
+	my $total = length($a) + length($b);
+	my $pct = $highlighted / $total;
+
+	if ($pct > 0.5) {
+		@highlight_a = ();
+		@highlight_b = ();
+	}
+
 	# And now show the output both with the original stripped annotations,
 	# as well as our new highlights.
 	show_image($a, [merge_annotations(\@stripped_a, \@highlight_a)]);
 	show_image($b, [merge_annotations(\@stripped_b, \@highlight_b)]);
 }
 
+sub count_highlight {
+	my $total = 0;
+	while (@_) {
+		my $from = shift;
+		my $to = shift;
+		$total += $to->[0] - $from->[0];
+	}
+	return $total;
+}
+
 # Strip out any diff syntax (i.e., leading +/-), along with any ANSI color
 # codes from the pre- or post-image of a hunk. The result is a string of text
 # suitable for diffing against the other side of the hunk.