Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

Patrick Palka <patrick@xxxxxxxxxxxx> · Thu, 18 Jun 2015 12:28:58 -0400

On Thu, Jun 18, 2015 at 11:50 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Patrick Palka <patrick@xxxxxxxxxxxx> writes:
>
>> Currently the diff-highlight script does not try to highlight hunks that
>> have different numbers of removed/added lines.  But we can be a little
>> smarter than that, without introducing much magic and complexity.
>>
>> In the case of unevenly-sized hunks, we could still highlight the first
>> few (lexicographical) add/remove pairs.  It is not uncommon for hunks to
>> have common "prefixes", and in such a case this change is very useful
>> for spotting differences.
>>
>> Signed-off-by: Patrick Palka <patrick@xxxxxxxxxxxx>
>> ---
>
> Patrick, "git shortlog --no-merges contrib/diff-highlight/" is your
> friend to see who may be able to give you a good feedback.

Sorry about that.  I admit the sending of this patch was rushed for no
good reason.

>
> Jeff, what do you think?
>
> I have this nagging feeling that it is just as likely that two
> uneven hunks align at the top as they align at the bottom, so while
> this might not hurt it may not be the right approach for a better
> solution, in the sense that when somebody really wants to do a
> better solution, this change and the original code may need to be
> ripped out and redone from scratch.

Hmm, maybe. I stuck with assuming hunks are top-aligned because it
required less code to implement :)

The benefits of a simple dumb solution like assuming hunks align at
the top or bottom is that it remains very easy to visually match up
each highlighted deleted slice with its corresponding highlighted
added slice. If we start matching up similar lines or something like
that then it seems we would have to mostly forsake this benefit.  A
stupid algorithm in this case is nice because its output is
predictable.

A direct improvement upon this patch that would not require redoing
the whole script from scratch would be to first to calculate the
highlighting assuming the hunk aligns at the top, then to calculate
the highlighting assuming the hunk aligns at the bottom, and to pick
out of the two the highlighting with the least "noise".  Though we
would still be out of luck if the hunk is more complicated than being
top-aligned or bottom-aligned.

By the way, what would it take to get something like this script into
git proper?  It is IMHO immensely useful even in its current form, yet
because it's not baked into the application hardly anybody knows about
it.

>
>>  contrib/diff-highlight/diff-highlight | 26 +++++++++++++++++---------
>>  1 file changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
>> index ffefc31..0dfbebd 100755
>> --- a/contrib/diff-highlight/diff-highlight
>> +++ b/contrib/diff-highlight/diff-highlight
>> @@ -88,22 +88,30 @@ sub show_hunk {
>>               return;
>>       }
>>
>> -     # If we have mismatched numbers of lines on each side, we could try to
>> -     # be clever and match up similar lines. But for now we are simple and
>> -     # stupid, and only handle multi-line hunks that remove and add the same
>> -     # number of lines.
>> -     if (@$a != @$b) {
>> -             print @$a, @$b;
>> -             return;
>> -     }
>> +     # We match up the first MIN(a, b) lines on each side.
>> +     my $c = @$a < @$b ? @$a : @$b;
>>
>> +     # Highlight each pair, and print each removed line of that pair.
>>       my @queue;
>> -     for (my $i = 0; $i < @$a; $i++) {
>> +     for (my $i = 0; $i < $c; $i++) {
>>               my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]);
>>               print $rm;
>>               push @queue, $add;
>>       }
>> +
>> +     # Print the remaining unmatched removed lines of the hunk.
>> +     for (my $i = $c; $i < @$a; $i++) {
>> +             print $a->[$i];
>> +     }
>> +
>> +     # Print the added lines of each highlighted pair.
>>       print @queue;
>> +
>> +     # Print the remaining unmatched added lines of the hunk.
>> +     for (my $i = $c; $i < @$b; $i++) {
>> +             print $b->[$i];
>> +     }
>> +
>>  }
>>
>>  sub highlight_pair {
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html