Re: [PATCH v2 7/8] gitweb: Highlight interesting parts of diff

Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> · Thu, 29 Mar 2012 21:59:22 +0200

Jakub Narebski <jnareb@xxxxxxxxx> wrote:

> On Fri, 23 Mar 2012, Michał Kiedrowicz wrote:
> 
> > Reading diff output is sometimes very hard, even if it's colored,
> > especially if lines differ only in few characters.  This is often true
> > when a commit fixes a typo or renames some variables or functions.
> > 
> > This commit teaches gitweb to highlight characters that are different
> > between old and new line with a light green/red background.  This should
> > work in the similar manner as in Trac or GitHub.
> > 
> > The algorithm that compares lines is based on contrib/diff-highlight.
> > Basically, it works by determining common prefix/suffix of corresponding
> > lines and highlightning only the middle part of lines.  For more
> > information, see contrib/diff-highlight/README.
> >
> Nice description.
>  
> > Combined diffs are not supported but a following commit will change it.
> > 
> O.K.
> 
> > Two changes in format_diff_line() were needed to allow diff refinement
> > highlightning to work.  Firstly, format_diff_line() was taught to not
> > esc_html() line that was passed as a reference.  This was needed because
> > refining the highlight must be performed on unescaped lines and it uses
> > a <span> element to mark interesting parts of the line.
> 
> Well, actually we just need to make format_diff_line to not esc_html
> passed line which is already esc_html'ed or esc_html_hl_regions'ed,
> i.e. to avoid double escaping.  Passing line as reference if it is
> to be treated as HTML is one possibility, and I think it is a good
> convention to start to use.
> 
> >                                                         Secondly, the 
> > lines are untabified before comparing because calling untabify()
> > after inserting <span>'s could improperly convert tabs to spaces.
> 
> Well, it is actually because untabify() must work on original text to
> work correctly, i.e. to convert tabs to spaces according to tab stops.
> It must precede esc_html'ing, and even more esc_html_hl_regions'ing.
> 
> > 
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx>
> 
> Anyway this is a good change, much cleaner than previous version
> 
>   Acked-by: Jakub Narębski <jnareb@xxxxxxxxx>

Thanks.

> 
> > ---
> >  gitweb/gitweb.perl       |   84 ++++++++++++++++++++++++++++++++++++++++++----
> >  gitweb/static/gitweb.css |    8 ++++
> >  2 files changed, 85 insertions(+), 7 deletions(-)
> > 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index db32588..872ba12 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -2426,14 +2426,14 @@ sub format_cc_diff_chunk_header {
> >  }
> >  
> >  # process patch (diff) line (not to be used for diff headers),
> > -# returning HTML-formatted (but not wrapped) line
> > +# returning HTML-formatted (but not wrapped) line.
> > +# If the line is already esc_html()'ed, pass it as a reference.
> 
> Errr... I think the explanation here must be in reverse:
> 
>   +# If the line is passed as a reference, it is treated as HTML,
>   +# and not esc_html()'ed.

Yeah, I thought about that option too :).

> 
> >  sub format_diff_line {
> >  	my ($line, $diff_class, $from, $to) = @_;
> >  
> > -	chomp $line;
> > -	$line = untabify($line);
> > -
> > -	if ($from && $to && $line =~ m/^\@{2} /) {
> > +	if (ref($line)) {
> > +		$line = $$line;
> > +	} elsif ($from && $to && $line =~ m/^\@{2} /) {
> >  		$line = format_unidiff_chunk_header($line, $from, $to);
> >  	} elsif ($from && $to && $line =~ m/^\@{3}/) {
> >  		$line = format_cc_diff_chunk_header($line, $from, $to);
> > @@ -5054,10 +5054,80 @@ sub print_inline_diff_lines {
> >  	print foreach (@$add);
> >  }
> >  
> > +# Format removed and added line, mark changed part and HTML-format them.
> > +# Impementation is based on contrib/diff-highlight
> > +sub format_rem_add_line {
> 
> Wouldn't a better name be format_rem_add_pair() or format_rem_add_lines(),
> or format_rem_add_lines_pair()?

OK.

> 
> > +	my ($rem, $add) = @_;
> > +	my @rem = split(//, $rem);
> > +	my @add = split(//, $add);
> > +	my ($esc_rem, $esc_add);
> > +	# Ignore +/- character, thus $prefix_len is set to 1.
> > +	my ($prefix_len, $suffix_len) = (1, 0);
> 
> I wonder if it wouldn't be slightly cleaner to count $prefix_len from
> +/- rather than start of line.  It means
> 
>   +	# Prefix does not count initial +/- character.
>   +	my ($prefix_len, $suffix_len) = (0, 0);
> 
> Perhaps even remove initial +/- from @add and @rem.
> 
>   +	shift @rem;
>   +	shift @add;
> 
> Ehh... now I see that starting $prefix_len with 1 might be not so bad
> idea after all.
> 
> > +	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> 
> It is not entirely true: $prefix_is_space is really $prefix_is_space_or_empty.
> It might be a better idea to use
> 
>   +	my ($prefix_has_nonspace, $suffix_has_nonspace);
> 
> using the fact that undefined value is false.
> 
> > +
> > +	while ($prefix_len < @rem && $prefix_len < @add) {
> > +		last if ($rem[$prefix_len] ne $add[$prefix_len]);
> > +
> > +		$prefix_is_space = 0 if ($rem[$prefix_len] !~ /\s/);
> 
>   +		$prefix_has_nonspace = 1 if ($rem[$prefix_len] =~ /\s/);
> 
> > +		$prefix_len++;
> > +	}
> > +
> > +	my $shorter = (@rem < @add) ? @rem : @add;
> > +	while ($prefix_len + $suffix_len < $shorter) {
> > +		last if ($rem[-1 - $suffix_len] ne $add[-1 - $suffix_len]);
> > +
> > +		$suffix_is_space = 0 if ($rem[-1 - $suffix_len] !~ /\s/);
> 
>   +		$suffix_has_nonspace = 1 if ($rem[-1 - $suffix_len] =~ /\s/);
> 
> > +		$suffix_len++;
> > +	}
> > +
> > +	# Mark lines that are different from each other, but have some common
> > +	# part that isn't whitespace.  If lines are completely different, don't
> > +	# mark them because that would make output unreadable, especially if
> > +	# diff consists of multiple lines.
> > +	if (($prefix_len == 1 && $suffix_len == 0) ||
> > +	    ($prefix_is_space && $suffix_is_space)) {
> 
> Actually ($prefix_is_space && $suffix_is_space) is enough, but cryptic.

Yes, you caught was I thought too but hadn't wrote explicitly.  The
first condition ($prefix_len == 1 && $suffix_len == 0) is redundant but
makes the condition easier to understand.

> With $prefix_is_space (== $prefix_is_space_or_empty) -> $prefix_has_nonspace
> it would be
> 
>   +	if (not ($prefix_has_nonspace || $suffix_has_nonspace)) {
> 
> Anyway, it is not as if it is not clear enough.

I may play with this for a while to see which version looks best.

> 
> > +		$esc_rem = esc_html($rem, -nbsp=>1);
> > +		$esc_add = esc_html($add, -nbsp=>1);
> > +	} else {
> > +		$esc_rem = esc_html_hl_regions($rem, 'marked', [$prefix_len, @rem - $suffix_len], -nbsp=>1);
> > +		$esc_add = esc_html_hl_regions($add, 'marked', [$prefix_len, @add - $suffix_len], -nbsp=>1);
> 
>   +		$esc_rem = esc_html_hl_regions($rem, 'marked', [$prefix_len+1, @rem+1 - $suffix_len], -nbsp=>1);
>   +		$esc_add = esc_html_hl_regions($add, 'marked', [$prefix_len+1, @add+1 - $suffix_len], -nbsp=>1);
> 
> So probably not worth it.
> 
> > +	}
> > +
> > +	return format_diff_line(\$esc_rem, 'rem'),
> > +	        format_diff_line(\$esc_add, 'add');
> 
>   +	return format_diff_line(\$esc_rem, 'rem'),
>   +	       format_diff_line(\$esc_add, 'add');
> 
> > +}
> > +
> > +# HTML-format diff context, removed and added lines.
> > +sub format_ctx_rem_add_lines {
> > +	my ($ctx, $rem, $add, $is_combined) = @_;
> > +	my (@new_ctx, @new_rem, @new_add);
> > +
> > +	# Highlight if every removed line has a corresponding added line.
> > +	# Combined diffs are not supported ATM.
> 
> ATM == at this moment?  Please say so.

OK.

> 
> > +	if (!$is_combined && @$add > 0 && @$add == @$rem) {
> > +		for (my $i = 0; $i < @$add; $i++) {
> > +			my ($line_rem, $line_add) = format_rem_add_line(
> > +			        $rem->[$i], $add->[$i]);
> 
>   +			my ($line_rem, $line_add) =
>   +			        format_rem_add_line($rem->[$i], $add->[$i]);
> 
> > +			push @new_rem, $line_rem;
> > +			push @new_add, $line_add;
> > +		}
> > +	} else {
> > +		@new_rem = map { format_diff_line($_, 'rem') } @$rem;
> > +		@new_add = map { format_diff_line($_, 'add') } @$add;
> > +	}
> > +
> > +	@new_ctx = map { format_diff_line($_, 'ctx') } @$ctx;
> > +
> > +	return (\@new_ctx, \@new_rem, \@new_add);
> > +}
> 
> Nice.
> 
> > +
> >  # Print context lines and then rem/add lines.
> >  sub print_diff_lines {
> >  	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
> >  
> > +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add,
> > +		$is_combined);
> > +
> 
>   +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add,	$is_combined);
>   +
> 
> Unless the line is too long.
> 
> >  	if ($diff_style eq 'sidebyside' && !$is_combined) {
> >  		print_sidebyside_diff_lines($ctx, $rem, $add);
> >  	} else {
> > @@ -5089,11 +5159,11 @@ sub print_diff_chunk {
> >  	foreach my $line_info (@chunk) {
> >  		my ($class, $line) = @$line_info;
> >  
> > -		$line = format_diff_line($line, $class, $from, $to);
> > +		$line = untabify($line);
> 
> O.K. you move untabify() out of format_diff_line(), and must now
> ensure that caller uses it before calling format_diff_line() or 
> format_ctx_rem_add_lines() (which uses format_diff_line() itself).
> 
> I wonder if leaving untabify() (and chomp) in format_diff_line(),
> but not running it if passed reference, and running untabify() in
> format_ctx_rem_add_lines() wouldn't be a better, less fragile code.
> 

I can try that.

> >  
> >  		# print chunk headers
> >  		if ($class && $class eq 'chunk_header') {
> > -			print $line;
> > +			print format_diff_line($line, $class, $from, $to);
> 
> O.K., only 'chunk_header' are not formatted.
> 
> >  			next;
> >  		}
> 
> [I have to take a look how final version of print_diff_lines() looks
>  like in this commit; the patch alone is not enough]
> 
> Right, so we format just before printing, and print_* do formatting
> itself before printing.  Nice and clean.

Thanks.

> 
> >  
> > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
> > index c530355..3c4a3c9 100644
> > --- a/gitweb/static/gitweb.css
> > +++ b/gitweb/static/gitweb.css
> > @@ -438,6 +438,10 @@ div.diff.add {
> >  	color: #008800;
> >  }
> >  
> > +div.diff.add span.marked {
> > +	background-color: #77ff77;
> > +}
> > +
> >  div.diff.from_file a.path,
> >  div.diff.from_file {
> >  	color: #aa0000;
> > @@ -447,6 +451,10 @@ div.diff.rem {
> >  	color: #cc0000;
> >  }
> >  
> > +div.diff.rem span.marked {
> > +	background-color: #ff7777;
> > +}
> > +
> >  div.diff.chunk_header a,
> >  div.diff.chunk_header {
> >  	color: #990099;
> > -- 
> 
> Nice styling.
> 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html