Jakub Narebski <jnareb@xxxxxxxxx> wrote: > On Fri, 23 Mar 2012, Michał Kiedrowicz wrote: > > > Reading diff output is sometimes very hard, even if it's colored, > > especially if lines differ only in few characters. This is often true > > when a commit fixes a typo or renames some variables or functions. > > > > This commit teaches gitweb to highlight characters that are different > > between old and new line with a light green/red background. This should > > work in the similar manner as in Trac or GitHub. > > > > The algorithm that compares lines is based on contrib/diff-highlight. > > Basically, it works by determining common prefix/suffix of corresponding > > lines and highlightning only the middle part of lines. For more > > information, see contrib/diff-highlight/README. > > > Nice description. > > > Combined diffs are not supported but a following commit will change it. > > > O.K. > > > Two changes in format_diff_line() were needed to allow diff refinement > > highlightning to work. Firstly, format_diff_line() was taught to not > > esc_html() line that was passed as a reference. This was needed because > > refining the highlight must be performed on unescaped lines and it uses > > a <span> element to mark interesting parts of the line. > > Well, actually we just need to make format_diff_line to not esc_html > passed line which is already esc_html'ed or esc_html_hl_regions'ed, > i.e. to avoid double escaping. Passing line as reference if it is > to be treated as HTML is one possibility, and I think it is a good > convention to start to use. > > > Secondly, the > > lines are untabified before comparing because calling untabify() > > after inserting <span>'s could improperly convert tabs to spaces. > > Well, it is actually because untabify() must work on original text to > work correctly, i.e. to convert tabs to spaces according to tab stops. > It must precede esc_html'ing, and even more esc_html_hl_regions'ing. > > > > > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> > > Anyway this is a good change, much cleaner than previous version > > Acked-by: Jakub Narębski <jnareb@xxxxxxxxx> Thanks. > > > --- > > gitweb/gitweb.perl | 84 ++++++++++++++++++++++++++++++++++++++++++---- > > gitweb/static/gitweb.css | 8 ++++ > > 2 files changed, 85 insertions(+), 7 deletions(-) > > > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > > index db32588..872ba12 100755 > > --- a/gitweb/gitweb.perl > > +++ b/gitweb/gitweb.perl > > @@ -2426,14 +2426,14 @@ sub format_cc_diff_chunk_header { > > } > > > > # process patch (diff) line (not to be used for diff headers), > > -# returning HTML-formatted (but not wrapped) line > > +# returning HTML-formatted (but not wrapped) line. > > +# If the line is already esc_html()'ed, pass it as a reference. > > Errr... I think the explanation here must be in reverse: > > +# If the line is passed as a reference, it is treated as HTML, > +# and not esc_html()'ed. Yeah, I thought about that option too :). > > > sub format_diff_line { > > my ($line, $diff_class, $from, $to) = @_; > > > > - chomp $line; > > - $line = untabify($line); > > - > > - if ($from && $to && $line =~ m/^\@{2} /) { > > + if (ref($line)) { > > + $line = $$line; > > + } elsif ($from && $to && $line =~ m/^\@{2} /) { > > $line = format_unidiff_chunk_header($line, $from, $to); > > } elsif ($from && $to && $line =~ m/^\@{3}/) { > > $line = format_cc_diff_chunk_header($line, $from, $to); > > @@ -5054,10 +5054,80 @@ sub print_inline_diff_lines { > > print foreach (@$add); > > } > > > > +# Format removed and added line, mark changed part and HTML-format them. > > +# Impementation is based on contrib/diff-highlight > > +sub format_rem_add_line { > > Wouldn't a better name be format_rem_add_pair() or format_rem_add_lines(), > or format_rem_add_lines_pair()? OK. > > > + my ($rem, $add) = @_; > > + my @rem = split(//, $rem); > > + my @add = split(//, $add); > > + my ($esc_rem, $esc_add); > > + # Ignore +/- character, thus $prefix_len is set to 1. > > + my ($prefix_len, $suffix_len) = (1, 0); > > I wonder if it wouldn't be slightly cleaner to count $prefix_len from > +/- rather than start of line. It means > > + # Prefix does not count initial +/- character. > + my ($prefix_len, $suffix_len) = (0, 0); > > Perhaps even remove initial +/- from @add and @rem. > > + shift @rem; > + shift @add; > > Ehh... now I see that starting $prefix_len with 1 might be not so bad > idea after all. > > > + my ($prefix_is_space, $suffix_is_space) = (1, 1); > > It is not entirely true: $prefix_is_space is really $prefix_is_space_or_empty. > It might be a better idea to use > > + my ($prefix_has_nonspace, $suffix_has_nonspace); > > using the fact that undefined value is false. > > > + > > + while ($prefix_len < @rem && $prefix_len < @add) { > > + last if ($rem[$prefix_len] ne $add[$prefix_len]); > > + > > + $prefix_is_space = 0 if ($rem[$prefix_len] !~ /\s/); > > + $prefix_has_nonspace = 1 if ($rem[$prefix_len] =~ /\s/); > > > + $prefix_len++; > > + } > > + > > + my $shorter = (@rem < @add) ? @rem : @add; > > + while ($prefix_len + $suffix_len < $shorter) { > > + last if ($rem[-1 - $suffix_len] ne $add[-1 - $suffix_len]); > > + > > + $suffix_is_space = 0 if ($rem[-1 - $suffix_len] !~ /\s/); > > + $suffix_has_nonspace = 1 if ($rem[-1 - $suffix_len] =~ /\s/); > > > + $suffix_len++; > > + } > > + > > + # Mark lines that are different from each other, but have some common > > + # part that isn't whitespace. If lines are completely different, don't > > + # mark them because that would make output unreadable, especially if > > + # diff consists of multiple lines. > > + if (($prefix_len == 1 && $suffix_len == 0) || > > + ($prefix_is_space && $suffix_is_space)) { > > Actually ($prefix_is_space && $suffix_is_space) is enough, but cryptic. Yes, you caught was I thought too but hadn't wrote explicitly. The first condition ($prefix_len == 1 && $suffix_len == 0) is redundant but makes the condition easier to understand. > With $prefix_is_space (== $prefix_is_space_or_empty) -> $prefix_has_nonspace > it would be > > + if (not ($prefix_has_nonspace || $suffix_has_nonspace)) { > > Anyway, it is not as if it is not clear enough. I may play with this for a while to see which version looks best. > > > + $esc_rem = esc_html($rem, -nbsp=>1); > > + $esc_add = esc_html($add, -nbsp=>1); > > + } else { > > + $esc_rem = esc_html_hl_regions($rem, 'marked', [$prefix_len, @rem - $suffix_len], -nbsp=>1); > > + $esc_add = esc_html_hl_regions($add, 'marked', [$prefix_len, @add - $suffix_len], -nbsp=>1); > > + $esc_rem = esc_html_hl_regions($rem, 'marked', [$prefix_len+1, @rem+1 - $suffix_len], -nbsp=>1); > + $esc_add = esc_html_hl_regions($add, 'marked', [$prefix_len+1, @add+1 - $suffix_len], -nbsp=>1); > > So probably not worth it. > > > + } > > + > > + return format_diff_line(\$esc_rem, 'rem'), > > + format_diff_line(\$esc_add, 'add'); > > + return format_diff_line(\$esc_rem, 'rem'), > + format_diff_line(\$esc_add, 'add'); > > > +} > > + > > +# HTML-format diff context, removed and added lines. > > +sub format_ctx_rem_add_lines { > > + my ($ctx, $rem, $add, $is_combined) = @_; > > + my (@new_ctx, @new_rem, @new_add); > > + > > + # Highlight if every removed line has a corresponding added line. > > + # Combined diffs are not supported ATM. > > ATM == at this moment? Please say so. OK. > > > + if (!$is_combined && @$add > 0 && @$add == @$rem) { > > + for (my $i = 0; $i < @$add; $i++) { > > + my ($line_rem, $line_add) = format_rem_add_line( > > + $rem->[$i], $add->[$i]); > > + my ($line_rem, $line_add) = > + format_rem_add_line($rem->[$i], $add->[$i]); > > > + push @new_rem, $line_rem; > > + push @new_add, $line_add; > > + } > > + } else { > > + @new_rem = map { format_diff_line($_, 'rem') } @$rem; > > + @new_add = map { format_diff_line($_, 'add') } @$add; > > + } > > + > > + @new_ctx = map { format_diff_line($_, 'ctx') } @$ctx; > > + > > + return (\@new_ctx, \@new_rem, \@new_add); > > +} > > Nice. > > > + > > # Print context lines and then rem/add lines. > > sub print_diff_lines { > > my ($ctx, $rem, $add, $diff_style, $is_combined) = @_; > > > > + ($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add, > > + $is_combined); > > + > > + ($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add, $is_combined); > + > > Unless the line is too long. > > > if ($diff_style eq 'sidebyside' && !$is_combined) { > > print_sidebyside_diff_lines($ctx, $rem, $add); > > } else { > > @@ -5089,11 +5159,11 @@ sub print_diff_chunk { > > foreach my $line_info (@chunk) { > > my ($class, $line) = @$line_info; > > > > - $line = format_diff_line($line, $class, $from, $to); > > + $line = untabify($line); > > O.K. you move untabify() out of format_diff_line(), and must now > ensure that caller uses it before calling format_diff_line() or > format_ctx_rem_add_lines() (which uses format_diff_line() itself). > > I wonder if leaving untabify() (and chomp) in format_diff_line(), > but not running it if passed reference, and running untabify() in > format_ctx_rem_add_lines() wouldn't be a better, less fragile code. > I can try that. > > > > # print chunk headers > > if ($class && $class eq 'chunk_header') { > > - print $line; > > + print format_diff_line($line, $class, $from, $to); > > O.K., only 'chunk_header' are not formatted. > > > next; > > } > > [I have to take a look how final version of print_diff_lines() looks > like in this commit; the patch alone is not enough] > > Right, so we format just before printing, and print_* do formatting > itself before printing. Nice and clean. Thanks. > > > > > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css > > index c530355..3c4a3c9 100644 > > --- a/gitweb/static/gitweb.css > > +++ b/gitweb/static/gitweb.css > > @@ -438,6 +438,10 @@ div.diff.add { > > color: #008800; > > } > > > > +div.diff.add span.marked { > > + background-color: #77ff77; > > +} > > + > > div.diff.from_file a.path, > > div.diff.from_file { > > color: #aa0000; > > @@ -447,6 +451,10 @@ div.diff.rem { > > color: #cc0000; > > } > > > > +div.diff.rem span.marked { > > + background-color: #ff7777; > > +} > > + > > div.diff.chunk_header a, > > div.diff.chunk_header { > > color: #990099; > > -- > > Nice styling. > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html