Jakub Narebski <jnareb@xxxxxxxxx> wrote: > Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> writes: > > > Reading diff output is sometimes very hard, even if it's colored, > > especially if lines differ only in few characters. This is often > > true when a commit fixes a typo or renames some variables or > > functions. > > > This is certainly nice feature to have. I think most tools that > implement side-by-side diff implement this too. > > > This commit teaches gitweb to highlight characters that are > > different between old and new line. This should work in the > > similar manner as in Trac or GitHub. > > > Doe you have URLs with good examples of such diff refinement > highlighting (GNU Emacs ediff/emerge terminology) / highlighting > differing segments (diff-highlight terminology)? I haven't found *examples* on GitHub and Trac sites, but what about these ones: https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795 http://trac.edgewall.org/changeset/10973 > > > The code that compares lines is based on > > contrib/diff-highlight/diff-highlight, except that it works with > > multiline changes too. It also won't highlight lines that are > > completely different because that would only make the output > > unreadable. > > > So what does it look like? From the contrib/diff-highlight/README I > guess that it finds common prefix and common suffix, and highlights > the rest, which includes change. Yes. > It doesn't implement LCS / diff > algorithm like e.g. tkdiff does for its diff refinement highlighting, > isn't it? > I doesn't. I share the Jeff's opinion that: a) Jeff's approach is "good enough" b) LCS on bytes could be very confusing if it marked few sets of characters. > By completly different you mean that they do not have common prefix or > common suffix (at least one of them), isn't it? Yes, but I also don't highlight lines which prefix/suffix consists only of whitespace (This is quite common). I would also consider ignoring prefixes/suffixes with punctuation, like: - * I like you. + * Alice had a little lamb. > > > Combined diffs are not supported but a following commit will change > > it. > > > O.K. > > > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> > > --- > > gitweb/gitweb.perl | 82 > > ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files > > changed, 77 insertions(+), 5 deletions(-) > > > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > > index db61553..1a5b454 100755 > > --- a/gitweb/gitweb.perl > > +++ b/gitweb/gitweb.perl > > @@ -2322,7 +2322,7 @@ sub format_cc_diff_chunk_header { > > # wrap patch (diff) line into a <div> (not to be used for diff > > headers), # the line must be esc_html()'ed > > sub format_diff_line { > > - my ($line, $diff_class, $from, $to) = @_; > > + my ($line, $diff_class) = @_; > > Why that change? I think it fallout from previous patch where format_diff_line() stopped using $from and $to. Will fix that. > > > > > my $diff_classes = "diff"; > > $diff_classes .= " $diff_class" if ($diff_class); > > @@ -4923,14 +4923,85 @@ sub print_inline_diff_lines { > > print foreach (@$add); > > } > > > > +# Highlight characters from $prefix to $suffix and escape HTML. > > +# $str is a reference to the array of characters. > > +sub esc_html_mark_range { > > + my ($str, $prefix, $suffix) = @_; > > + > > + # Don't generate empty <span> element. > > + if ($prefix == $suffix + 1) { > > + return esc_html(join('', @$str), -nbsp=>1); > > + } > > + > > + my $before = join('', @{$str}[0..($prefix - 1)]); > > + my $marked = join('', @{$str}[$prefix..$suffix]); > > + my $after = join('', @{$str}[($suffix + 1)..$#{$str}]); > > Eeeeeek! First you split into letters, in caller at that, then join? > Why not pass striung ($str suggests string not array of characters), > and use substr instead? > > [Please disregard this and the next paragraph at first reading] I will rename $str to something more self describing. > > > + > > + return esc_html($before, -nbsp=>1) . > > + $cgi->span({-class=>'marked'}, esc_html($marked, > > -nbsp=>1)) . > > + esc_html($after,-nbsp=>1); > > +} > > Anyway I have send to git mailing list a patch series, which in one of > patches adds esc_html_match_hl($str, $regexp) to highlight matches in > a string. Your esc_html_mark_range(), after a generalization, could > be used as underlying "engine". > > Something like this, perhaps (untested): > > # Highlight selected fragments of string, using given CSS class, > # and escape HTML. It is assumed that fragments do not overlap. > # Regions are passed as list of pairs (array references). > sub esc_html_hl { > my ($str, $css_class, @sel) = @_; > return esc_html($str) unless @sel; > > my $out = ''; > my $pos = 0; > > for my $s (@sel) { > $out .= esc_html(substr($str, $pos, $s->[0] - $pos)) > if ($s->[0] - $pos > 0); > $out .= $cgi->span({-class => $css_class}, > esc_html(substr($str, $s->[0], > $s->[1] - $s->[0]))); > > $pos = $m->[1]; > } > $out .= esc_html(substr($str, $pos)) > if ($pos < length($str)); > > return $out; > } > > > + > > +# Format removed and added line, mark changed part and HTML-format > > them. > > You should probably ad here that this code is taken from > diff-highlight in contrib area, isn't it? True. > > > +sub format_rem_add_line { > > + my ($rem, $add) = @_; > > + my @r = split(//, $rem); > > + my @a = split(//, $add); > > + my ($esc_rem, $esc_add); > > + my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a); > > It is not as much $prefix, as $prefix_len, isn't it? Yes. > Shouldn't > $prefix / $prefix_len start from 0, not from 1? It starts from 1 because it skips first +/-. It should become obvious after reading the comment from last patch :). + # In combined diff we must ignore two +/- characters. + $prefix = 2 if ($is_combined); + > > > + my ($prefix_is_space, $suffix_is_space) = (1, 1); > > + > > + while ($prefix < @r && $prefix < @a) { > > + last if ($r[$prefix] ne $a[$prefix]); > > + > > + $prefix_is_space = 0 if ($r[$prefix] !~ /\s/); > > + $prefix++; > > + } > > Ah, I see that it is easier to find common prefix by treating string > as array of characters. > > Though I wonder if it wouldn't be easier to use trick of XOR-ing two > strings (see "Bitwise String Operators" in perlop(1)): > > my $xor = "$rem" ^ "$add"; > > and finding starting sequence of "\0", which denote common prefix. > > > Though this and the following is a nice implementation of > algorithm... as it would be implemented in C. Nevermind, it might be > good enough... The splitting and comparing by characters is taken from diff-highlight. I don't think it's worth changing here. > > > + > > + while ($suffix_rem >= $prefix && $suffix_add >= $prefix) { > > + last if ($r[$suffix_rem] ne $a[$suffix_add]); > > + > > + $suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/); > > + $suffix_rem--; > > + $suffix_add--; > > + } > > BTW., perhaps using single negative $suffix_len instead of separate > $suffix_rem_pos and $suffix_add_pos would make code simpler and easier > to understand? I'll try that. > > > + > > + # Mark lines that are different from each other, but have > > some common > > + # part that isn't whitespace. If lines are completely > > different, don't > > + # mark them because that would make output unreadable, > > especially if > > + # diff consists of multiple lines. > > + if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == > > $#a) > > + || ($prefix_is_space && $suffix_is_space)) { > > Micronit about style: in gitweb we put boolean operator at the end of > continued line, not at beginning of next one. > > So this would be: > > + if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == > $#a) || > + ($prefix_is_space && $suffix_is_space)) { > OK > > + $esc_rem = esc_html($rem); > > + $esc_add = esc_html($add); > > + } else { > > + $esc_rem = esc_html_mark_range(\@r, $prefix, > > $suffix_rem); > > + $esc_add = esc_html_mark_range(\@a, $prefix, > > $suffix_add); > > + } > > + > > + return format_diff_line($esc_rem, 'rem'), > > + format_diff_line($esc_add, 'add'); > > Please use spaces to align. OK > > > +} > > + > > # HTML-format diff context, removed and added lines. > > sub format_ctx_rem_add_lines { > > - my ($ctx, $rem, $add) = @_; > > + my ($ctx, $rem, $add, $is_combined) = @_; > > my (@new_ctx, @new_rem, @new_add); > > + my $num_add_lines = @$add; > > Why is this temporary variable needed? If you are not sure that == > operator enforces scalar context on both arguments you can always > write > > scalar @$add == scalar @$rem > You read my mind. > > + > > + if (!$is_combined && $num_add_lines > 0 && $num_add_lines > > == @$rem) { > > + for (my $i = 0; $i < $num_add_lines; $i++) { > > + my ($line_rem, $line_add) = > > format_rem_add_line( > > + $rem->[$i], $add->[$i]); > > + push @new_rem, $line_rem; > > + push @new_add, $line_add; > > The original contrib/diff-highlight works only for single changed line > (single removed and single added). You make this code work also for > the case where number of aded lines is equal to the number of removed > lines, and assume that subsequent changed lines in preimage > correcponds to subsequent changed lines in postimage, which is not > always true: > > -foo > -bar > +baz > +fooo > > This is not described in commit message, I think. True. Note that in this particular case nothing should be highlighted because corresponding lines don't have common prefix/suffix. > > > + } > > + } else { > > + @new_rem = map { format_diff_line(esc_html($_, > > -nbsp=>1), 'rem') } @$rem; > > + @new_add = map { format_diff_line(esc_html($_, > > -nbsp=>1), 'add') } @$add; > > + } > > > > @new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), > > 'ctx') } @$ctx; > > - @new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), > > 'rem') } @$rem; > > - @new_add = map { format_diff_line(esc_html($_, -nbsp=>1), > > 'add') } @$add; > > return (\@new_ctx, \@new_rem, \@new_add); > > } > > @@ -4939,7 +5010,8 @@ sub format_ctx_rem_add_lines { > > sub print_diff_lines { > > my ($ctx, $rem, $add, $diff_style, $is_combined) = @_; > > > > - ($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, > > $add); > > + ($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, > > $add, > > + $is_combined); > > O.K., now the code depends on $is_combined > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html