Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff

Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> · Tue, 14 Feb 2012 19:23:40 +0100

Jakub Narebski <jnareb@xxxxxxxxx> wrote:

> A few issues I have just noticed about this series.
> 
> 
> First, about naming.  "Highlighting interesting parts of diff" is
> acceptable name, but "syntax highlighting for diff" is not: gitweb
> already use syntax highlighting in diff views.  Call it "diff refinement
> highlighting", "highlighting changes / changed sections", "intraline
> highlighting".

OK, I'll make sure the naming is correct and consistent.
> 
> 
> Second, I think (but I am not sure) that there is a bug in code that
> finds common suffix and prefix.
> 
> If I understand correctly the idea is to highlight changed part if
> there is at least one of common non-whitespace suffix or prefix.
> So syntax highlighting should look like this:
> 
> 1. Both prefix and suffix are non empty and non whitespace only
> 
>   -foo -{bar} baz
>   +foo +{quux} baz
> 
> 2. Non empty and non whitespace only prefix
> 
>   -foo -{bar}
>   +foo +{quux}
> 
> 2. Non empty and non whitespace only suffix
> 
>   --{bar} baz
>   ++{quux} baz
> 
> But in your code $prefix is not the length of common prefix, but
> the position of end of prefix in the original line of diff.  So
> you start with $prefix = 1... even though the prefix is empty.
> 
> How is that supposed to work?

But see the check:

+	# Mark lines that are different from each other, but have some common
+	# part that isn't whitespace.  If lines are completely different, don't
+	# mark them because that would make output unreadable, especially if
+	# diff consists of multiple lines.
+	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
                        ^
                        |
                        This part.

+		|| ($prefix_is_space && $suffix_is_space)) {
+		$esc_rem = esc_html($rem);
+		$esc_add = esc_html($add);
+	} else {
+		$esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
+		$esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
+	}

I guess it's still not correct because it should be equal to number of
parents or $prefix_length should start from 0 like you wrote later.

> 
> 
> On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> > Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> > > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> 
> > > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > > these ones:
> > > > 
> > > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > > http://trac.edgewall.org/changeset/10973
> [...]
> > > BTW GitHub is closed source, but we can check what algorithm does Trac
> > > use for diff refinement highlighting (highlighting changed portions of
> > > diff).
> > > 
> > 
> > I think it's
> > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> > (see markup intraline_changes()).
>  
> It is get_change_extent() that finds extent of changes, as a pair
> containing the offset at which the changes start, and the negative offset
> at which the changes end.  So it is the same solution you use, only
> without ignoring whitespace-only prefixes and suffixes...  This code can
> be easily ported to Perl, BTW.
> 
> The markup_intraline_changes() function compares lines from preimage and
> from postimage pairwise, requiring that number of lines matches, the same
> like in your algorithm.
> 

So using Jeff's diff-highlight we remain quite consistent with Trac
output. There's nothing we can "steal" from it. 

> > > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > > > 
> > > > 	- * I like you.
> > > > 	+ * Alice had a little lamb.
> > > 
> > > But this patch doesn't implement this feature yet, isn't it?
> > 
> > No, but is a matter of adding
> > 
> > 	-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > 	+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
> > 
> > (and the same for suffix)
> 
> All right.  But it is better added as separate patch.

Sure 

> Perhaps even
> requiring that not only there is at least one of common prefix or common
> suffix, but at least one of them is not whitespace only could be put
> in a separate commit...
> 
> > > > > > +sub format_rem_add_line {
> > > > > > +	my ($rem, $add) = @_;
> > > > > > +	my @r = split(//, $rem);
> > > > > > +	my @a = split(//, $add);
> > > 
> > > BTW the name of variable can be just @add and @rem.
> > >
> > 
> > I know they are different scopes but I don't like it. It makes the code
> > more confusing IMO. But I won't insist.
> 
> In my opinion if the variable refers to the same entity in different
> forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
> used here) is all right, and even better than trying to come up with
> different name for the same thing because of sigil.

OK. I'll follow that convention then.

>  
> > > > > Shouldn't
> > > > > $prefix / $prefix_len start from 0, not from 1?
> > > > 
> > > > It starts from 1 because it skips first +/-. It should become obvious
> > > > after reading the comment from last patch :).
> 
> This means that $prefix is true even if prefix is empty ($prefix == 1).
> Wouldn't it be better for $prefix_len to count length of true prefix,
> without diff adornment?  Or make @r / @rem skip initial characters...
> 
> > > > +	# In combined diff we must ignore two +/- characters.
> > > > +	$prefix = 2 if ($is_combined);
> > > 
> > > Anyway comment about that fact would be nice.
> > 
> > Will do.
> 
> BTW. it is not "2" but "scalar @{$co{'parents'}}".
> 

OK.

> 
> > > > The splitting and comparing by characters is taken from diff-highlight.
> > > > I don't think it's worth changing here.
> > > 
> > > You are right.
> > > 
> > > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > > and benchmark it comparing to your C-like solution, but it can be left for
> > > later (simple is better than clever, usually).
> > 
> > If you have time :).
> 
> Anyway it would be separate commit.  Better to just copy tested code
> from contrib/diff-highlight
> 
> BTW. would "git blame -C -C -C -w" detect this correctly as code
> movement^W copying?
>  

Cannot say. Have you considered what I wrote in a separate e-mail,
about using diff-highlight output directly / as a library? 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html