Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



A few issues I have just noticed about this series.


First, about naming.  "Highlighting interesting parts of diff" is
acceptable name, but "syntax highlighting for diff" is not: gitweb
already use syntax highlighting in diff views.  Call it "diff refinement
highlighting", "highlighting changes / changed sections", "intraline
highlighting".


Second, I think (but I am not sure) that there is a bug in code that
finds common suffix and prefix.

If I understand correctly the idea is to highlight changed part if
there is at least one of common non-whitespace suffix or prefix.
So syntax highlighting should look like this:

1. Both prefix and suffix are non empty and non whitespace only

  -foo -{bar} baz
  +foo +{quux} baz

2. Non empty and non whitespace only prefix

  -foo -{bar}
  +foo +{quux}

2. Non empty and non whitespace only suffix

  --{bar} baz
  ++{quux} baz

But in your code $prefix is not the length of common prefix, but
the position of end of prefix in the original line of diff.  So
you start with $prefix = 1... even though the prefix is empty.

How is that supposed to work?


On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:

> > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > these ones:
> > > 
> > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > http://trac.edgewall.org/changeset/10973
[...]
> > BTW GitHub is closed source, but we can check what algorithm does Trac
> > use for diff refinement highlighting (highlighting changed portions of
> > diff).
> > 
> 
> I think it's
> http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> (see markup intraline_changes()).
 
It is get_change_extent() that finds extent of changes, as a pair
containing the offset at which the changes start, and the negative offset
at which the changes end.  So it is the same solution you use, only
without ignoring whitespace-only prefixes and suffixes...  This code can
be easily ported to Perl, BTW.

The markup_intraline_changes() function compares lines from preimage and
from postimage pairwise, requiring that number of lines matches, the same
like in your algorithm.

[...]
> > I wonder if we can use --diff-words for diff refinement highlighting,
> > i.e. LCS on words.
> 
> I think we can try it, but I worry about performance of running `git
> diff` on every diff chunk.

I was thinking about one single additional run of git-diff-tree with
`--diff-words`, not one per chunk.

Or perhaps even put it together in one git-diff-tree invocation, just like
'commitdiff' action / git_commitdiff() subroutine uses single git-diff-tree
invocation, with the option "--patch-with-raw", to generate both raw diff
for difftree and patchset.
 
> > Anyway Jeff's approach is a bit limited, in that it would work only for
> > change that does not involve adding newlines, for example splitting
> > overly long line when changing something.
> > 
> > See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
> 
> Yes, I'm aware of that. I was thinking about improving it later ("Let's
> start with a simple refinment highlightning and maybe later add more
> sophisticated algorithms").

Right.
 
[...]
> > BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
> > and suffix are non-empty"?
> > 
> 
> At least one. See:
> 
> 	-a = 42;
> 	+b = 42;
> 
> Here prefix is empty but suffix is not.

Nb. prefix is empty but $prefix == 1, and is boolean true.

> > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > > 
> > > 	- * I like you.
> > > 	+ * Alice had a little lamb.
> > 
> > But this patch doesn't implement this feature yet, isn't it?
> 
> No, but is a matter of adding
> 
> 	-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> 	+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
> 
> (and the same for suffix)

All right.  But it is better added as separate patch.  Perhaps even
requiring that not only there is at least one of common prefix or common
suffix, but at least one of them is not whitespace only could be put
in a separate commit...

> > > > > +sub format_rem_add_line {
> > > > > +	my ($rem, $add) = @_;
> > > > > +	my @r = split(//, $rem);
> > > > > +	my @a = split(//, $add);
> > 
> > BTW the name of variable can be just @add and @rem.
> >
> 
> I know they are different scopes but I don't like it. It makes the code
> more confusing IMO. But I won't insist.

In my opinion if the variable refers to the same entity in different
forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
used here) is all right, and even better than trying to come up with
different name for the same thing because of sigil.
 
> > > > Shouldn't
> > > > $prefix / $prefix_len start from 0, not from 1?
> > > 
> > > It starts from 1 because it skips first +/-. It should become obvious
> > > after reading the comment from last patch :).

This means that $prefix is true even if prefix is empty ($prefix == 1).
Wouldn't it be better for $prefix_len to count length of true prefix,
without diff adornment?  Or make @r / @rem skip initial characters...

> > > +	# In combined diff we must ignore two +/- characters.
> > > +	$prefix = 2 if ($is_combined);
> > 
> > Anyway comment about that fact would be nice.
> 
> Will do.

BTW. it is not "2" but "scalar @{$co{'parents'}}".


> > > The splitting and comparing by characters is taken from diff-highlight.
> > > I don't think it's worth changing here.
> > 
> > You are right.
> > 
> > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > and benchmark it comparing to your C-like solution, but it can be left for
> > later (simple is better than clever, usually).
> 
> If you have time :).

Anyway it would be separate commit.  Better to just copy tested code
from contrib/diff-highlight

BTW. would "git blame -C -C -C -w" detect this correctly as code
movement^W copying?
 
-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]