A few issues I have just noticed about this series. First, about naming. "Highlighting interesting parts of diff" is acceptable name, but "syntax highlighting for diff" is not: gitweb already use syntax highlighting in diff views. Call it "diff refinement highlighting", "highlighting changes / changed sections", "intraline highlighting". Second, I think (but I am not sure) that there is a bug in code that finds common suffix and prefix. If I understand correctly the idea is to highlight changed part if there is at least one of common non-whitespace suffix or prefix. So syntax highlighting should look like this: 1. Both prefix and suffix are non empty and non whitespace only -foo -{bar} baz +foo +{quux} baz 2. Non empty and non whitespace only prefix -foo -{bar} +foo +{quux} 2. Non empty and non whitespace only suffix --{bar} baz ++{quux} baz But in your code $prefix is not the length of common prefix, but the position of end of prefix in the original line of diff. So you start with $prefix = 1... even though the prefix is empty. How is that supposed to work? On Mon, 13 Feb 2012, Michał Kiedrowicz wrote: > Jakub Narebski <jnareb@xxxxxxxxx> wrote: > > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote: > > > I haven't found *examples* on GitHub and Trac sites, but what about > > > these ones: > > > > > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795 > > > http://trac.edgewall.org/changeset/10973 [...] > > BTW GitHub is closed source, but we can check what algorithm does Trac > > use for diff refinement highlighting (highlighting changed portions of > > diff). > > > > I think it's > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py > (see markup intraline_changes()). It is get_change_extent() that finds extent of changes, as a pair containing the offset at which the changes start, and the negative offset at which the changes end. So it is the same solution you use, only without ignoring whitespace-only prefixes and suffixes... This code can be easily ported to Perl, BTW. The markup_intraline_changes() function compares lines from preimage and from postimage pairwise, requiring that number of lines matches, the same like in your algorithm. [...] > > I wonder if we can use --diff-words for diff refinement highlighting, > > i.e. LCS on words. > > I think we can try it, but I worry about performance of running `git > diff` on every diff chunk. I was thinking about one single additional run of git-diff-tree with `--diff-words`, not one per chunk. Or perhaps even put it together in one git-diff-tree invocation, just like 'commitdiff' action / git_commitdiff() subroutine uses single git-diff-tree invocation, with the option "--patch-with-raw", to generate both raw diff for difftree and patchset. > > Anyway Jeff's approach is a bit limited, in that it would work only for > > change that does not involve adding newlines, for example splitting > > overly long line when changing something. > > > > See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973 > > Yes, I'm aware of that. I was thinking about improving it later ("Let's > start with a simple refinment highlightning and maybe later add more > sophisticated algorithms"). Right. [...] > > BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix > > and suffix are non-empty"? > > > > At least one. See: > > -a = 42; > +b = 42; > > Here prefix is empty but suffix is not. Nb. prefix is empty but $prefix == 1, and is boolean true. > > > I would also consider ignoring prefixes/suffixes with punctuation, like: > > > > > > - * I like you. > > > + * Alice had a little lamb. > > > > But this patch doesn't implement this feature yet, isn't it? > > No, but is a matter of adding > > -$prefix_is_space = 0 if ($r[$prefix] !~ /\s/); > +$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/); > > (and the same for suffix) All right. But it is better added as separate patch. Perhaps even requiring that not only there is at least one of common prefix or common suffix, but at least one of them is not whitespace only could be put in a separate commit... > > > > > +sub format_rem_add_line { > > > > > + my ($rem, $add) = @_; > > > > > + my @r = split(//, $rem); > > > > > + my @a = split(//, $add); > > > > BTW the name of variable can be just @add and @rem. > > > > I know they are different scopes but I don't like it. It makes the code > more confusing IMO. But I won't insist. In my opinion if the variable refers to the same entity in different forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be used here) is all right, and even better than trying to come up with different name for the same thing because of sigil. > > > > Shouldn't > > > > $prefix / $prefix_len start from 0, not from 1? > > > > > > It starts from 1 because it skips first +/-. It should become obvious > > > after reading the comment from last patch :). This means that $prefix is true even if prefix is empty ($prefix == 1). Wouldn't it be better for $prefix_len to count length of true prefix, without diff adornment? Or make @r / @rem skip initial characters... > > > + # In combined diff we must ignore two +/- characters. > > > + $prefix = 2 if ($is_combined); > > > > Anyway comment about that fact would be nice. > > Will do. BTW. it is not "2" but "scalar @{$co{'parents'}}". > > > The splitting and comparing by characters is taken from diff-highlight. > > > I don't think it's worth changing here. > > > > You are right. > > > > I'll try to come with hacky algorithm using string bitwise xor and regexp, > > and benchmark it comparing to your C-like solution, but it can be left for > > later (simple is better than clever, usually). > > If you have time :). Anyway it would be separate commit. Better to just copy tested code from contrib/diff-highlight BTW. would "git blame -C -C -C -w" detect this correctly as code movement^W copying? -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html