Jakub Narebski <jnareb@xxxxxxxxx> wrote: > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote: > > Jakub Narebski <jnareb@xxxxxxxxx> wrote: > > > Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> writes: > > I haven't found *examples* on GitHub and Trac sites, but what about > > these ones: > > > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795 > > http://trac.edgewall.org/changeset/10973 > > Thanks. That is what I meant by "good examples". Perhaps they should > be put in the commit message? OK > > BTW GitHub is closed source, but we can check what algorithm does Trac > use for diff refinement highlighting (highlighting changed portions of > diff). > I think it's http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py (see markup intraline_changes()). > > > It doesn't implement LCS / diff > > > algorithm like e.g. tkdiff does for its diff refinement highlighting, > > > isn't it? > > > > I doesn't. I share the Jeff's opinion that: > > a) Jeff's approach is "good enough" > > b) LCS on bytes could be very confusing if it marked few sets of > > characters. > > I wonder if we can use --diff-words for diff refinement highlighting, > i.e. LCS on words. I think we can try it, but I worry about performance of running `git diff` on every diff chunk. > > Anyway Jeff's approach is a bit limited, in that it would work only for > change that does not involve adding newlines, for example splitting > overly long line when changing something. > > See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973 > Yes, I'm aware of that. I was thinking about improving it later ("Let's start with a simple refinment highlightning and maybe later add more sophisticated algorithms"). > > > By completly different you mean that they do not have common prefix or > > > common suffix (at least one of them), isn't it? > > BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix > and suffix are non-empty"? > At least one. See: -a = 42; +b = 42; Here prefix is empty but suffix is not. > > I would also consider ignoring prefixes/suffixes with punctuation, like: > > > > - * I like you. > > + * Alice had a little lamb. > > But this patch doesn't implement this feature yet, isn't it? No, but is a matter of adding -$prefix_is_space = 0 if ($r[$prefix] !~ /\s/); +$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/); (and the same for suffix) > Well, here is another idea: do not highlight if sum of prefix and suffix > lengths are less than some threshold, e.g. 2 characters not including > whitespace, or some percentage with respect to total line length. > That might be a good idea. > > > Eeeeeek! First you split into letters, in caller at that, then join? > > > Why not pass striung ($str suggests string not array of characters), > > > and use substr instead? > > > > > > [Please disregard this and the next paragraph at first reading] > > > > I will rename $str to something more self describing. > > Please do. > > BTW. don't you assume here that both common prefix and common suffix > are non-empty? Yes. The caller makes sure they are correct (at least > > > > +sub format_rem_add_line { > > > > + my ($rem, $add) = @_; > > > > + my @r = split(//, $rem); > > > > + my @a = split(//, $add); > > BTW the name of variable can be just @add and @rem. > I know they are different scopes but I don't like it. It makes the code more confusing IMO. But I won't insist. > > > Shouldn't > > > $prefix / $prefix_len start from 0, not from 1? > > > > It starts from 1 because it skips first +/-. It should become obvious > > after reading the comment from last patch :). > > > > + # In combined diff we must ignore two +/- characters. > > + $prefix = 2 if ($is_combined); > > Anyway comment about that fact would be nice. Will do. > > > > > + my ($prefix_is_space, $suffix_is_space) = (1, 1); > > > > + > > > > + while ($prefix < @r && $prefix < @a) { > > > > + last if ($r[$prefix] ne $a[$prefix]); > > > > + > > > > + $prefix_is_space = 0 if ($r[$prefix] !~ /\s/); > > > > + $prefix++; > > > > + } > > > > > > Ah, I see that it is easier to find common prefix by treating string > > > as array of characters. > > > > > > Though I wonder if it wouldn't be easier to use trick of XOR-ing two > > > strings (see "Bitwise String Operators" in perlop(1)): > > > > > > my $xor = "$rem" ^ "$add"; > > > > > > and finding starting sequence of "\0", which denote common prefix. > > > > > > > > > Though this and the following is a nice implementation of > > > algorithm... as it would be implemented in C. Nevermind, it might be > > > good enough... > > > > The splitting and comparing by characters is taken from diff-highlight. > > I don't think it's worth changing here. > > You are right. > > I'll try to come with hacky algorithm using string bitwise xor and regexp, > and benchmark it comparing to your C-like solution, but it can be left for > later (simple is better than clever, usually). If you have time :). > > > > # HTML-format diff context, removed and added lines. > > > > sub format_ctx_rem_add_lines { > > > > - my ($ctx, $rem, $add) = @_; > > > > + my ($ctx, $rem, $add, $is_combined) = @_; > > > > my (@new_ctx, @new_rem, @new_add); > > > > + my $num_add_lines = @$add; > > > > > > Why is this temporary variable needed? If you are not sure that == > > > operator enforces scalar context on both arguments you can always > > > write > > > > > > scalar @$add == scalar @$rem > > > > You read my mind. > > BTW. the same comment applies to patch adding support for highlighting > changed part in combined diff. > OK -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html