Re: [PATCH 0/3] Teach Git about the patience diff algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Fri, 2 Jan 2009, Linus Torvalds wrote:

> On Fri, 2 Jan 2009, Johannes Schindelin wrote:
> 
> > And the worst part: one can only _guess_ what motivated patience diff.  
> > I imagine it came from the observation that function headers are 
> > unique, and that you usually want to preserve as much context around 
> > them.
> 
> Well, I do like the notion of giving more weight to unique lines - I think 
> it makes sense. That said, I suspect it would make almost as much sense to 
> give more weight simply to _longer_ lines, and I suspect the standard 
> Myers' algorithm could possibly be simply extended to take line size into 
> account when calculating the weights.

I think that it makes more sense with the common unique lines, because 
they are _unambiguously_ common.  That is, if possible, we would like to 
keep them as common lines.

BTW this also opens the door for more automatic code movement detection.

As for the longer lines: what exactly would you want to put "weight" on?  
The edit distance is the number of plusses and minusses, and this is the 
thing that actually is pretty critical for the performance: the larger the 
distance, the longer it takes.  So if you want to put a different "weight" 
on a line, i.e. something else than a "1", you are risking a substantially 
worse performance.

And I am still not convinced that longer lines should get more weight. A 
line starting with "exit:" can be much more important than 8 tabs followed 
by a curly bracket.

> Because the problem with diffs for C doesn't really tend to be as much 
> about non-unique lines as about just _trivial_ lines that are mostly empty 
> or contain just braces etc. Those are quite arguably almost totally 
> worthless for equality testing.

Oh, but then we get very C specific.  Let's not go there.

> And btw, don't get me wrong - I don't think there is anything wrong with 
> the patience diff. I think it's a worthy thing to try, and I'm not at 
> all arguing against it.

I never took you to be opposed.  I myself mainly implemented it because I 
wanted to play with it, and have something more useful than what I found 
in the whole wide web.

> However, I do think that the people arguing for it often do so based on 
> less-than-very-logical arguments, and it's entirely possible that other 
> approaches are better (eg the "weight by size" thing rather than "weight 
> by uniqueness").

Actually, I think it is very possible that merging hunks if there are not 
enough alnums between them could make a whole lot of sense.

But IIRC it was shot down and replaced by the not-so-specific-to-C 
algorithm to merge hunks when the output is shorter than having separate 
hunks.

> The thing about unique lines is that there are no guarantees at all that 
> they even exist, so a uniqueness-based thing will always have to fall 
> back on anything else. That, to me, implies that the whole notion is 
> somewhat mis-designed: it's clearly not a generic concept.
> 
> (In contrast, taking the length of the matching lines into account would 
> not have that kind of bad special case)

However, we know that humans like to start from the unique features they 
see, and continue from there.

And while it is quite possible that it is the wrong approach (see all that 
security theater at the airport, and some nice rational analyses about the 
cost/benefit equations there), imitating humans is still the thing that 
will convince most humans.

Ciao,
Dscho

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux