Hi,

On Fri, 2 Jan 2009, Linus Torvalds wrote:

> On Fri, 2 Jan 2009, Clemens Buchacher wrote:
> >
> > Only two choices, and I still get it wrong. The diffs should be
> > labeled the other way around, of course.
>
> Yes, this one is a real patience diff change, but it's also the same one
> that I've seen in the google fanboi findings. What google did _not_ show
> was any real-life examples, or anybody doing any critical analysis.

FWIW it's the test case in the commit introducing the --patience option.

> So I was hoping for something else than a single "in this case patience
> diff works really well". I was hoping to see what it does in real life.

I will dig out a real-world example where I _know_ patience diff would
have helped.  (I remember that I rewrote a pretty large diff which was
sent on this list, only to understand what it actually does, and I am
pretty certain this is a good real-world showcase.)

But yes, I agree, the thing does not matter _all_ that much in reality.

The case where I expect a real improvement is when you modify a function
and insert a function just before it, and Myers' algorithm matches mainly
empty lines and lines ending in curly brackets.  In other words: something
I tried to tackle with ee95ec5d (xdl_merge(): introduce
XDL_MERGE_ZEALOUS_ALNUM) for merges.

The typical look of such a diff is something like this:

	-<... some function header ...>
	+<... a completely different function header ...>
	 {
	-	<... variables ...>
	+	<... other variables ...>

		for (i = 0; i < 10; i++) {
	-		<... some code ...>
	+		<... some code doing something completely different ...>
		}

		return 0;
	 }
	+
	+<... the function header which was removed earlier ...>
	+{
	+	<... refactored _and also reindented_ code ...>

> And I haven't seen _any_ real critical analysis of it. Anywhere.

Neither have I.  Let alone something close to documentation.

For example, when the "patience diff algorithm" is explained, it looks
more like a longest common subsequence algorithm when the input is
already sorted on the first item.

Further, there is no rigorous analysis of the runtime (I figured that the
original runtime is O(nm), where "n" is the number of lines and "m" is
the length of the maximal ordered sequence of common unique lines, and my
implementation can only improve that to O(n log(m))).

This could be improved, I think, for the most common case where you have
pretty long common _contiguous_ sequences of unique lines, i.e. large
ranges of lines that are identical.

The runtime is especially horrible in the light of the runtime of Myers'
algorithm, which uses O(nd), where d is the edit distance, i.e. the number
of lines added + number of lines removed.  (Note: in the real world, there
are substantial speed-ups for consecutive edits, i.e. line ranges where
there are no common lines at all.)

Also, I am less than thrilled by the job the fanbois did coming up with
"convincing" evidence: exactly as you pointed out, there are _no_
real-world examples where it helps.

And the worst part: one can only _guess_ what motivated patience diff.
I imagine it came from the observation that function headers are unique,
and that you usually want to preserve as much context around them as
possible.

Ciao,
Dscho
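
As a rough illustration of the O(n log(m)) step mentioned above, here is a
minimal standalone sketch (not git's xdiff code; the function name and the
example positions are made up) of the patience-sorting longest-increasing-
subsequence computation over the positions of common unique lines:

#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of the patience-sorting step: given the positions (in file B)
 * of lines that are unique in both files, listed in the order they occur
 * in file A, find the longest increasing subsequence.  Those entries are
 * the anchors between which the files are diffed recursively.
 *
 * tails[k] holds the index of the smallest possible last element of an
 * increasing subsequence of length k+1; binary searching it is what
 * keeps the whole thing at O(n log(m)).
 */
static int longest_increasing_subsequence(const int *pos, int n, int *out)
{
	int *tails = malloc(n * sizeof(int));	/* indices into pos[] */
	int *prev = malloc(n * sizeof(int));	/* back-pointers */
	int len = 0, i, k;

	for (i = 0; i < n; i++) {
		/* leftmost "pile" whose top is >= pos[i] */
		int lo = 0, hi = len;
		while (lo < hi) {
			int mid = (lo + hi) / 2;
			if (pos[tails[mid]] < pos[i])
				lo = mid + 1;
			else
				hi = mid;
		}
		prev[i] = lo ? tails[lo - 1] : -1;
		tails[lo] = i;
		if (lo == len)
			len++;
	}

	/* walk the back-pointers to recover the subsequence */
	k = len ? tails[len - 1] : -1;
	for (i = len - 1; i >= 0; i--) {
		out[i] = pos[k];
		k = prev[k];
	}
	free(tails);
	free(prev);
	return len;
}

int main(void)
{
	/* made-up positions in B of common unique lines, in A's order */
	int pos[] = { 9, 4, 6, 12, 8, 7, 1, 5, 10 };
	int n = sizeof(pos) / sizeof(pos[0]);
	int *out = malloc(n * sizeof(int));
	int len = longest_increasing_subsequence(pos, n, out);
	int i;

	for (i = 0; i < len; i++)
		printf("%d ", out[i]);
	printf("\n");	/* prints "4 6 7 10" */
	free(out);
	return 0;
}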