On Sat, Feb 20, 2010 at 02:38:29PM +0100, Thomas Rast wrote: > BTW, here's a weird data point: > > $ ls -l a b > -rw-r--r-- 1 thomas users 3300765 2010-02-20 12:48 a > -rw-r--r-- 1 thomas users 3253762 2010-02-20 12:48 b > $ time diff -u a b | wc -l > 54530 > > real 0m0.644s > user 0m0.562s > sys 0m0.044s > $ time git diff --no-index a b >/dev/null > > real 0m22.848s > user 0m21.956s > sys 0m0.137s > $ time git diff --no-index --patience a b >/dev/null > > real 0m19.508s > user 0m18.673s > sys 0m0.273s I was able to reproduce here. According to 'perf', git spends 70% of its time in xdl_split, which says: /* * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers. * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both * the forward diagonal starting from (off1, off2) and the backward diagonal * starting from (lim1, lim2). If the K values on the same diagonal crosses * returns the furthest point of reach. We might end up having to expensive * cases using this algorithm is full, so a little bit of heuristic is needed * to cut the search and to return a suboptimal point. */ GNU diff also seems to use the same basic Myers algorithm, but says: The basic algorithm is described in: "An O(ND) Difference Algorithm and its Variations", Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251-266; see especially section 4.2, which describes the variation used below. The basic algorithm was independently discovered as described in: "Algorithms for Approximate String Matching", E. Ukkonen, Information and Control Vol. 64, 1985, pp. 100-118. Unless the 'find_minimal' flag is set, this code uses the TOO_EXPENSIVE heuristic, by Paul Eggert, to limit the cost to O(N**1.5 log N) at the price of producing suboptimal output for large inputs with many differences. So it may be that TOO_EXPENSIVE heuristic. I don't know enough about the diff code (git's or otherwise) to say how difficult it would be to adapt GNU diff's tricks. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html