Re: [PATCH] rebase -i: avoid --cherry-pick when rebasing to a direct ancestor

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 20, 2010 at 02:38:29PM +0100, Thomas Rast wrote:

> BTW, here's a weird data point:
> 
> $ ls -l a b
> -rw-r--r-- 1 thomas users 3300765 2010-02-20 12:48 a
> -rw-r--r-- 1 thomas users 3253762 2010-02-20 12:48 b
> $ time diff -u a b | wc -l
> 54530
> 
> real    0m0.644s
> user    0m0.562s
> sys     0m0.044s
> $ time git diff --no-index a b >/dev/null
> 
> real    0m22.848s
> user    0m21.956s
> sys     0m0.137s
> $ time git diff --no-index --patience a b >/dev/null
> 
> real    0m19.508s
> user    0m18.673s
> sys     0m0.273s

I was able to reproduce here. According to 'perf', git spends 70% of its
time in xdl_split, which says:

/*
 * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
 * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
 * the forward diagonal starting from (off1, off2) and the backward diagonal
 * starting from (lim1, lim2). If the K values on the same diagonal crosses
 * returns the furthest point of reach. We might end up having to expensive
 * cases using this algorithm is full, so a little bit of heuristic is needed
 * to cut the search and to return a suboptimal point.
 */

GNU diff also seems to use the same basic Myers algorithm, but says:

   The basic algorithm is described in:
   "An O(ND) Difference Algorithm and its Variations", Eugene Myers,
   Algorithmica Vol. 1 No. 2, 1986, pp. 251-266;
   see especially section 4.2, which describes the variation used below.

   The basic algorithm was independently discovered as described in:
   "Algorithms for Approximate String Matching", E. Ukkonen,
   Information and Control Vol. 64, 1985, pp. 100-118.

   Unless the 'find_minimal' flag is set, this code uses the TOO_EXPENSIVE
   heuristic, by Paul Eggert, to limit the cost to O(N**1.5 log N)
   at the price of producing suboptimal output for large inputs with
   many differences.

So it may be that TOO_EXPENSIVE heuristic. I don't know enough about the
diff code (git's or otherwise) to say how difficult it would be to adapt
GNU diff's tricks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]