On Mon, 28 May 2007, linux@xxxxxxxxxxx wrote:
>
> Even losing a constant factor of 2, it still seems like it might offer a
> factor-of-2 speedup for large repositories.

Well, not so much according to the numbers I had. Yes, SHA-1's are very
nicely distributed on a larger scale, but on a _small_ scale they aren't.
So you end up being able to get very good initial guesses for the first
few iterations, but once you get "close enough", you can't do anything at
all, and then you're actually worse off.

Also, please do realize that the binary search is actually a *smart*
binary search, which does a radix-256 fan-out at the beginning, getting
rid of the first 8 levels of searching for free. The Newton-Raphson thing
can do that too (and in my trial patch, did), but it means that you
actually get rid of just the initial guess (the one that worked the
best!), and you still have the problem that once you get close enough,
the distribution at a local level is not at all nice and linear.

So what I did with my patch (you can find it in the archives - search for
Newton-Raphson, I'd say about 3 months ago) was to do three cycles of
approximating, and then a linear search. The linear search has good cache
behaviour, so it's not as horrid as it might sound, but for some numbers
I did, my approximate thing would hit on the exact first entry about half
the time, but would have to walk up to ~20 entries occasionally.

Which meant that the binary search (with the initial radix-256 fan-out)
actually mostly outperformed the Newton-Raphson thing.

Also, object lookup is certainly noticeable, but the biggest cost of it
is the cache misses, and even then it's not really _that_ noticeable. And
it's really neither slow nor stupid as it is.

So I'd love to see somebody try to do a _proper_ approximation of
Newton-Raphson, not the quick hack I did. But I think factor-of-two is
actually optimistic, even for pretty large repos. And it would have to be
no worse than what we have now for average-sized ones.

(And object lookup time is generally not the dominant cost, so while it's
noticeable, cutting it down by even a whole 50% isn't going to speed up
any normal git operations by more than a couple of percent. Generating
diffs in particular is a much more costly op for things like "git blame")

		Linus
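
[For anyone curious what the scheme described above might look like, here
is a rough, untested sketch of the interpolation idea: radix-256 fan-out,
a few rounds of interpolation, then a short linear walk. It is not Linus's
actual patch; the flat sha1s table, the fanout convention, and the name
find_sha1_interp() are made up for illustration, and git's real pack-index
layout differs.]

	#include <stdint.h>
	#include <string.h>

	/* Bytes 1..4 of a SHA-1 as a 32-bit key (byte 0 is handled by the
	 * radix-256 fanout table). */
	static uint32_t sha1_key(const unsigned char *sha1)
	{
		return ((uint32_t)sha1[1] << 24) | (sha1[2] << 16) |
		       (sha1[3] << 8) | sha1[4];
	}

	/*
	 * sha1s:  nr entries of 20 bytes each, sorted.
	 * fanout: fanout[i] = number of entries whose first byte is <= i.
	 * Returns the index of the matching entry, or -1.
	 */
	static int find_sha1_interp(const unsigned char *sha1s, uint32_t nr,
				    const uint32_t *fanout,
				    const unsigned char *want)
	{
		uint32_t lo = want[0] ? fanout[want[0] - 1] : 0;
		uint32_t hi = fanout[want[0]];
		uint32_t key = sha1_key(want);
		int pass;

		if (lo >= hi)
			return -1;

		/* A few cycles of approximating to narrow [lo, hi),
		 * assuming the hashes are roughly uniformly spread. */
		for (pass = 0; pass < 3 && hi - lo > 16; pass++) {
			uint32_t lo_key = sha1_key(sha1s + 20 * lo);
			uint32_t hi_key = sha1_key(sha1s + 20 * (hi - 1));
			uint32_t k = key, guess;
			int cmp;

			if (hi_key <= lo_key)
				break;
			if (k < lo_key)
				k = lo_key;
			if (k > hi_key)
				k = hi_key;
			guess = lo + (uint32_t)((uint64_t)(k - lo_key) *
						(hi - 1 - lo) /
						(hi_key - lo_key));
			cmp = memcmp(sha1s + 20 * guess, want, 20);
			if (!cmp)
				return guess;
			if (cmp < 0)
				lo = guess + 1;
			else
				hi = guess;
		}

		/* Finish with a cache-friendly linear walk. */
		for (; lo < hi; lo++)
			if (!memcmp(sha1s + 20 * lo, want, 20))
				return lo;
		return -1;
	}

[The clamping of the key and the early break on a flat key range are just
defensive choices for the sketch; the point is the shape of the algorithm,
not the exact constants (3 passes, 16-entry cutoff), which here simply
echo the "three cycles then a linear search" description above.]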