Re: [PATCH 1/3] Lazily open pack index files on demand

linux@xxxxxxxxxxx · 28 May 2007 20:09:49 -0400

> First off, when comparing O(logn) to O(1), with "n" being less than a 
> billion, they are pretty much exactly the same. Think of it this way: 
> O(logn) == O(9) == O(1), if you know that n < 10**9.

Well, binary searches mean binary logarithms, so O(log n) = O(30).
Still, pretty low.

> Secondly, the cost of Newton isn't "almost O(1)". I don't know _what_ it 
> is (the rule of thumb with Newton-Raphson should be that the number of 
> significant correct digits in the answer doubles with each iteration: I 
> think that probably means that it should approximate O(loglog(n)), but I 
> haven't thought deeply about it.

Excellent intuition!  The algorithm is most commonly known in the computer
science literature as "interpolation search" and it does indeed take O(log
log n) time for uniformly distributed data, which is a good assumption
for SHA-1 hashes.

Of course, for n = 10**9, log(n) is 30 and log log n is 5.
More to the point, for n = 10**6, log(n) is 20 and log(log(n)) is still 5.

Even losing a constant factor of 2, it still seems like it might offer a
factor-of-2 speedup for large repositories.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html