Re: Newton-Raphson, was Re: Performance issue of 'git branch'

Tony Finch <dot@xxxxxxxx> · Fri, 24 Jul 2009 00:50:47 +0100

On Fri, 24 Jul 2009, Johannes Schindelin wrote:
>
> Think about it, absent any further information than "it is a hash, i.e.
> distributed pretty equally in _any_ byte", even subsets of a sorted list
> will me more or less linear.  And assuming that they are linear is _still_
> your best bet.

The even distribution of the lower-order bytes is irrelevant. We're
looking at the top 20-ish bits for a pack with a million-ish objects. The
more you zoom in the less linear a sorted list of hashes will be, so
assuming linearity at all scales is wrong. It's a bit like fractal
mountains.

> There is no way to achieve [O(1) seeks], best thing you can hope for is
> _expected_ O(1) (e.g. with a hashmap, with exponential worst case).

Of course it's expected. However the worst case is nowhere near
exponential: it's linear because the second-order search is a linear
pagewise scan. But I think in practice, the larger the pack the more that
the randomization of the hash function will smooth out performance
oddities. (Sorry, I don't know enough statistics to be able to say what
the expected error of the linear interpolation is, though I expect it's a
fairly simple formula.) For small packs the number of seeks is 1 anyway.

Tony.
-- 
f.anthony.n.finch  <dot@xxxxxxxx>  http://dotat.at/
GERMAN BIGHT HUMBER: SOUTHWEST 5 TO 7. MODERATE OR ROUGH. SQUALLY SHOWERS.
MODERATE OR GOOD.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html