Re: Newton-Raphson, was Re: Performance issue of 'git branch'

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 24 Jul 2009 02:43:59 +0200 (CEST)

Hi,

On Fri, 24 Jul 2009, Tony Finch wrote:

> On Fri, 24 Jul 2009, Johannes Schindelin wrote:
> >
> > Think about it, absent any further information than "it is a hash, i.e.
> > distributed pretty equally in _any_ byte", even subsets of a sorted list
> > will me more or less linear.  And assuming that they are linear is _still_
> > your best bet.
> 
> The even distribution of the lower-order bytes is irrelevant.

I was not talking about lower-order bytes.  All bytes are pretty much 
evenly distributed.  That's why SHA-1 is a good hash.

> We're looking at the top 20-ish bits for a pack with a million-ish 
> objects. The more you zoom in the less linear a sorted list of hashes 
> will be, so assuming linearity at all scales is wrong. It's a bit like 
> fractal mountains.

If you really find irregularities like that, then SHA-1 is really a lousy 
hash.  Irregularities like this are typically exploitable.

If you know of such an irregularity, you might want to write a paper that 
SHA-1 is broken and get famous.

> > There is no way to achieve [O(1) seeks], best thing you can hope for 
> > is _expected_ O(1) (e.g. with a hashmap, with exponential worst case).
> 
> Of course it's expected. However the worst case is nowhere near
> exponential: it's linear because the second-order search is a linear
> pagewise scan. But I think in practice, the larger the pack the more that
> the randomization of the hash function will smooth out performance
> oddities. (Sorry, I don't know enough statistics to be able to say what
> the expected error of the linear interpolation is, though I expect it's a
> fairly simple formula.) For small packs the number of seeks is 1 anyway.

I will believe it when I see it.

Ciao,
Dscho

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html