On Fri, 24 Jul 2009, Johannes Schindelin wrote: > > Think about it, absent any further information than "it is a hash, i.e. > distributed pretty equally in _any_ byte", even subsets of a sorted list > will me more or less linear. And assuming that they are linear is _still_ > your best bet. The even distribution of the lower-order bytes is irrelevant. We're looking at the top 20-ish bits for a pack with a million-ish objects. The more you zoom in the less linear a sorted list of hashes will be, so assuming linearity at all scales is wrong. It's a bit like fractal mountains. > There is no way to achieve [O(1) seeks], best thing you can hope for is > _expected_ O(1) (e.g. with a hashmap, with exponential worst case). Of course it's expected. However the worst case is nowhere near exponential: it's linear because the second-order search is a linear pagewise scan. But I think in practice, the larger the pack the more that the randomization of the hash function will smooth out performance oddities. (Sorry, I don't know enough statistics to be able to say what the expected error of the linear interpolation is, though I expect it's a fairly simple formula.) For small packs the number of seeks is 1 anyway. Tony. -- f.anthony.n.finch <dot@xxxxxxxx> http://dotat.at/ GERMAN BIGHT HUMBER: SOUTHWEST 5 TO 7. MODERATE OR ROUGH. SQUALLY SHOWERS. MODERATE OR GOOD. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html