Re: [RFC] Packing large repositories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mar 31, 2007, at 20:36, Linus Torvalds wrote:

So here's a suggestion:

- start finding a range using the 256-entry fan-out, exactly the way we
   did for the binary search. It's cheap. We could probably avoid EVEN
THIS, and just do one more newton-raphson iteration more. But since we have the data, we migth as well use it. After all, it really *is* just a first approximation of newton-raphson, and while it uses only 8 bits
   (and we could do better), at least it's an *exact* one.

- use newton-raphson to iterate closer. It should be a much faster way to
   find the rough area for the entry we're searching for than binary
   search. Two or three iterations should get us there, easily.

 - do a linear search once you're close enough.

Actually, I had implemented this first, using two newton-raphson
iterations and then binary search. With just one iteration is
too little, and one iteration+binary search often is no win.
Two iterations followed by binary search cuts the nr of steps in
half for the Linux kernel. Two iterations followed by linear search
is often worse, because of "unlucky" cases that end up doing many
probes. Still, during the 5-8 probes in moderately large repositories
(1M objects), each probe pretty much requires its own cache line:
very cache unfriendly.

By using a fan-out table with an average nr of entries per bin
of 32 or so, you get a total cache footprint of between 1 and 2 bits
per entry for the random lookup. So, a complete lookup only hits
about 3 cache lines.  After a few lookups, almost all of the
fan-out table will be in cache, while currently, the cache is
filled with useless 24 byte SHA1+offset of unrelated objects.

  -Geert

PS. I'm travelling now, so can't participate in the discussion
    or send code as much as I'd like and mail may be late or
    out of sequence.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]