Re: [RFC] Packing large repositories

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Tue, 3 Apr 2007 01:39:59 -0400

Geert Bosch <bosch@xxxxxxxxxxx> wrote:
> Actually, I had implemented this first, using two Newton-Raphson
> iterations and then binary search. Just one iteration is
> too little, and one iteration plus binary search is often no win.
> Two iterations followed by binary search cuts the number of steps in
> half for the Linux kernel. Two iterations followed by linear search
> is often worse, because of "unlucky" cases that end up doing many
> probes. Still, during the 5-8 probes in moderately large repositories
> (1M objects), each probe pretty much requires its own cache line:
> very cache unfriendly.
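
For readers following along, the lookup strategy quoted above can be
sketched roughly as below. This is a minimal illustration, not Geert's
actual code: it assumes the "Newton-Raphson iterations" amount to
interpolation steps over uniformly distributed hash values, it uses the
first 8 bytes of each SHA-1 as an integer key, and the names `make_keys`
and `lookup` are hypothetical.

```python
import hashlib


def make_keys(n):
    """Build a sorted list of integer keys from the first 8 bytes of SHA-1
    digests. This stands in for a sorted pack index (an assumption)."""
    return sorted(
        int.from_bytes(hashlib.sha1(str(i).encode()).digest()[:8], "big")
        for i in range(n)
    )


def lookup(keys, target, interp_steps=2):
    """Return (index or None, probes) for target in the sorted list keys.

    The first interp_steps probes guess the position by linear
    interpolation between the current bounds; later probes bisect.
    Only midpoint reads are counted as probes; the endpoint reads used
    for interpolation are not.
    """
    lo, hi = 0, len(keys) - 1  # inclusive bounds of the live range
    probes = 0
    while lo <= hi:
        if probes < interp_steps and keys[hi] > keys[lo]:
            # Interpolation step: assume keys are spread evenly in [lo, hi].
            span = keys[hi] - keys[lo]
            mid = lo + (target - keys[lo]) * (hi - lo) // span
            mid = min(max(mid, lo), hi)  # clamp out-of-range guesses
        else:
            # Plain binary search step.
            mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


if __name__ == "__main__":
    keys = make_keys(100_000)
    index, probes = lookup(keys, keys[12345])
    print(index, probes)
```

Setting `interp_steps=0` gives a pure binary search, which makes it easy
to compare probe counts on the same key set.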

If Nico and I can ever find the time to get our ideas for pack v4
coded into something executable, I think you will find this is less
of an issue than you expect.

We're hoping to change enough of the commit and tree traversal
code that the "tight" loops around chasing tree, parent, and blob
pointers can be done using strictly pack offsets and completely
avoid these SHA-1 lookups.  Thus the only time we'd fall into the
above-mentioned SHA-1 lookup path is on initial entry to a revision
walk, or when spanning to another packfile.  This would mean most
workloads should only hit that code once per command line argument.
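
To make the idea concrete, here is a toy sketch of a walk that does one
name lookup on entry and then follows offsets only. It is purely
illustrative and says nothing about what the pack v4 format will
actually look like: `ToyPack`, its fields, and the list-position
"offsets" are all hypothetical, and the case of spanning to another
packfile is not modeled.

```python
class ToyPack:
    """Toy stand-in for a packfile whose objects refer to each other by
    offset instead of by SHA-1. Hypothetical; not the pack v4 layout."""

    def __init__(self):
        self.objects = []       # objects[offset] -> (name, [child offsets])
        self.index = {}         # name -> offset; stands in for the SHA-1 index
        self.index_lookups = 0  # how many times the index was consulted

    def add(self, name, children=()):
        """Store an object whose children are given as offsets; return its offset."""
        offset = len(self.objects)
        self.objects.append((name, list(children)))
        self.index[name] = offset
        return offset

    def find(self, name):
        """Name-to-offset lookup: the expensive path discussed above."""
        self.index_lookups += 1
        return self.index[name]

    def walk(self, start_name):
        """Visit everything reachable from start_name.

        The index is consulted once, on entry; after that the loop
        chases offsets only.
        """
        stack = [self.find(start_name)]
        seen = set()
        order = []
        while stack:
            offset = stack.pop()
            if offset in seen:
                continue
            seen.add(offset)
            name, children = self.objects[offset]
            order.append(name)
            stack.extend(children)
        return order


if __name__ == "__main__":
    pack = ToyPack()
    blob = pack.add("blob")
    tree = pack.add("tree", [blob])
    first = pack.add("commit1", [tree])
    pack.add("commit2", [first, tree])
    print(pack.walk("commit2"), pack.index_lookups)
```

In this toy the lookup counter stays at one per starting point no matter
how many objects the walk touches, which is the behaviour described
above for a single command line argument.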

-- 
Shawn.