Re: [PATCH 4/6] introduce a commit metapack

Jeff King <peff@xxxxxxxx> · Mon, 18 Mar 2013 08:20:11 -0400

On Sun, Mar 17, 2013 at 08:21:13PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Thu, Jan 31, 2013 at 6:06 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> > On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> >> Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> >> space/time trade-off.
> >
> > Following the on-disk format experiment yesterday, I changed the
> > format to:
> >
> >  - a list a _short_ SHA-1 of cached commits
> > ..
> >
> > The length of SHA-1 is chosen to be able to unambiguously identify any
> > cached commits. Full SHA-1 check is done after to catch false
> > positives. For linux-2.6, SHA-1 length is 6 bytes, git and many
> > moderate-sized projects are 4 bytes.
> 
> And if we are going to create index v3, the same trick could be used
> for the sha-1 table in the index. We use the short sha-1 table for
> binary search and put the rest of sha-1 in a following table (just
> like file offset table). The advantage is a denser search space, about
> 1/4-1/3 the size of full sha-1 table.

You can make it even smaller at some (potential) run-time cost.

Keep in mind you are just repeating information that is in the full sha1
list in the index. So you could store a fixed-size offset into that list
(e.g., 32-bit), and then instead of comparing sha1s during a binary
search, you would dereference the offset to the real sha1s and compare
those.

The run-time cost is not any worse in a big-O sense, but your cache
locality is much worse (you hit a second random page for each sha1
comparison), which might be noticeable. You'd have to benchmark to see
how big an impact.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html