Re: [PATCH 4/6] introduce a commit metapack

Duy Nguyen <pclouds@xxxxxxxxx> · Sat, 2 Feb 2013 16:49:17 +0700

On Fri, Feb 1, 2013 at 5:15 PM, Jeff King <peff@xxxxxxxx> wrote:
> The short-sha1 is a clever idea. Looks like it saves us on the order of
> 4MB for linux-2.6 (versus the full 20-byte sha1). Not as big as the
> savings we get from dropping the other 3 sha1's to uint32_t, but still
> not bad.

We could save another 4 bytes per commit by using 3 bytes for storing
.idx offsets. linux-2.6 only has 3M objects. It'll take many years for
big projects to reach 16M objects and need the fourth byte in
uint32_t.
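
A minimal sketch of what a 3-byte offset could look like, assuming
big-endian storage. The names put_be24/get_be24 and MAX_BE24 are made
up for illustration and are not part of the patch:

```c
#include <stdint.h>

/* Largest .idx position that fits in 3 bytes: 2^24 - 1 (about 16.7M). */
#define MAX_BE24 0xffffffu

/*
 * Hypothetical helper: store an .idx position in 3 bytes, most
 * significant byte first. The caller must ensure v <= MAX_BE24.
 */
static void put_be24(unsigned char *buf, uint32_t v)
{
	buf[0] = (v >> 16) & 0xff;
	buf[1] = (v >> 8) & 0xff;
	buf[2] = v & 0xff;
}

/* Hypothetical helper: read back a 3-byte big-endian position. */
static uint32_t get_be24(const unsigned char *buf)
{
	return ((uint32_t)buf[0] << 16) |
	       ((uint32_t)buf[1] << 8) |
	       (uint32_t)buf[2];
}
```

A writer would have to fall back to (or refuse in favor of) a 4-byte
format as soon as a pack holds more than MAX_BE24 objects.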

> I guess the next steps in iterating on this would be:
>
>   1. splitting out the refactoring here into separate patches
>
>   2. squashing the cleaned-up bits into my patch 4/6
>
>   3. deciding whether this should go into a separate file or as part of
>      index v3. Your offsets depend on the .idx file having a sorted sha1
>      list. That is not likely to change, but it would still be nice to
>      make sure they cannot get out of sync. I'm still curious what the
>      performance impact is for mmap-ing N versus N+8MB.

4. Print some cache statistics in "count-objects -v"

>> The length of SHA-1 is chosen to be able to unambiguously identify any
>> cached commits. Full SHA-1 check is done after to catch false
>> positives.
>
> Just to be clear, these false positives come because the abbreviation is
> unambiguous within the packfile, but we might be looking for a commit
> that is not even in our pack, right?

It may even be ambiguous within the pack: for example, an octopus
commit (which is not cached) may share the same SHA-1 prefix with one
of the cached commits.
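
Roughly, the lookup has to go like the sketch below: match on the
short SHA-1 first, then confirm against the full SHA-1 taken from the
sorted list in .idx. This is only an illustration. The struct layout,
the fixed 4-byte SHORT_LEN, and the function name are assumptions,
not what the patch actually does:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_LEN 4	/* assumed abbreviation length, in bytes */
#define SHA1_LEN 20

/* Hypothetical cache entry; the array is sorted by short_sha1. */
struct cache_entry {
	unsigned char short_sha1[SHORT_LEN];
	uint32_t idx_pos;	/* position in the sorted sha1 list of .idx */
};

/*
 * Binary-search the cache by abbreviated SHA-1, then verify the full
 * SHA-1. idx_sha1 points at the packed array of 20-byte entries from
 * the .idx file. Returns NULL for a miss or a false positive.
 */
static const struct cache_entry *lookup_cached_commit(
	const struct cache_entry *cache, size_t nr,
	const unsigned char *idx_sha1,
	const unsigned char *sha1)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, cache[mid].short_sha1, SHORT_LEN);

		if (!cmp) {
			/* Prefixes are unique among cached commits,
			 * so this is the only candidate. */
			const unsigned char *full =
				idx_sha1 + (size_t)cache[mid].idx_pos * SHA1_LEN;
			if (!memcmp(sha1, full, SHA1_LEN))
				return &cache[mid];
			return NULL;	/* prefix matched, full SHA-1 did not */
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

The full comparison covers both cases: a commit that is not in the
pack at all, and an uncached commit in the pack whose prefix collides
with a cached one.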
-- 
Duy