Re: [RFC] pack-objects: compression level for non-blobs

Duy Nguyen <pclouds@xxxxxxxxx> · Tue, 1 Jan 2013 11:15:58 +0700

On Tue, Jan 1, 2013 at 1:06 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>>   3. Dropping the "commits" file and just using the pack-*.idx as the
>>      index. The problem is that it is sparse in the commit space. So
>>      just naively storing 40 bytes per entry is going to waste a lot of
>>      space. If we had a separate index as in (1) above, that could be
>>      dropped to (say) 4 bytes of offset per object. But still, right now
>>      the commits file for linux-2.6 is about 7.2M (20 bytes times ~376K
>>      commits). There are almost 3 million total objects, so even storing
>>      4 bytes per object is going to be worse.
>
> Fix pack-objects to behave the way JGit does, cluster commits first in
> the pack stream. Now you have a dense space of commits. If I remember
> right this has a tiny positive improvement for most rev-list
> operations with very little downside.

I was going to suggest something similar. The current state of C Git's
pack writing is not bad: we mix commits and tags together, but tags
are usually few. Once we have the lower and upper bounds of the
commit+tag region, in terms of object position in the pack, we can
size the cache to that region alone and reduce the waste
significantly. That only works if the cache is sorted by object order
in the pack.
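
To make the idea concrete, here is a rough sketch in Python. It is not
Git code: the function names and the toy pack are made up for
illustration, and it assumes we already have the object types listed
in pack order.

```python
# Illustrative only: hypothetical helpers, not part of Git.

def commit_tag_bounds(types_in_pack_order):
    """Return (lo, hi), the inclusive pack positions of the first and
    last commit or tag, or None if the pack has neither."""
    positions = [i for i, t in enumerate(types_in_pack_order)
                 if t in ("commit", "tag")]
    if not positions:
        return None
    return positions[0], positions[-1]


def slot_count(bounds):
    """Number of cache slots needed to cover the commit+tag region."""
    lo, hi = bounds
    return hi - lo + 1


def slot_for(pack_position, bounds):
    """Map an object's position in the pack to its cache slot."""
    lo, hi = bounds
    if not lo <= pack_position <= hi:
        raise IndexError("object lies outside the commit+tag region")
    return pack_position - lo


# Toy pack: commits and tags clustered at the front, then trees/blobs.
pack = ["commit", "tag", "commit", "commit",
        "tree", "blob", "tree", "blob"]
bounds = commit_tag_bounds(pack)
print(bounds, slot_count(bounds), len(pack))
```

With the linux-2.6 numbers quoted above, and if the region really is
dense, 4 bytes per slot over ~376K commits (plus a few tags) comes to
roughly 1.5M, against roughly 12M for 4 bytes over all ~3 million
objects.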
-- 
Duy