On Tue, Jan 1, 2013 at 1:06 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> 3. Dropping the "commits" file and just using the pack-*.idx as the
>> index. The problem is that it is sparse in the commit space. So
>> just naively storing 40 bytes per entry is going to waste a lot of
>> space. If we had a separate index as in (1) above, that could be
>> dropped to (say) 4 bytes of offset per object. But still, right now
>> the commits file for linux-2.6 is about 7.2M (20 bytes times ~376K
>> commits). There are almost 3 million total objects, so even storing
>> 4 bytes per object is going to be worse.
>
> Fix pack-objects to behave the way JGit does, cluster commits first in
> the pack stream. Now you have a dense space of commits. If I remember
> right this has a tiny positive improvement for most rev-list
> operations with very little downside.

I was going to suggest a similar thing. The current state of C Git's
pack writing is not bad. We mix commits and tags together, but tags
are usually few. Once we get the upper and lower bounds of the
commit+tag region, in terms of object position in the pack, we could
reduce the waste significantly. That is, if you sort the cache by
object order in the pack.
-- 
Duy
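
To illustrate the commit+tag region idea above, here is a minimal sketch
in Python. It is not Git's code: the function names are hypothetical, and
representing a pack as a list of type strings in pack order is an
assumption made only for the example. It finds the lower and upper bounds
of the region and maps a pack position to a slot in a dense cache that
covers only that region.

```python
from typing import List, Optional, Tuple

# Assumption: object types are plain strings, listed in pack order.
COMMIT_LIKE = {"commit", "tag"}


def commit_region(types_in_pack_order: List[str]) -> Optional[Tuple[int, int]]:
    """Return inclusive (lo, hi) pack positions bounding every commit and tag."""
    positions = [i for i, t in enumerate(types_in_pack_order) if t in COMMIT_LIKE]
    if not positions:
        return None
    return positions[0], positions[-1]


def slot_for(position: int, region: Tuple[int, int]) -> Optional[int]:
    """Map a pack position to a cache slot, or None if it is outside the region."""
    lo, hi = region
    if lo <= position <= hi:
        return position - lo
    return None


def wasted_slots(types_in_pack_order: List[str], region: Tuple[int, int]) -> int:
    """Count slots in the region that do not hold a commit (tags, stray objects)."""
    lo, hi = region
    return sum(1 for t in types_in_pack_order[lo:hi + 1] if t != "commit")


if __name__ == "__main__":
    # Hypothetical pack: commits and a tag clustered first, one stray tree.
    pack = ["commit", "tag", "commit", "tree", "commit", "blob"]
    region = commit_region(pack)
    print(region, wasted_slots(pack, region))
```

With commits clustered at the front of the pack, the cache needs
hi - lo + 1 slots instead of one per object, and the only waste is the
few tags (and any stray objects) that fall inside the region.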