On Mon, Oct 1, 2012 at 8:07 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> You mentioned this before in your idea mail a while back. I wonder if
>> it's worth storing bitmaps for all packs, not just the self contained
>> ones.
>
> Colby and I started talking about this late last week too. It seems
> feasible, but does add a bit more complexity to the algorithm used
> when enumerating.

Yes. Though at server side, if it's too much trouble, the packer can
just ignore open packs and use only closed ones.

>> We could have one leaf bitmap per pack to mark all leaves where
>> we'll need to traverse outside the pack. Commit leaves are the best as
>> we can potentially reuse commit bitmaps from other packs. Tree leaves
>> will be followed in the normal/slow way.
>
> Yes, Colby proposed the same idea.
>
> We cannot make a "leaf bitmap per pack". The leaf SHA-1s are not in
> the pack and therefore cannot have a bit assigned to them.

We could mark all objects _in_ the pack that lead to an external
object. That's what I meant by leaves. We need to parse the leaves to
find out actual SHA-1s that are outside the pack. Or we could go with
your approach below too.

> We could
> add a new section that listed the unique leaf SHA-1s in their own
> private table, and then assigned per bitmap a leaf bitmap that set to
> 1 for any leaf object that is outside of the pack.

> One of the problems we have seen with these non-closed packs is they
> waste an incredible amount of disk. As an example, do a `git fetch`
> from Linus tree when you are more than a few weeks behind. You will
> get back more than 100 objects, so the thin pack will be saved and
> completed with additional base objects. That thin pack will go from a
> few MiBs to more than 40 MiB of data on disk, thanks to the redundant
> base objects being appended to the end of the pack. For most uses
> these packs are best eliminated and replaced with a new complete
> closure pack.
> The redundant base objects disappear, and Git stops
> wasting a huge amount of disk.

That's probably a different problem. I appreciate disk savings but I
would not want to wait a few more minutes for repack on every
git-fetch. But if this bitmap thing makes repack much faster than
currently, repacking after every git-fetch may become practical.
--
Duy
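
For illustration only, here is a minimal Python sketch of the "mark all
objects _in_ the pack that lead to an external object" idea discussed
above. It is not Git or JGit code. The function name `leaf_bitmap`, the
`pack_order` list, and the `refs` mapping are all hypothetical, and the
bitmap is a plain integer rather than any on-disk format. It also returns
the unique external ids, roughly in the spirit of the "private table"
alternative.

```python
def leaf_bitmap(pack_order, refs):
    """Return (bitmap, external) for one pack.

    pack_order: object ids in pack order; position i owns bit i.
    refs: maps each in-pack id to the ids it references directly
          (hypothetical input; a real packer would parse the objects).
    """
    in_pack = set(pack_order)
    bitmap = 0          # bit i set => object i points outside the pack
    external = set()    # unique ids referenced but not stored in the pack
    for bit, oid in enumerate(pack_order):
        outside = [r for r in refs.get(oid, ()) if r not in in_pack]
        if outside:
            bitmap |= 1 << bit
            external.update(outside)
    return bitmap, sorted(external)


# Hypothetical open pack: commit c2 has parent c1, and tree t2 reuses
# blob b1; neither c1 nor b1 is stored in this pack.
pack_order = ["c2", "t2", "b2"]
refs = {"c2": ["t2", "c1"], "t2": ["b2", "b1"], "b2": []}
bitmap, external = leaf_bitmap(pack_order, refs)
```

In this toy pack the bits for c2 and t2 are set and the bit for b2 is
clear. A closed pack would yield an all-zero bitmap and an empty external
list, so a packer that wants to skip open packs could test for that.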