On Mon, Jun 11, 2012 at 02:34:14PM -0400, Jeff King wrote:
> You _could_ make a separate cruft pack for each pack that you repack. So
> if I have A.pack and B.pack, I'd pack all of the reachable objects into
> C.pack, and then make D.pack containing the unreachable objects from
> A.pack, and E.pack with the unreachable objects from B.pack. And then
> set the mtime of the cruft packs to that of their parent packs.
>
> And then the next time you pack, repacking D and E would probably be a
> no-op that preserves mtime, but might create a new pack that ejects some
> now-reachable object.
>
> To implement that, I think your --list-unreachable would just have to
> print a list of "<pack-mtime> <sha1>" pairs, and then you would pack
> each set with an identical mtime (or even a "close enough" mtime within
> some slop)....

How about this instead? We distinguish between cruft packs and "real" packs by the filename. So we have "cruft-<SHA1>.{idx,pack}" and "pack-<SHA1>.{idx,pack}". Normally, git will look at any pack in the pack directory that has an .idx and a .pack file, but during a repack operation, it will look only in the pack-* packs first. If it can't find an object there, it will then fall back to fetching the object from the cruft-* packs, and if it finds the object, it copies it into the new pack it is creating, thus "rescuing" an object which reappears during the expiry window. This should be a relatively rare event, and if it happens, the object will be in two packs, a pack-* pack and a cruft-* pack, but that's OK.

So since git pack-objects isn't even looking in the cruft-* packs except when it needs to rescue an object, the objects in the cruft-* packs won't get copied, and we won't need per-object mtimes. It also means the repack will go faster, since it's not copying the cruft-* packs at all, and possibly not even looking at them.

Now all we need to do is delete any cruft-* packs which are older than the expiry window.
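A minimal Python model of the two halves of this proposal, offered as an illustration and not as git's actual code. It assumes packs can be modeled as a mapping from pack name to the set of object IDs they hold (real git would consult each .idx file), and `lookup_for_repack` and `expire_cruft_packs` are hypothetical names. The lookup searches pack-* packs before falling back to cruft-* packs and flags a cruft hit as a rescue; the expiry step decides purely on each cruft-*.pack file's mtime and never reads its contents.

```python
from __future__ import annotations

import time
from pathlib import Path


def lookup_for_repack(oid: str, packs: dict[str, set[str]]) -> tuple[str, bool] | None:
    """Hypothetical repack lookup: search pack-* first, then fall back to cruft-*.

    Returns (pack_name, rescued). rescued is True when the object exists only
    in a cruft pack, meaning it must be copied into the new pack being written.
    Returns None if the object is in no pack at all.
    """
    for prefix, rescued in (("pack-", False), ("cruft-", True)):
        # Sorted only so the choice among same-prefix packs is deterministic.
        for name in sorted(packs):
            if name.startswith(prefix) and oid in packs[name]:
                return name, rescued
    return None


def expire_cruft_packs(pack_dir: Path, window_secs: float,
                       now: float | None = None) -> list[str]:
    """Hypothetical expiry: delete cruft-*.pack/.idx pairs older than the window.

    Age is taken from the .pack file's mtime alone; pack contents are never
    opened. pack-* packs are never considered.
    """
    now = time.time() if now is None else now
    removed = []
    for pack in sorted(pack_dir.glob("cruft-*.pack")):
        if now - pack.stat().st_mtime > window_secs:
            pack.unlink()
            # Remove the matching index too (missing_ok needs Python 3.8+).
            pack.with_suffix(".idx").unlink(missing_ok=True)
            removed.append(pack.name)
    return removed
```

In this model a repack would call the lookup for each reachable object and copy any rescued object into the new pack-* file, so that object briefly lives in two packs, which the proposal accepts as harmless.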
We don't even need to look at their contents. It does imply that we may accumulate a new cruft-<SHA1> pack each time we run git gc, but users shouldn't be running git gc all that often anyway. And even if they do run it all the time, it will still be more efficient than keeping the unreachable objects as loose objects.

- Ted