Taylor Blau <me@xxxxxxxxxxxx> writes: > We don't usually do pruning GC's except during manual intervention or > upon request through a support ticket. But when we do it is often > infeasible to lock the entire network's push traffic and reference > updates. So it is not an unheard of event to encounter the race that I > described above. > > The idea is that, at least for non-sensitive pruning, we would move the > pruned objects to a separate repository and hold them there until we > could run `git fsck` on the repository after pruning and verify that the > repository is intact. If it is, then the expired.git repository can be > emptied, too, permanently removing the pruned objects. If not, the > expired.git repository then becomes a donor for the missing objects, > which are used to heal the corrupt main repository. Once *that* is done, > and fsck comes back clean, then the expired.git repository can be > removed. Thanks for the information. Presumably, during such pruning GCs (because manual intervention was needed or because of a support ticket), you would use a cruft expiration of "now", not something like "14 days ago". If so, more on this specific case later... > > I think there is at least one more alternative that should be > > considered, though: since the cruft pack is unlikely to have its objects > > "resurrected" (since the reason why they're there is because they are > > unreachable), it is likely that the objects that are pruned are exactly > > the same as those in the craft pack. So it would be more efficient to > > just unconditionally rename the cruft pack to the backup destination. > > This isn't quite right. The contents that are written into the > expired.git repository is everything that *didn't* end up in the cruft > pack. I reread what I wrote and realized that I didn't give a good description of what I was thinking. So hopefully this is a better one: I was thinking of the case in which GC is regularly run on a repo, say, every 14 days with a 14-day expiration time. So on the first run you would have: a1.pack a2.pack (+ .mtime) Reachable Unreachable (cruft pack) and on the second run, assuming this patch set is in effect: b1.pack b2.pack (+ .mtime) expired.git/b3.pack Reachable Unreachable (cruft pack) In expired.git and my idea was that it is very likely that a2.pack and expired.git/b3.pack are the same, so I thought that a feature in which cruft packs could be moved instead of deleted would be more efficient. Going back to the specific case at the start of this email. I now see that my idea wouldn't work in that specific case, because you would want to generate expired.git packs from a repository that is unlikely to have cruft packs (or, at least, it is unlikely that the existing cruft packs match exactly what you would write in expired.git). If it is likely that cruft pack(s) would only be written in one place (whether in the same repository or in expired.git, but not both), another design option would be to be able to tell Git where to write cruft packs, but not give Git the ability to write cruft packs both to the same repository and expired.git. (Unless in your use case, Taylor, you envision that you would occasionally need to write cruft packs in both places.) That would simplify the UI and the code a bit, but not by much (this patch set, as written, is already elegantly written), so I don't feel too strongly about it and I would be happy with the current pattern.