On Tue, May 07 2019, Jeff King wrote: > On Mon, May 06, 2019 at 11:44:06AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Maybe there's some case I haven't thought of that makes this stupid, but >> I wonder if something like a "gc quarantine" might be a fix fo both of >> the the issues you noted above. >> >> I.e. it seems to me that the main issue is that we conflate "mtime 2 >> weeks old because it's unreferenced for 2 weeks" v.s. "mtime 2 weeks old >> because we haven't gotten around to a 'gc'". >> >> So in such a "gc quarantine" mode when we discover an object/pack that's >> unreachable/purely made up of unreachable objects we'd move the relevant >> loose object/"loose" pack to such a quarantine, which would just be >> .git/unreferenced-objects/{??,pack}/ or whatever. >> >> AFAICT both cases you mentioned above would be mitigated by this because >> we'd no longer conflate "haven't gc'd this yet and it's 2 weeks old" >> v.s. "hasn't been referenced in 2 weeks". > > Michael Haggerty and I have (off-list) discussed variations on that, but > it opens up a lot of new issues. Moving something into quarantine isn't > atomic. So you've still corrupted the repo, but now it's recoverable by > reaching into the quarantine. Who notices that the repo is corrupt, and > how? When do we expire objects from quarantine? > > I think the heart of the issue is really the lack of atomicity in the > operations. You need some way to mark "I am using this now" in a way > that cannot race with "looks like nobody is using this, so I'll delete > it". > > And ideally without traversing large bits of the graph on the writing > side, and without requiring any stop-the-world locks during pruning. I was thinking (but realize now that I didn't articulate) that the "gc quarantine" would be another "alternate" implementing a copy-on-write "lockless delete-but-be-able-to-rollback scheme" as you put it. So "gc" would decide (racily) what's unreachable, but instead of unlink()-ing it would "mv" the loose object/pack into the "unreferenced-objects" quarantine. Then in your example #1 "wants to reference ABCD. It sees that we have it." would race on the "other side". I.e. maybe ABCD was *just* moved to the quarantine. But in that case we'd move it back, which would bump the mtime and thus make it ineligible for expiry. Similarly for example #2, the "ABCD is ancient" would be moved, but then promptely moved back on the next GC as we notice ABCD has been re-referenced. Maybe it's just the same problem all over again, but I don't see how yet. Aside from that, I have a hunch that while it's theoretically true that you can at any time re-reference some loose blob/tree/commit again, that the likelyhood of that in practice goes down as it ages, since a user is likely to e.g. re-push or rename some branch they pushed last week, not last year. Hence the mention of creating "unreferenced packs" with some new --keep-unreachable mode. Since we'd pack those together they wouldn't create the "ref explosion" problem we have with the loose refs, and thus you could afford to keep them longer (even though the deltas would be shittier). Whereas now you either need --keep-unreachable (keep stuff forever) or a more aggressive gc.pruneExpire if you'd like to not end up with a ginormous amount of loose objects. >> I started looking at this initially because I was wondering if the >> --keep-unreachable mode you modified in e26a8c4721 ("repack: extend >> --keep-unreachable to loose objects", 2016-06-13) could be made to write >> out such "unreferenced" objects into their *own* pack, so we could >> delete them all at once as a batch, and wouldn't create the "ref >> explosions" mentioned in [1]. >> >> But of course without an accompanying quarantine described above doing >> that would just make this race condition worse. > > I'm not sure it really makes it worse. The pack would have the same > mtime as the loose objects would.