On Mon, Jul 16, 2018 at 2:21 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Jul 16, 2018 at 02:09:43PM -0700, Elijah Newren wrote:
>> The point of gc is to: expire reflogs, repack referenced objects, and
>> delete loose objects that (1) are no longer referenced and (2) are
>> "old enough".
>>
>> The "old enough" bit is the special part here. Before the gc, those
>> objects were referenced (only by old reflog entries) and were found in
>> a pack. git currently writes those objects out to disk and gives them
>> the age of the packfile they are contained in, which I think is the
>> source of the bug. We have a reference for those objects from the
>> reflogs, know they aren't referenced anywhere else, so those objects
>> to me are the age of those reflog entries: 90 days. As such, they are
>> "old enough" and should be deleted.
>
> OK, I see what you're saying, but...
>
>> I never got around to fixing it properly, though, because 'git prune'
>> is such a handy workaround for now. Having people nuke all their
>> loose objects is a bit risky, but the only other workaround people had
>> was to re-clone (waiting the requisite 20+ minutes for the repo to
>> download) and throw away their old clone. (Though some people even
>> went that route, IIRC.)
>
> If we were to delete those objects, wouldn't it be exactly the same as
> running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
> minutes"? Or are you concerned with taking other objects along for the
> ride that weren't part of old reflogs? I think that's a valid concern,

Yes, I was worried about taking other objects along for the ride that
weren't part of old reflogs.

> but it's also an issue for objects which were previously referenced in
> a reflog, but are part of another current operation.

I'm not certain what you're referring to here.

> Also, what do you do if there weren't reflogs in the repo? Or the
> reflogs were deleted (e.g., because branch deletion drops the associated
> reflog entirely)?

Yes, there are issues this rule won't help with, but in my experience
it was a primary (if not sole) actual cause in practice. (I believe I
even said elsewhere in this thread that I knew there were unreachable
objects for other reasons and they might also become large in number).
At $DAYJOB we've had multiple people including myself hit the "too
many unreachable loose objects" nasty loop issue (some of us multiple
different times), and as far as I can tell, most (perhaps even all) of
them would have been avoided by just "properly" deleting garbage as
per my object-age-is-reflog-age-if-not-otherwise-referenced rule.

>> With big repos, it's easy to get into situations where there are well
>> more than 10000 objects satisfying these conditions. In fact, it's
>> not all that rare that the repo has far more loose objects after a git
>> gc than before.
>
> Yes, this is definitely a wart and I think is worth addressing.
>
>> I totally agree with your general plan to put unreferenced loose
>> objects into a pack. However, I don't think these objects should be
>> part of that pack; they should just be deleted instead.
>
> I assume by "these objects" you mean ones which used to be reachable
> from a reflog, but that reflog entry just expired. I think you'd be
> sacrificing some race-safety in that case.

Is that inherently any more race unsafe than 'git prune
--expire=2.weeks.ago'? I thought it'd be racy in the same ways, and
thus a tradeoff folks are already accepting (at least implicitly) when
running git-gc. Since these objects are actually 90 days old rather
than a mere two weeks, it actually seemed more safe to me. But maybe
I'm overlooking something with the pack->loose transition that makes
it more racy?

> If the objects went into a pack under a race-proof scheme, would that
> actually bother you? Is it the 10,000 objects that's a problem, or is it
> the horrible I/O from exploding them coupled with the annoying auto-gc
> behavior?

Yeah, good point.
It's mostly the annoying auto-gc behavior and the horrible I/O of
future git operations from having lots of loose objects. They've
already been paying the cost of storing those objects in packed form
for 90 days; a few more won't hurt much. I'd be slightly annoyed
knowing that we're storing garbage we don't need to be, but I don't
think it's a real issue.
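
For concreteness, here is a minimal sketch of the
object-age-is-reflog-age-if-not-otherwise-referenced rule discussed
above. It is only an illustration under stated assumptions, not how
git is implemented: the UnreachableObject type, its fields, and both
helper functions are hypothetical names, and the two-week cutoff is
simply the 2.weeks.ago expiry mentioned earlier in this mail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical model for illustration; none of these names exist in git.
@dataclass
class UnreachableObject:
    oid: str
    # Timestamp of the pack the object was exploded from (the age the
    # object inherits under the current behavior described above).
    pack_mtime: datetime
    # Timestamp of the newest expired reflog entry that referenced the
    # object, or None if no reflog entry is known to have referenced it.
    last_reflog_ref: Optional[datetime]


def effective_time(obj: UnreachableObject) -> datetime:
    """Proposed rule: an object that was referenced only by expired
    reflog entries takes the age of those entries; anything else keeps
    the pack's timestamp."""
    if obj.last_reflog_ref is not None:
        return obj.last_reflog_ref
    return obj.pack_mtime


def should_delete(obj: UnreachableObject, now: datetime,
                  prune_expire: timedelta = timedelta(weeks=2)) -> bool:
    """True if the object counts as "old enough" to delete."""
    return now - effective_time(obj) > prune_expire
```

Under this sketch, an object last referenced by a 90-day-old reflog
entry is deleted immediately even if its pack was rewritten yesterday,
while an unreachable object with no known reflog history still waits
out the normal grace period. The sketch says nothing about the
race-safety question raised above.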