On Wed, Jun 19 2019, Jeff King wrote:

> On Wed, Jun 19, 2019 at 08:01:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > You could sort of avoid the problem here too with
>> >
>> >     parallel 'git fetch --no-auto-gc {}' ::: $(git remote)
>> >     git gc --auto
>> >
>> > It's definitely simpler, but of course we have to manually add
>> > --no-auto-gc in everywhere we need, so not quite as elegant.
>> >
>> > Actually you could already do that with 'git -c gc.auto=false fetch', I guess.
>>
>> The point of the 'parallel' example is to show disconnected git
>> commands, think trying to run 'git' in a terminal while your editor
>> asynchronously runs a polling 'fetch', or a server with multiple
>> concurrent clients running 'gc --auto'.
>>
>> That's the question my RFC patch raises. As far as I can tell the
>> approach in your patch is only needed because our locking for gc is
>> buggy, rather than introduce the caveat that an fetch(N) operation won't
>> do "gc" until it's finished (we may have hundreds, thousands of remotes,
>> I use that for some more obscure use-cases) shouldn't we just fix the
>> locking?
>
> I think there may be room for both approaches. Yours fixes the repeated
> message in the more general case, but Duy's suggestion is the most
> efficient thing.
>
> I agree that the "thousands of remotes" case means we might want to gc
> in the interim. But we probably ought to do that deterministically
> rather than hoping that the pattern of lock contention makes sense.

We do it deterministically: when the gc.auto thresholds et al. are
exceeded we kick one off without waiting for other stuff, provided we
can get the lock.

I don't think this desire to just wait a bit until all the fetches are
complete makes sense as a special case. If, as you noted in
<20190619190845.GD28145@xxxxxxxxxxxxxxxxxxxxx>, the desire is to reduce
GC CPU use then you're better off just tweaking the limits upwards (see
the sketch at the end of this mail). Then you get that benefit
everywhere, e.g. when you run "commit" in a for-loop, not just in this
one special case of "fetch".

We have existing potentially long-running operations like "fetch",
"rebase" and "git svn fetch" that run "gc --auto" for their incremental
steps, and that's a feature. It keeps "gc --auto" dumb enough to avoid
a pathological case where we end up with a ballooning objects dir
because we figure we can run something "at the end", when "the end"
could be hours away and we're adding a new pack or hundreds of loose
objects every second.

So I don't think Duy's patch is a good way to go. The rationale in its
commit message can be better addressed by something like my WIP for
just fixing the locking mechanism. That fixes the stated problems of
multiple "auto packing in the background" messages and the "we may
waste some resources" issue (we take the lock earlier, before doing
'real' work), without introducing its own pathological case of
deferring "gc --auto" so much that we get unchecked object growth.

I think that's really important. It's OK if "gc --auto" isn't optimal,
but we should really avoid such pathological cases. It's also important
that it's easy to explain, i.e. if this patch goes in I think it should
update the second paragraph of the git-gc.txt docs, so it's not just:

    When common porcelain operations that create objects are run, they
    will check whether the repository has grown substantially since the
    last maintenance[...]
But also something like:

    Except in cases where we're running porcelain commands that might
    themselves re-run aggregates of themselves, in which case we defer
    "gc" until the end. This currently only applies to "fetch", but
    other commands such as "rebase" etc. might learn to do this in the
    future. Note that if the sub-commands are numerous enough this
    might itself become pathological as "gc --auto" is deferred too
    much, so use option/config XYZ to ...

Or whatever...
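By "tweaking the limits upwards" above I just mean raising the existing
gc.auto and gc.autoPackLimit knobs. A rough sketch of what I have in
mind; the specific numbers here are only illustrative, not
recommendations:

    # Raise the loose-object threshold (default 6700) so "gc --auto"
    # considers the repo "grown substantially" much later:
    git config gc.auto 25000

    # Likewise allow more packs to accumulate before consolidating
    # them (default 50):
    git config gc.autoPackLimit 200

    # Or one-off for a single command, without touching the config:
    git -c gc.auto=25000 -c gc.autoPackLimit=200 fetch --all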