On Thu, Jun 20, 2019 at 5:49 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Wed, Jun 19 2019, Jeff King wrote:
>
> > On Wed, Jun 19, 2019 at 08:01:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> >> > You could sort of avoid the problem here too with
> >> >
> >> >    parallel 'git fetch --no-auto-gc {}' ::: $(git remote)
> >> >    git gc --auto
> >> >
> >> > It's definitely simpler, but of course we have to manually add
> >> > --no-auto-gc in everywhere we need, so not quite as elegant.
> >> >
> >> > Actually you could already do that with 'git -c gc.auto=false fetch', I guess.
> >>
> >> The point of the 'parallel' example is to show disconnected git
> >> commands, think trying to run 'git' in a terminal while your editor
> >> asynchronously runs a polling 'fetch', or a server with multiple
> >> concurrent clients running 'gc --auto'.
> >>
> >> That's the question my RFC patch raises. As far as I can tell the
> >> approach in your patch is only needed because our locking for gc is
> >> buggy, rather than introduce the caveat that an fetch(N) operation won't
> >> do "gc" until it's finished (we may have hundreds, thousands of remotes,
> >> I use that for some more obscure use-cases) shouldn't we just fix the
> >> locking?
> >
> > I think there may be room for both approaches. Yours fixes the repeated
> > message in the more general case, but Duy's suggestion is the most
> > efficient thing.
> >
> > I agree that the "thousands of remotes" case means we might want to gc
> > in the interim. But we probably ought to do that deterministically
> > rather than hoping that the pattern of lock contention makes sense.
>
> We do it deterministically, when gc.auto thresholds et al are exceeded
> we kick one off without waiting for other stuff, if we can get the lock.
>
> I don't think this desire to just wait a bit until all the fetches are
> complete makes sense as a special-case.
>
> If, as you noted in <20190619190845.GD28145@xxxxxxxxxxxxxxxxxxxxx>, the
> desire is to reduce GC CPU use then you're better off just tweaking the
> limits upwards. Then you get that with everything, like when you run
> "commit" in a for-loop, not just this one special case of "fetch".
>
> We have existing potentially long-running operations like "fetch",
> "rebase" and "git svn fetch" that run "gc --auto" for their incremental
> steps, and that's a feature.

"gc --auto" is sprinkled in at fairly arbitrary points to help garbage
collection along. I don't think it was ever intended to mean "do gc at
this exact moment", just "this command has already taken a lot of time
(i.e. no instant response is needed) and may have added a bit more
garbage, so let's do a quick check".

> It keeps "gc --auto" dumb enough to avoid a pathological case where
> we'll have a ballooning objects dir because we figure we can run
> something "at the end", when "the end" could be hours away, and we're
> adding a new pack or hundreds of loose objects every second.

Are we optimizing for a rare (large-scale) case here? Such a setup
requires tuning regardless, in my opinion.

> So I don't think Duy's patch is a good way to go.

This reminds me of "perfect is the enemy of good". A normal user has a
couple of remotes at most, each finishing fast (enough), and in that
case it's a good idea to wait until everything is in before running
gc. Of course, making git-gc more robust with respect to parallel
access is great, but it's hard work. Dealing with locks is always
tricky, especially when new locks can come up at any time.
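Roughly, what I have in mind for that common case is something like
this (just a sketch, using the gc.auto=0 config override since that
works today without any new fetch option):

    # suppress auto-gc while each remote is fetched, then do a
    # single gc check once everything is in
    for r in $(git remote); do
        git -c gc.auto=0 fetch "$r"
    done
    git gc --auto

(And for the "just tweak the limits upwards" angle, the knobs would
presumably be gc.auto and gc.autoPackLimit.)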
Having said that, I don't mind if my patch gets dropped. It was just a
"hey, that multiple gc output looks strange; hah, the fix is quite
simple" moment for me.
--
Duy