Re: [PATCH v3 7/7] run-command: fix detaching when running auto maintenance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 19, 2024 at 08:17:22AM +0200, Patrick Steinhardt wrote:

> >   2. Having racy background maintenance doesn't seem great for test
> >      robustness. At the very least, it might subject us to the "rm"
> >      problems mentioned elsewhere, where we fail to clean up. Annotating
> >      individual "git gc" or "git maintenance" calls with an extra
> >      descriptor isn't too bad, but in this case it's all happening under
> >      the hood via fetch. Is it a potential problem for every script,
> >      then? If so, should we disable background detaching for all test
> >      repos, and then let the few that want to test it turn it back on?
> 
> Might be a good idea to set `maintenance.autoDetach=false` globally,
> yes. The only downside is of course that it wouldn't cause us to detect
> failures like the above, where the concurrency itself causes failure.
> 
> Anyway, for now I'll:
> 
>   - Send a patch to fix the race in t7900.
> 
>   - Investigate the reftable concurrency issue.
> 
>   - _Not_ send a patch that sets `maintenance.autoDetach=false`.

That sounds like a good direction. I do suspect there are at least _two_
races in t7900:

  1. the detached maintenance that we run explicitly, which causes the
     "rm" cleanup to fail

  2. whatever earlier test is kicking off detached maintenance via "git
     fetch", which is causing the reftable concurrency issue.

Fixing (1) should be easy (and it looks like you've already sent a
series). Fixing the reftable code will stop us from segfaulting for (2),
but I wonder if that detached maintenance might cause similar "rm" style
problems elsewhere.

> The last one requires a bit more discussion first, and we have been
> running with `gc.autoDetach=true` implicitly in the past. Thinking a bit
> more about it, the reason why the above bug triggers now is that
> git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_
> before detaching itself. Now we detach at a higher level in the
> hierarchy, which means that the previously-synchronous part now runs
> asynchronously, as well.

That makes sense. I guess we've perhaps been doing background gc for a
long time, then, just not in the refs? In practice most repos in the
test suite aren't big enough to trigger auto-gc anyway, so it may only
affect a handful of scripts.

Once the reftable issue is fixed, it's possible that the lingering
detached processes don't cause a problem in practice (because they're
not really writing much, and/or have finished by the time anybody else
gets to cleanup), and we can just live with them. But I'm worried that
sounds like wishful thinking. ;)

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux