On Wed, Jun 28, 2023 at 07:50:18PM -0400, Kent Overstreet wrote: > On Wed, Jun 28, 2023 at 05:14:09PM -0600, Jens Axboe wrote: > > On 6/28/23 4:55?PM, Kent Overstreet wrote: > > >> But it's not aio (or io_uring or whatever), it's simply the fact that > > >> doing an fput() from an exiting task (for example) will end up being > > >> done async. And hence waiting for task exits is NOT enough to ensure > > >> that all file references have been released. > > >> > > >> Since there are a variety of other reasons why a mount may be pinned and > > >> fail to umount, perhaps it's worth considering that changing this > > >> behavior won't buy us that much. Especially since it's been around for > > >> more than 10 years: > > > > > > Because it seems that before io_uring the race was quite a bit harder to > > > hit - I only started seeing it when things started switching over to > > > io_uring. generic/388 used to pass reliably for me (pre backpointers), > > > now it doesn't. > > > > I literally just pasted a script that hits it in one second with aio. So > > maybe generic/388 doesn't hit it as easily, but it's surely TRIVIAL to > > hit with aio. As demonstrated. The io_uring is not hard to bring into > > parity on that front, here's one I posted earlier today for 6.5: > > > > https://lore.kernel.org/io-uring/20230628170953.952923-4-axboe@xxxxxxxxx/ > > > > Doesn't change the fact that you can easily hit this with io_uring or > > aio, and probably more things too (didn't look any further). Is it a > > realistic thing outside of funky tests? Probably not really, or at least > > if those guys hit it they'd probably have the work-around hack in place > > in their script already. > > > > But the fact is that it's been around for a decade. It's somehow a lot > > easier to hit with bcachefs than XFS, which may just be because the > > former has a bunch of workers and this may be deferring the delayed fput > > work more. Just hand waving. > > Not sure what you're arguing here...? > > We've had a long standing bug, it's recently become much easier to hit > (for multiple reasons); we seem to be in agreement on all that. All I'm > saying is that the existence of that bug previously is not reason to fix > it now. I agree with Kent here - the kernel bug needs to be fixed regardless of how long it has been around. Blaming the messenger (userspace, fstests, etc) and saying it should work around a spurious, unpredictable, undesirable and user-undebuggable kernel behaviour is not an acceptible solution here... I don't care how the kernel bug gets fixed, I just want the spurious unmount failures when there are no userspace processes actively using the filesytsem to go away forever. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx