On 6/28/23 5:50?PM, Kent Overstreet wrote: > On Wed, Jun 28, 2023 at 05:14:09PM -0600, Jens Axboe wrote: >> On 6/28/23 4:55?PM, Kent Overstreet wrote: >>>> But it's not aio (or io_uring or whatever), it's simply the fact that >>>> doing an fput() from an exiting task (for example) will end up being >>>> done async. And hence waiting for task exits is NOT enough to ensure >>>> that all file references have been released. >>>> >>>> Since there are a variety of other reasons why a mount may be pinned and >>>> fail to umount, perhaps it's worth considering that changing this >>>> behavior won't buy us that much. Especially since it's been around for >>>> more than 10 years: >>> >>> Because it seems that before io_uring the race was quite a bit harder to >>> hit - I only started seeing it when things started switching over to >>> io_uring. generic/388 used to pass reliably for me (pre backpointers), >>> now it doesn't. >> >> I literally just pasted a script that hits it in one second with aio. So >> maybe generic/388 doesn't hit it as easily, but it's surely TRIVIAL to >> hit with aio. As demonstrated. The io_uring is not hard to bring into >> parity on that front, here's one I posted earlier today for 6.5: >> >> https://lore.kernel.org/io-uring/20230628170953.952923-4-axboe@xxxxxxxxx/ >> >> Doesn't change the fact that you can easily hit this with io_uring or >> aio, and probably more things too (didn't look any further). Is it a >> realistic thing outside of funky tests? Probably not really, or at least >> if those guys hit it they'd probably have the work-around hack in place >> in their script already. >> >> But the fact is that it's been around for a decade. It's somehow a lot >> easier to hit with bcachefs than XFS, which may just be because the >> former has a bunch of workers and this may be deferring the delayed fput >> work more. Just hand waving. > > Not sure what you're arguing here...? > > We've had a long standing bug, it's recently become much easier to hit > (for multiple reasons); we seem to be in agreement on all that. All I'm > saying is that the existence of that bug previously is not reason to fix > it now. Not really arguing, just stating that it's not a huge problem as it's not something that real world would tend to do and probably why we saw it in a test case instead. >>>> then we'd probably want to move that deferred fput list to the >>>> task_struct and ensure that it gets run if the task exits rather than >>>> have a global deferred list. Currently we have: >>>> >>>> >>>> 1) If kthread or in interrupt >>>> 1a) add to global fput list >>>> 2) task_work_add if not. If that fails, goto 1a. >>>> >>>> which would then become: >>>> >>>> 1) If kthread or in interrupt >>>> 1a) add to global fput list >>>> 2) task_work_add if not. If that fails, we know task is existing. add to >>>> per-task defer list to be run at a convenient time before task has >>>> exited. >>> >>> no, it becomes: >>> if we're running in a user task, or if we're doing an operation on >>> behalf of a user task, add to the user task's deferred list: otherwise >>> add to global deferred list. >> >> And how would the "on behalf of a user task" work in terms of being >> in_interrupt()? > > I don't see any relation to in_interrupt? Just saying that you'd now need the task passed in. > We'd have to add a version of fput() that takes an additional > task_struct argument, and plumb that through the aio code - kioctx > lifetime is tied to mm_struct, not task_struct, so we'd have to add a > ref to the task_struct to kiocb. > > Which would probably be a good thing tbh, it'd let us e.g. account cpu > time back to the original task when kiocb completion has to run out of a > workqueue. Might also introduce some funky dependencies. Probably not an issue it tied to the aio_kiocb. If you go ahead with that, just make sure you keep the task referencing out of the fput variant for users that don't need that. -- Jens Axboe