On Wed, Jun 28, 2023 at 10:57:02AM -0600, Jens Axboe wrote:
> On 6/28/23 8:58 AM, Jens Axboe wrote:
> > On 6/27/23 10:01 PM, Kent Overstreet wrote:
> >> On Tue, Jun 27, 2023 at 09:16:31PM -0600, Jens Axboe wrote:
> >>> On 6/27/23 2:15 PM, Kent Overstreet wrote:
> >>>>> to ktest/tests/xfstests/ and run it with -bcachefs, otherwise it kept
> >>>>> failing because it assumed it was XFS.
> >>>>>
> >>>>> I suspected this was just a timing issue, and it looks like that's
> >>>>> exactly what it is. Looking at the test case, it'll randomly kill -9
> >>>>> fsstress, and if that happens while we have io_uring IO pending, then we
> >>>>> process completions inline (for a PF_EXITING current). This means they
> >>>>> get pushed to fallback work, which runs out of line. If we hit that case
> >>>>> AND the timing is such that it hasn't been processed yet, we'll still be
> >>>>> holding a file reference under the mount point and umount will -EBUSY
> >>>>> fail.
> >>>>>
> >>>>> As far as I can tell, this can happen with aio as well, it's just harder
> >>>>> to hit. If the fput happens while the task is exiting, then fput will
> >>>>> end up being delayed through a workqueue as well. The test case assumes
> >>>>> that once it's reaped the exit of the killed task that all files are
> >>>>> released, which isn't necessarily true if they are done out-of-line.
> >>>>
> >>>> Yeah, I traced it through to the delayed fput code as well.
> >>>>
> >>>> I'm not sure delayed fput is responsible here; what I learned when I was
> >>>> tracking this down has mostly fallen out of my brain, so take anything I
> >>>> say with a large grain of salt. But I believe I tested with delayed_fput
> >>>> completely disabled, and found another thing in io_uring with the same
> >>>> effect as delayed_fput that wasn't being flushed.
> >>>
> >>> I'm not saying it's delayed_fput(), I'm saying it's the delayed putting
> >>> io_uring can end up doing. But yes, delayed_fput() is another candidate.
> >>
> >> Sorry - was just working through my recollections/initial thought
> >> process out loud
> >
> > No worries, it might actually be a combination and this is why my
> > io_uring side patch didn't fully resolve it. Wrote a simple reproducer
> > and it seems to reliably trigger it, but it is fixed with a flush of the
> > delayed fput list on mount -EBUSY return. Still digging...
>
> I discussed this with Christian offline. I have a patch that is pretty
> simple, but it does mean that you'd wait for delayed fput flush off
> umount. Which seems kind of iffy.
>
> I think we need to back up a bit and consider if the kill && umount
> really is sane. If you kill a task that has open files, then any fput
> from that task will end up being delayed. This means that the umount may
> very well fail. That's why we have MNT_DETACH: umount2("/mnt", MNT_DETACH)
> will succeed even if fds are still open.
>
> It'd be handy if we could have umount wait for that to finish, but I'm
> not at all confident this is a sane solution for all cases. And as
> discussed, we have no way to even identify which files we'd need to
> flush out of the delayed list.
>
> Maybe the test case just needs fixing? Christian suggested lazy/detach
> umount and wait for sb release. There's an fsnotify hook for that,
> fsnotify_sb_delete(). Obviously this is a bit more involved, but it seems
> to me that this would be the way to make it more reliable when killing
> of tasks with open files is involved.

You can wait on superblock destruction today in multiple ways.
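To make the quoted MNT_DETACH point concrete first (purely an illustration,
not code from the thread; the /mnt tmpfs and file name are made up and it
needs root): a plain umount of a pinned mount fails with EBUSY, while a lazy
detach succeeds even with the fd still open. The shell demonstration of the
full lazy-detach-and-wait idea follows below.

/* Sketch only: assumes a tmpfs is already mounted on /mnt (illustrative
 * path) and that we run with CAP_SYS_ADMIN.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/bla", O_CREAT | O_RDWR, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* fd pins the mount, so a non-lazy umount must fail with EBUSY. */
	if (umount2("/mnt", 0) < 0)
		perror("umount2 (expected EBUSY)");

	/* MNT_DETACH detaches the mount immediately; the superblock is only
	 * shut down once the last reference (our fd here) goes away. */
	if (umount2("/mnt", MNT_DETACH) < 0)
		perror("umount2(MNT_DETACH)");
	else
		printf("lazy detach succeeded with fd still open\n");

	close(fd);
	return 0;
}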
Roughly from the shell this should work:

root@wittgenstein:~# cat sb_wait.sh
#! /bin/bash

echo "WARNING WARNING: I SUCK AT SHELL SCRIPTS"

echo "mount fs"
sudo mount -t tmpfs tmpfs /mnt
touch /mnt/bla

echo "pin sb by random file for 5s"
sleep 5 > /mnt/bla &

echo "establish inotify watch for sb destruction"
inotifywait -e unmount /mnt &
pid=$!

echo "regular umount - will fail..."
umount /mnt
findmnt | grep "/mnt"

echo "lazily umount - will succeed"
umount -l /mnt
findmnt | grep "/mnt"

echo "and now we wait"
wait $pid

echo "done"

You can also use a tiny bpf LSM, or fanotify in the future, as we have plans
for that.
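For completeness, here is a hedged C sketch of the programmatic equivalent of
the "inotifywait -e unmount /mnt" step above (same assumed /mnt mount point,
not code from the thread): set up the watch before the lazy detach, then
block until the IN_UNMOUNT event arrives, which is generated from
fsnotify_sb_delete() when the superblock is finally shut down.

/* Sketch: the watch must be added before umount2(MNT_DETACH), because
 * afterwards the path no longer resolves to the detached mount.
 */
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	int fd = inotify_init1(0);

	if (fd < 0 || inotify_add_watch(fd, "/mnt", IN_UNMOUNT) < 0) {
		perror("inotify");
		return 1;
	}

	if (umount2("/mnt", MNT_DETACH) < 0) {
		perror("umount2(MNT_DETACH)");
		return 1;
	}

	/* Block until the superblock is actually torn down; watched inodes
	 * get IN_UNMOUNT at that point. */
	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n <= 0) {
			perror("read");
			return 1;
		}
		for (char *p = buf; p < buf + n; ) {
			struct inotify_event *ev = (struct inotify_event *)p;

			if (ev->mask & IN_UNMOUNT) {
				printf("superblock released\n");
				return 0;
			}
			p += sizeof(*ev) + ev->len;
		}
	}
}

If something is still pinning the superblock, e.g. a delayed fput from a
killed task, the read() simply blocks until that last reference is dropped,
which is the ordering the test case needs before it checks for leaked files.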