On Thu, 10 Aug 2023 at 22:29, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> On Thu, Aug 10, 2023 at 10:20:22PM -0700, Linus Torvalds wrote:
> > If it's purely "umount doesn't succeed because the filesystem is still
> > busy with cleanups", then things are much better.
>
> That's exactly it. We have various tests that kill -9 fio and then
> umount, and umount spuriously fails.

Well, it sounds like Jens already has some handle on at least one
io_uring shutdown case that didn't wait for completion.

At the same time, a random -EBUSY is kind of an expected failure in
real life, since outside of strictly controlled environments you could
easily have just some entirely unrelated thing that just happens to
have looked at the filesystem when you tried to unmount it.

So any real-life use tends to use umount in a (limited) loop. It might
just make sense for the fsstress test scripts to do the same
regardless.

There's no actual good reason to think that -EBUSY is a hard error. It
very much can be transient.

In fact, I have this horrible flash-back memory to some auto-expiry
scripts that used to do the equivalent of "umount -a -t autofs" every
minute or so as a horrible model for expiring things, happy and secure
in the knowledge that if the filesystem was still in active use, it
would just fail.

So may I suggest that even if the immediate issue ends up being sorted
out, just from a robustness standpoint the "consider EBUSY a hard
error" seems to be a mistake.

Transient failures are pretty much expected, and not all of them are
necessarily kernel-related (ie think "concurrent updatedb run" or any
number of other possibilities).

                Linus
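
For illustration only, the "umount in a (limited) loop" idea above could be sketched roughly as follows. This is not taken from the message or from any existing test script: the helper name `retry_on_ebusy`, its parameters, and the default attempt count and delay are all hypothetical, and the sketch assumes the caller supplies an unmount operation that raises `OSError` with `errno.EBUSY` when the filesystem is still busy.

```python
import errno
import time


def retry_on_ebusy(operation, attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation(), retrying a bounded number of times on EBUSY.

    Hypothetical helper: `operation` is any callable that raises OSError
    with errno set (for example, a wrapper around an unmount call).
    Only EBUSY is treated as transient; any other error is re-raised
    immediately, and EBUSY is re-raised once the attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Anything other than EBUSY is considered a hard error.
            if exc.errno != errno.EBUSY:
                raise
            # Still busy on the final attempt: give up and report it.
            if attempt == attempts:
                raise
            # Transient: wait a little and try again.
            sleep(delay)
```

A test harness would pass its own unmount callable as `operation`. The loop is deliberately bounded, so a filesystem that really is stuck still surfaces as an EBUSY failure rather than hanging the run.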