On Thu, Jan 19, 2023 at 11:09 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 19, 2023 at 04:14:55PM -0500, Eric Chanudet wrote:
> > From: Alexander Larsson <alexl@xxxxxxxxxx>
> >
> > Use call_rcu to defer releasing the umount'ed or detached filesystem
> > when calling namespace_unlock().
> >
> > Calling synchronize_rcu_expedited() has a significant cost on RT kernels
> > that default to rcupdate.rcu_normal_after_boot=1.
> >
> > For example, on a 6.2-rt1 kernel:
> > perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt
> >     0.07464 +- 0.00396 seconds time elapsed ( +- 5.31% )
> >
> > With this change applied:
> > perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt
> >     0.00162604 +- 0.00000637 seconds time elapsed ( +- 0.39% )
> >
> > Waiting for the grace period before completing the syscall does not seem
> > mandatory. The struct mount umount'ed are queued up for release in a
> > separate list and no longer accessible to following syscalls.
>
> Again, NAK. If a filesystem is expected to be shut down by umount(2),
> userland expects it to have been already shut down by the time the
> syscall returns.
>
> It's not just visibility in namespace; it's "can I pull the disk out?".
> Or "can the shutdown get to taking the network down?", for that matter.

In the use case we're worrying about, all the unmounts are lazy (i.e.
MNT_DETACH). What about delaying the destroy in that case? That seems
in line with the expected behaviour of lazy shutdown, i.e. you can't
rely on it to be settled anyway.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
 alexl@xxxxxxxxxx                    alexander.larsson@xxxxxxxxx