On Fri, Oct 27, 2017 at 11:45:08AM +1100, NeilBrown wrote:
> On Thu, Oct 26 2017, Paul E. McKenney wrote:
>
> > On Thu, Oct 26, 2017 at 01:26:37PM +1100, NeilBrown wrote:
> >>
> >> The synchronize_rcu() in namespace_unlock() is called every time
> >> a filesystem is unmounted.  If a great many filesystems are mounted,
> >> this can cause a noticeable slow-down in, for example, system shutdown.
> >>
> >> The sequence:
> >>   mkdir -p /tmp/Mtest/{0..5000}
> >>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
> >>   time umount /tmp/Mtest/*
> >>
> >> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> >> 100 seconds to unmount them.
> >>
> >> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> >> tmpfs filesystems, but only 36 to unmount.
> >>
> >> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> >> the umount time on a 4-cpu VM is 8 seconds to mount and 0.6 to
> >> unmount.
> >>
> >> I think this 200-fold speed up is worth the slightly higher system
> >> impact of using synchronize_rcu_expedited().
> >>
> >> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> >> ---
> >>
> >> Cc: to Paul and Josh in case they'll correct me if using _expedited()
> >> is really bad here.
> >
> > I suspect that filesystem unmount is pretty rare in production real-time
> > workloads, which are the ones that might care.  So I would guess that
> > this is OK.
> >
> > If the real-time guys ever do want to do filesystem unmounts while their
> > real-time applications are running, they might modify this so that it can
> > use synchronize_rcu() instead for real-time builds of the kernel.
>
> Thanks for the confirmation Paul.
>
> >
> > But just for completeness, one way to make this work across the board
> > might be to instead use call_rcu(), with the callback function kicking
> > off a workqueue handler to do the rest of the unmount.  Of course,
> > in saying that, I am ignoring any mutexes that you might be holding
> > across this whole thing, and also ignoring any problems that might arise
> > when returning to userspace with some portion of the unmount operation
> > still pending.  (For example, someone unmounting a filesystem and then
> > immediately remounting that same filesystem.)
>
> I had briefly considered that option, but it doesn't work.
> The purpose of this synchronize_rcu() is to wait for any filename lookup
> which might be locklessly touching the mountpoint to complete.
> It is only after that that the real meat of unmount happens - the
> filesystem is told that the last reference is gone, and it gets to
> flush any saved changes out to disk etc.
> That stuff really has to happen before the umount syscall returns.

Hey, I was hoping!  ;-)

							Thanx, Paul
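
The ordering problem discussed in this thread can be made concrete with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, nothing here is the kernel's real API, and the "grace period" is reduced to a single pending callback slot. It shows only that a deferred, call_rcu()-style unmount can return to its caller before the final flush stage has run, whereas a synchronous one cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these are the kernel's real APIs. */
struct model_mount {
	bool detached;	/* removed from the namespace */
	bool flushed;	/* filesystem told the last reference is gone */
};

typedef void (*model_cb_t)(struct model_mount *);

/* One-slot stand-in for callbacks waiting on a grace period. */
static model_cb_t pending_cb;
static struct model_mount *pending_arg;

/* Stand-in for call_rcu(): record the callback and return at once. */
static void model_call_rcu(struct model_mount *m, model_cb_t cb)
{
	pending_cb = cb;
	pending_arg = m;
}

/* Stand-in for the grace period ending and the deferred work running. */
static void model_grace_period_elapses(void)
{
	if (pending_cb) {
		model_cb_t cb = pending_cb;

		assert(pending_arg != NULL);
		pending_cb = NULL;
		cb(pending_arg);
	}
}

/* The "real meat" of unmount: flush saved changes, drop the last ref. */
static void model_finish_umount(struct model_mount *m)
{
	m->flushed = true;
}

/* Synchronous variant: the wait and the flush happen before returning. */
static void model_umount_sync(struct model_mount *m)
{
	m->detached = true;
	/* A synchronize_rcu()-style wait would block here. */
	model_finish_umount(m);
}

/* Deferred variant: returns with the final stage still pending. */
static void model_umount_deferred(struct model_mount *m)
{
	m->detached = true;
	model_call_rcu(m, model_finish_umount);
}
```

In this model, a caller of `model_umount_deferred()` sees `detached` set but `flushed` still clear until `model_grace_period_elapses()` runs, which corresponds to the window in which an immediate remount of the same filesystem could race with the pending work.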