On Thu, Oct 03, 2013 at 12:06:04PM -0700, Linus Torvalds wrote: > On Thu, Oct 3, 2013 at 10:44 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > Anyway, I've done nicer variants of that protection for everything except > > fuse (hadn't gotten around to it yet); see vfs.git#experimental now: > > Ok, I did a quick test, and it looks ok here, so looking good for 3.13. > > However, the new smp_mb() in mntput_no_expire() is quite noticeable in > the path lookup stress-test profiles. And I'm not seeing what that > allegedly protects against, especially if mnt_ns is NULL (ie all the > common important cases). In the common case it's ->mnt_ns is *not* NULL; that's what we get if the damn thing is still mounted. What we need to avoid is this: mnt_ns non-NULL, mnt_count is 2 CPU1: umount -l CPU2: mntput umount_tree() clears mnt_ns drop mount_lock.lock namespace_unlock() calls mntput() decrement mnt_count see that mnt_ns is NULL grab mount_lock.lock check mnt_count decrement mnt_count see old value of mnt_ns decide to bugger off see it equal to 1 (i.e. miss decrement on CPU2) decide to bugger off The barrier in mntput() is to prevent that combination, so that either CPU2 would see mnt_ns cleared by CPU1, or CPU1 would see mnt_count decrement done by CPU2. Its counterpart on CPU1 is provided by spin_unlock/spin_lock we've done between clearing mnt_ns and checking mnt_count. Note that synchronize_rcu() in namespace_unlock() and rcu_read_lock() in mntput() are irrelevant here - the latter on CPU2 might very well have happened after the former on CPU1, so umount -l did *not* wait for CPU2 to do anything. Any suggestions re getting rid of that barrier? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html