On Thu, Oct 3, 2013 at 12:43 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > In the common case it's ->mnt_ns is *not* NULL; that's what we get if > the damn thing is still mounted. Yeah, I misread the profile assembly code. The point being that the nice fast case now has the smp_mb() in it, and it accounts for about 60% of the cost of that function on my performance profile. > What we need to avoid is this: > > mnt_ns non-NULL, mnt_count is 2 > CPU1: umount -l CPU2: mntput > umount_tree() clears mnt_ns > drop mount_lock.lock > namespace_unlock() calls mntput() > decrement mnt_count > see that mnt_ns is NULL > grab mount_lock.lock > check mnt_count > decrement mnt_count > see old value of mnt_ns > decide to bugger off > see it equal to 1 (i.e. miss decrement on CPU2) > decide to bugger off > > The barrier in mntput() is to prevent that combination, so that either CPU2 > would see mnt_ns cleared by CPU1, or CPU1 would see mnt_count decrement done > by CPU2. Its counterpart on CPU1 is provided by spin_unlock/spin_lock we've > done between clearing mnt_ns and checking mnt_count. Note that > synchronize_rcu() in namespace_unlock() and rcu_read_lock() in mntput() are > irrelevant here - the latter on CPU2 might very well have happened after the > former on CPU1, so umount -l did *not* wait for CPU2 to do anything. > > Any suggestions re getting rid of that barrier? Hmm. The CPU2 mntput can only happen under RCU readlock, right? After the RCU grace period _and_ if the umount is going ahead, nothing should have a mnt pointer, right? So I'm wondering if you couldn't just have a synchronize_rcu() in that umount path, after clearing mnt_ns. At that point you _know_ you're the only one that should have access to the mnt. You'd need to drop the mount-hash lock for that. But I think you can do it in umount_tree(), right? IOW, you could make the rule be that umount_tree() must be called with the namespace lock and the mount-hash lock, and it will drop both. Or does that get too painful too? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html