3.16.61-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: Al Viro <viro@xxxxxxxxxxxxxxxxxx> commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream. __legitimize_mnt() has two problems - one is that in case of success the check of mount_lock is not ordered wrt preceding increment of refcount, making it possible to have successful __legitimize_mnt() on one CPU just before the otherwise final mntpu() on another, with __legitimize_mnt() not seeing mntput() taking the lock and mntput() not seeing the increment done by __legitimize_mnt(). Solved by a pair of barriers. Another is that failure of __legitimize_mnt() on the second read_seqretry() leaves us with reference that'll need to be dropped by caller; however, if that races with final mntput() we can end up with caller dropping rcu_read_lock() and doing mntput() to release that reference - with the first mntput() having freed the damn thing just as rcu_read_lock() had been dropped. Solution: in "do mntput() yourself" failure case grab mount_lock, check if MNT_DOOMED has been set by racing final mntput() that has missed our increment and if it has - undo the increment and treat that as "failure, caller doesn't need to drop anything" case. It's not easy to hit - the final mntput() has to come right after the first read_seqretry() in __legitimize_mnt() *and* manage to miss the increment done by __legitimize_mnt() before the second read_seqretry() in there. The things that are almost impossible to hit on bare hardware are not impossible on SMP KVM, though... Reported-by: Oleg Nesterov <oleg@xxxxxxxxxx> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> [bwh: Backported to 3.16: __legitimize_mnt() has not been split out from legitimize_mnt(). Adjust the added return statement and comments accordingly.] Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx> --- fs/namespace.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) --- a/fs/namespace.c +++ b/fs/namespace.c @@ -592,12 +592,20 @@ bool legitimize_mnt(struct vfsmount *bas return true; mnt = real_mount(bastard); mnt_add_count(mnt, 1); + smp_mb(); // see mntput_no_expire() if (likely(!read_seqretry(&mount_lock, seq))) return true; if (bastard->mnt_flags & MNT_SYNC_UMOUNT) { mnt_add_count(mnt, -1); return false; } + lock_mount_hash(); + if (unlikely(bastard->mnt_flags & MNT_DOOMED)) { + mnt_add_count(mnt, -1); + unlock_mount_hash(); + return false; + } + unlock_mount_hash(); rcu_read_unlock(); mntput(bastard); rcu_read_lock(); @@ -984,6 +992,11 @@ put_again: return; } lock_mount_hash(); + /* + * make sure that if legitimize_mnt() has not seen us grab + * mount_lock, we'll see their refcount increment here. + */ + smp_mb(); mnt_add_count(mnt, -1); if (mnt_get_count(mnt)) { rcu_read_unlock();