Re: [PATCH] Fix a race in put_mountpoint.

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Sat, 31 Dec 2016 06:17:29 +0000

On Fri, Dec 30, 2016 at 08:10:01PM -0800, Krister Johansen wrote:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint.  This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock.  Some
> even hold both.
> 
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisioned by another thread.
> 
> Instead of trying to force all mountpoint hash users back under the
> namespace lock, add locks that cover just the mountpoint hash.  This
> uses hlist_bl to protect against simlultaneous additions and removals,
> and RCU for lookups.

Too complicated, IMO.  Look at the call graph for that sucker:
put_mountpoint
        <- unhash_mnt
                <- detach_mnt
                        <- attach_recursive_mnt [under mount_lock]
                        <- pivot_root [under mount_lock]
                <- umount_mnt
                        <- mntput_no_expire [under mount_lock]
                        <- umount_tree
                                <- do_umount [under mount_lock]
                                <- __detach_mounts [under mount_lock]
                                <- copy_tree [under mount_lock]
                                <- drop_collected_mounts [under mount_lock]
                                <- attach_recursive_mnt [under mount_lock]
                                <- do_loopback [under mount_lock]
                                <- mark_mounts_for_expiry [under mount_lock]
                                <- shrink_submounts
                                        <- do_umount [under mount_lock]
                        <- __detach_mounts [under mount_lock]
        <- __detach_mounts [right after dropping mount_lock]
        <- unlock_mount
        <- pivot_root
Now, __detach_mounts() thing is trivially fixed - we just move that call
one line up.  unhash_mnt() is all covered.  Which leaves us with unlock_mount()
and pivot_root() and both are _not_ hot paths - not by any stretch of
imagination.  We also have lookup_mountpoint(), which is not hot either.
Note that hash insertions and lookups are serialized on namespace lock;
it's only removal from final mntput() that can be triggered outside.  So
playing with bitlocks is absolutely pointless.

Let's do this: make sure that all callers of lookup_mountpoint() have
lock_mount (at least read_seqlock_excl), pull the call of put_mountpoint()
in __detach_mounts() under mount_lock and slap read_seqlock_excl() around
the calls in unlock_mount() and pivot_root().  Note that read_seqlock_excl()
*is* exclusive - the "read" part in it is about "don't bump the seqcount
part of mount_lock".  I.e. the patch below ought to fix that and it's much
simpler than your variant.  Do you see any holes in the above?

diff --git a/fs/namespace.c b/fs/namespace.c
index b5b1259e064f..ca98a8ff2732 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1595,11 +1595,11 @@ void __detach_mounts(struct dentry *dentry)
 	struct mount *mnt;
 
 	namespace_lock();
+	lock_mount_hash();
 	mp = lookup_mountpoint(dentry);
 	if (IS_ERR_OR_NULL(mp))
 		goto out_unlock;
 
-	lock_mount_hash();
 	event++;
 	while (!hlist_empty(&mp->m_list)) {
 		mnt = hlist_entry(mp->m_list.first, struct mount, mnt_mp_list);
@@ -1609,9 +1609,9 @@ void __detach_mounts(struct dentry *dentry)
 		}
 		else umount_tree(mnt, UMOUNT_CONNECTED);
 	}
-	unlock_mount_hash();
 	put_mountpoint(mp);
 out_unlock:
+	unlock_mount_hash();
 	namespace_unlock();
 }
 
@@ -2038,7 +2038,10 @@ static struct mountpoint *lock_mount(struct path *path)
 	namespace_lock();
 	mnt = lookup_mnt(path);
 	if (likely(!mnt)) {
-		struct mountpoint *mp = lookup_mountpoint(dentry);
+		struct mountpoint *mp;
+		read_seqlock_excl(&mount_lock);
+		mp = lookup_mountpoint(dentry);
+		read_sequnlock_excl(&mount_lock);
 		if (!mp)
 			mp = new_mountpoint(dentry);
 		if (IS_ERR(mp)) {
@@ -2059,7 +2062,9 @@ static struct mountpoint *lock_mount(struct path *path)
 static void unlock_mount(struct mountpoint *where)
 {
 	struct dentry *dentry = where->m_dentry;
+	read_seqlock_excl(&mount_lock);
 	put_mountpoint(where);
+	read_sequnlock_excl(&mount_lock);
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 }
@@ -3137,7 +3142,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	list_del_init(&new_mnt->mnt_expire);
 	unlock_mount_hash();
 	chroot_fs_refs(&root, &new);
+	read_seqlock_excl(&mount_lock);
 	put_mountpoint(root_mp);
+	read_sequnlock_excl(&mount_lock);
 	error = 0;
 out4:
 	unlock_mount(old_mp);
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html