Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Fri, Dec 30, 2016 at 08:10:01PM -0800, Krister Johansen wrote:
>> This can cause a panic when simultaneous callers of put_mountpoint
>> attempt to free the same mountpoint.  This occurs because some callers
>> hold the mount_hash_lock, while others hold the namespace lock.  Some
>> even hold both.
>>
>> In this submitter's case, the panic manifested itself as a GP fault in
>> put_mountpoint() when it called hlist_del() and attempted to dereference
>> a m_hash.pprev that had been poisoned by another thread.
>>
>> Instead of trying to force all mountpoint hash users back under the
>> namespace lock, add locks that cover just the mountpoint hash.  This
>> uses hlist_bl to protect against simultaneous additions and removals,
>> and RCU for lookups.
>
> Too complicated, IMO.  Look at the call graph for that sucker:
>
> put_mountpoint
> 	<- unhash_mnt
> 		<- detach_mnt
> 			<- attach_recursive_mnt		[under mount_lock]
> 			<- pivot_root			[under mount_lock]
> 		<- umount_mnt
> 			<- mntput_no_expire		[under mount_lock]
> 			<- umount_tree
> 				<- do_umount			[under mount_lock]
> 				<- __detach_mounts		[under mount_lock]
> 				<- copy_tree			[under mount_lock]
> 				<- drop_collected_mounts	[under mount_lock]
> 				<- attach_recursive_mnt		[under mount_lock]
> 				<- do_loopback			[under mount_lock]
> 				<- mark_mounts_for_expiry	[under mount_lock]
> 				<- shrink_submounts
> 					<- do_umount		[under mount_lock]
> 			<- __detach_mounts		[under mount_lock]
> 	<- __detach_mounts	[right after dropping mount_lock]
> 	<- unlock_mount
> 	<- pivot_root
>
> Now, the __detach_mounts() thing is trivially fixed - we just move that
> call one line up.  unhash_mnt() is all covered.  Which leaves us with
> unlock_mount() and pivot_root(), and both are _not_ hot paths - not by
> any stretch of imagination.  We also have lookup_mountpoint(), which is
> not hot either.  Note that hash insertions and lookups are serialized on
> the namespace lock; it's only removal from the final mntput() that can
> be triggered outside of it.  So playing with bitlocks is absolutely
> pointless.
>
> Let's do this: make sure that all callers of lookup_mountpoint() hold
> mount_lock (at least read_seqlock_excl), pull the call of
> put_mountpoint() in __detach_mounts() under mount_lock, and slap
> read_seqlock_excl() around the calls in unlock_mount() and pivot_root().
> Note that read_seqlock_excl() *is* exclusive - the "read" part in it is
> about "don't bump the seqcount part of mount_lock".  I.e. the patch
> below ought to fix that, and it's much simpler than your variant.
>
> Do you see any holes in the above?

The only significant thing I see is that you have not taken the
mount_lock on the path where new_mountpoint adds the new struct
mountpoint into the mountpoint hash table.

Just for managing this fix we need a "Cc: stable@xxxxxxxxxxxxxxx" and a
"Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts")",
as the bug is 2 years old at this point.

I will start with your patch, see about adding the missing locking in
new_mountpoint, and see where it gets me.
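Concretely, I expect that to mean taking mount_lock around the hash
insertion itself.  An untested sketch of what I have in mind, based on
the current new_mountpoint() in fs/namespace.c - the read_seqlock_excl()
pair is the hypothetical new bit, everything else is today's code:

static struct mountpoint *new_mountpoint(struct dentry *dentry)
{
	struct hlist_head *chain = mp_hash(dentry);
	struct mountpoint *mp;
	int ret;

	mp = kmalloc(sizeof(struct mountpoint), GFP_KERNEL);
	if (!mp)
		return ERR_PTR(-ENOMEM);

	/* Mark the dentry as hosting a mountpoint */
	ret = d_set_mounted(dentry);
	if (ret) {
		kfree(mp);
		return ERR_PTR(ret);
	}

	mp->m_dentry = dentry;
	mp->m_count = 1;
	INIT_HLIST_HEAD(&mp->m_list);
	/*
	 * Hypothetical addition: serialize the hash insertion against
	 * the hash removal that put_mountpoint() now performs under
	 * mount_lock, instead of relying on the namespace lock alone.
	 */
	read_seqlock_excl(&mount_lock);
	hlist_add_head(&mp->m_hash, chain);
	read_sequnlock_excl(&mount_lock);
	return mp;
}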
Eric

> diff --git a/fs/namespace.c b/fs/namespace.c
> index b5b1259e064f..ca98a8ff2732 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1595,11 +1595,11 @@ void __detach_mounts(struct dentry *dentry)
>  	struct mount *mnt;
>  
>  	namespace_lock();
> +	lock_mount_hash();
>  	mp = lookup_mountpoint(dentry);
>  	if (IS_ERR_OR_NULL(mp))
>  		goto out_unlock;
>  
> -	lock_mount_hash();
>  	event++;
>  	while (!hlist_empty(&mp->m_list)) {
>  		mnt = hlist_entry(mp->m_list.first, struct mount, mnt_mp_list);
> @@ -1609,9 +1609,9 @@ void __detach_mounts(struct dentry *dentry)
>  		}
>  		else umount_tree(mnt, UMOUNT_CONNECTED);
>  	}
> -	unlock_mount_hash();
>  	put_mountpoint(mp);
>  out_unlock:
> +	unlock_mount_hash();
>  	namespace_unlock();
>  }
>  
> @@ -2038,7 +2038,10 @@ static struct mountpoint *lock_mount(struct path *path)
>  	namespace_lock();
>  	mnt = lookup_mnt(path);
>  	if (likely(!mnt)) {
> -		struct mountpoint *mp = lookup_mountpoint(dentry);
> +		struct mountpoint *mp;
> +		read_seqlock_excl(&mount_lock);
> +		mp = lookup_mountpoint(dentry);
> +		read_sequnlock_excl(&mount_lock);
>  		if (!mp)
>  			mp = new_mountpoint(dentry);
>  		if (IS_ERR(mp)) {
> @@ -2059,7 +2062,9 @@ static struct mountpoint *lock_mount(struct path *path)
>  static void unlock_mount(struct mountpoint *where)
>  {
>  	struct dentry *dentry = where->m_dentry;
> +	read_seqlock_excl(&mount_lock);
>  	put_mountpoint(where);
> +	read_sequnlock_excl(&mount_lock);
>  	namespace_unlock();
>  	inode_unlock(dentry->d_inode);
>  }
> @@ -3137,7 +3142,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
>  	list_del_init(&new_mnt->mnt_expire);
>  	unlock_mount_hash();
>  	chroot_fs_refs(&root, &new);
> +	read_seqlock_excl(&mount_lock);
>  	put_mountpoint(root_mp);
> +	read_sequnlock_excl(&mount_lock);
>  	error = 0;
>  out4:
>  	unlock_mount(old_mp);
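P.S. For anyone on the list following the seqlock argument: Al's point
that read_seqlock_excl() *is* exclusive falls straight out of the
helpers in include/linux/seqlock.h, which (approximately, in a
v4.9-era tree) reduce to:

static inline void read_seqlock_excl(seqlock_t *sl)
{
	spin_lock(&sl->lock);
}

static inline void read_sequnlock_excl(seqlock_t *sl)
{
	spin_unlock(&sl->lock);
}

static inline void write_seqlock(seqlock_t *sl)
{
	spin_lock(&sl->lock);
	write_seqcount_begin(&sl->seqcount);
}

An "excl reader" takes the same spinlock as a writer, so it excludes
writers and other excl readers; but because it never bumps the
seqcount, lockless readers that sample mount_lock with
read_seqbegin()/read_seqretry() - e.g. lookup_mnt() - are not forced
to retry around it.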