On Fri, Dec 30, 2016 at 08:10:01PM -0800, Krister Johansen wrote: > This can cause a panic when simultaneous callers of put_mountpoint > attempt to free the same mountpoint. This occurs because some callers > hold the mount_hash_lock, while others hold the namespace lock. Some > even hold both. > > In this submitter's case, the panic manifested itself as a GP fault in > put_mountpoint() when it called hlist_del() and attempted to dereference > a m_hash.pprev that had been poisioned by another thread. > > Instead of trying to force all mountpoint hash users back under the > namespace lock, add locks that cover just the mountpoint hash. This > uses hlist_bl to protect against simlultaneous additions and removals, > and RCU for lookups. Too complicated, IMO. Look at the call graph for that sucker: put_mountpoint <- unhash_mnt <- detach_mnt <- attach_recursive_mnt [under mount_lock] <- pivot_root [under mount_lock] <- umount_mnt <- mntput_no_expire [under mount_lock] <- umount_tree <- do_umount [under mount_lock] <- __detach_mounts [under mount_lock] <- copy_tree [under mount_lock] <- drop_collected_mounts [under mount_lock] <- attach_recursive_mnt [under mount_lock] <- do_loopback [under mount_lock] <- mark_mounts_for_expiry [under mount_lock] <- shrink_submounts <- do_umount [under mount_lock] <- __detach_mounts [under mount_lock] <- __detach_mounts [right after dropping mount_lock] <- unlock_mount <- pivot_root Now, __detach_mounts() thing is trivially fixed - we just move that call one line up. unhash_mnt() is all covered. Which leaves us with unlock_mount() and pivot_root() and both are _not_ hot paths - not by any stretch of imagination. We also have lookup_mountpoint(), which is not hot either. Note that hash insertions and lookups are serialized on namespace lock; it's only removal from final mntput() that can be triggered outside. So playing with bitlocks is absolutely pointless. Let's do this: make sure that all callers of lookup_mountpoint() have lock_mount (at least read_seqlock_excl), pull the call of put_mountpoint() in __detach_mounts() under mount_lock and slap read_seqlock_excl() around the calls in unlock_mount() and pivot_root(). Note that read_seqlock_excl() *is* exclusive - the "read" part in it is about "don't bump the seqcount part of mount_lock". I.e. the patch below ought to fix that and it's much simpler than your variant. Do you see any holes in the above? diff --git a/fs/namespace.c b/fs/namespace.c index b5b1259e064f..ca98a8ff2732 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1595,11 +1595,11 @@ void __detach_mounts(struct dentry *dentry) struct mount *mnt; namespace_lock(); + lock_mount_hash(); mp = lookup_mountpoint(dentry); if (IS_ERR_OR_NULL(mp)) goto out_unlock; - lock_mount_hash(); event++; while (!hlist_empty(&mp->m_list)) { mnt = hlist_entry(mp->m_list.first, struct mount, mnt_mp_list); @@ -1609,9 +1609,9 @@ void __detach_mounts(struct dentry *dentry) } else umount_tree(mnt, UMOUNT_CONNECTED); } - unlock_mount_hash(); put_mountpoint(mp); out_unlock: + unlock_mount_hash(); namespace_unlock(); } @@ -2038,7 +2038,10 @@ static struct mountpoint *lock_mount(struct path *path) namespace_lock(); mnt = lookup_mnt(path); if (likely(!mnt)) { - struct mountpoint *mp = lookup_mountpoint(dentry); + struct mountpoint *mp; + read_seqlock_excl(&mount_lock); + mp = lookup_mountpoint(dentry); + read_sequnlock_excl(&mount_lock); if (!mp) mp = new_mountpoint(dentry); if (IS_ERR(mp)) { @@ -2059,7 +2062,9 @@ static struct mountpoint *lock_mount(struct path *path) static void unlock_mount(struct mountpoint *where) { struct dentry *dentry = where->m_dentry; + read_seqlock_excl(&mount_lock); put_mountpoint(where); + read_sequnlock_excl(&mount_lock); namespace_unlock(); inode_unlock(dentry->d_inode); } @@ -3137,7 +3142,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root, list_del_init(&new_mnt->mnt_expire); unlock_mount_hash(); chroot_fs_refs(&root, &new); + read_seqlock_excl(&mount_lock); put_mountpoint(root_mp); + read_sequnlock_excl(&mount_lock); error = 0; out4: unlock_mount(old_mp); -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html