On Mon, Apr 03, 2023 at 04:45:16PM +0200, Alexander Mikhalitsyn wrote: > Useful if for some reason bindmounts root dentries get invalidated > but it's needed to revalidate existing bindmounts without remounting. > > Cc: Miklos Szeredi <mszeredi@xxxxxxxxxx> > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > Cc: Amir Goldstein <amir73il@xxxxxxxxx> > Cc: Stéphane Graber <stgraber@xxxxxxxxxx> > Cc: Seth Forshee <sforshee@xxxxxxxxxx> > Cc: Christian Brauner <brauner@xxxxxxxxxx> > Cc: Andrei Vagin <avagin@xxxxxxxxx> > Cc: Pavel Tikhomirov <ptikhomirov@xxxxxxxxxxxxx> > Cc: Bernd Schubert <bschubert@xxxxxxx> > Cc: linux-fsdevel@xxxxxxxxxxxxxxx > Cc: linux-kernel@xxxxxxxxxxxxxxx > Cc: criu@xxxxxxxxxx > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@xxxxxxxxxxxxx> > --- > fs/namespace.c | 90 +++++++++++++++++++++++++++++++++++ > include/linux/mnt_namespace.h | 3 ++ > 2 files changed, 93 insertions(+) > > diff --git a/fs/namespace.c b/fs/namespace.c > index bc0f15257b49..b74d00d6abb0 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -568,6 +568,96 @@ static int mnt_make_readonly(struct mount *mnt) > return ret; > } > > +struct bind_mount_list_item { > + struct list_head list; > + struct vfsmount *mnt; > +}; > + > +/* > + * sb_revalidate_bindmounts - Relookup/reset bindmounts root dentries > + * > + * Useful if for some reason bindmount root dentries get invalidated > + * but it's needed to revalidate existing bindmounts without remounting. > + */ > +int sb_revalidate_bindmounts(struct super_block *sb) It's difficult to not strongly dislike this patchset on the basis of the need for a function like this alone. And I just have to say it even if I normally try not to do this: This whole function is bonkers in my opinion. But leaving that aside for a second. This really needs detailed explanations on locking, assumptions, and an explanation what you're doing here. This looks crazy to me and definitely isn't fit to be a generic helper in this form. > +{ > + struct mount *mnt; > + struct bind_mount_list_item *bmnt, *next; > + int err = 0; > + struct vfsmount *root_mnt = NULL; > + LIST_HEAD(mnt_to_update); > + char *buf; > + > + buf = (char *) __get_free_page(GFP_KERNEL); > + if (!buf) > + return -ENOMEM; > + > + lock_mount_hash(); > + list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) { > + /* we only want to touch bindmounts */ > + if (mnt->mnt.mnt_root == sb->s_root) { > + if (!root_mnt) > + root_mnt = mntget(&mnt->mnt); > + > + continue; > + } > + > + bmnt = kzalloc(sizeof(struct bind_mount_list_item), GFP_NOWAIT | __GFP_NOWARN); Allocating under lock_mount_hash() even if doable with this flag combination should be avoided at all costs imho. That just seems hacky. > + if (!bmnt) { > + err = -ENOMEM; > + goto exit; You're exiting the function with lock_mount_hash() held... > + } > + > + bmnt->mnt = mntget(&mnt->mnt); > + list_add_tail(&bmnt->list, &mnt_to_update); > + } > + unlock_mount_hash(); You've dropped unlock_mount_hash() and the function doesn't hold namespace_lock() and isn't documented as requiring the caller to hold it. And the later patch that uses this doesn't afaict (although I haven't looked at any of the fuse specific stuff). So this is open to all sorts of races with mount and unmount now. This needs an explanation why that doesn't matter. > + > + /* TODO: get rid of this limitation */ Confused about this comment. > + if (!root_mnt) { > + err = -ENOENT; > + goto exit; > + } > + > + list_for_each_entry_safe(bmnt, next, &mnt_to_update, list) { No one else can access that list so list_for_each_entry_safe() seems pointless. > + struct vfsmount *cur_mnt = bmnt->mnt; > + struct path path; > + struct dentry *old_root; > + char *p; > + > + p = dentry_path(cur_mnt->mnt_root, buf, PAGE_SIZE); > + if (IS_ERR(p)) > + goto exit; > + > + /* TODO: are these lookup flags fully safe and correct? */ > + err = vfs_path_lookup(root_mnt->mnt_root, root_mnt, > + p, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT|LOOKUP_REVAL, &path); > + if (err) > + goto exit; > + > + /* replace bindmount root dentry */ > + lock_mount_hash(); > + old_root = cur_mnt->mnt_root; > + cur_mnt->mnt_root = dget(path.dentry); mnt->mnt_root isn't protected by lock_mount_hash(). It's invariant after it has been set and a lot of code just assumes that it's stable. Top of my hat, since you're not holding namespace_lock() mount propagation can be going on concurrently so propagate_one() can check if (!is_subdir(mp->m_dentry, m->mnt.mnt_root)) while you're happily updating it. A lot of code could actually be touching mnt->mnt_root while you're updating it. There's probably a lot more issues with this I'm just not seeing without spending more time on this. But NAK on this as it stands. Sorry. > + dput(old_root); > + unlock_mount_hash(); > + > + path_put(&path); > + } > + > +exit: > + free_page((unsigned long) buf); > + mntput(root_mnt); > + list_for_each_entry_safe(bmnt, next, &mnt_to_update, list) { > + list_del(&bmnt->list); > + mntput(bmnt->mnt); > + kfree(bmnt); > + } > + > + return err; > +} > +EXPORT_SYMBOL_GPL(sb_revalidate_bindmounts);