Re: [PATCH 3/8] fs: keep an index of current mount namespaces

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 25 Jun 2024 15:39:55 +0200

On Tue, Jun 25, 2024 at 09:03:03AM GMT, Jeff Layton wrote:
> On Mon, 2024-06-24 at 11:49 -0400, Josef Bacik wrote:
> > In order to allow for listmount() to be used on different namespaces we
> > need a way to lookup a mount ns by its id.  Keep a rbtree of the current
> > !anonymous mount name spaces indexed by ID that we can use to look up
> > the namespace.
> > 
> > Co-developed-by: Christian Brauner <brauner@xxxxxxxxxx>
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> > ---
> >  fs/mount.h     |   2 +
> >  fs/namespace.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 113 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/mount.h b/fs/mount.h
> > index 4adce73211ae..ad4b1ddebb54 100644
> > --- a/fs/mount.h
> > +++ b/fs/mount.h
> > @@ -16,6 +16,8 @@ struct mnt_namespace {
> >  	u64 event;
> >  	unsigned int		nr_mounts; /* # of mounts in the namespace */
> >  	unsigned int		pending_mounts;
> > +	struct rb_node		mnt_ns_tree_node; /* node in the mnt_ns_tree */
> > +	refcount_t		passive; /* number references not pinning @mounts */
> >  } __randomize_layout;
> >  
> >  struct mnt_pcp {
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 45df82f2a059..babdebdb0a9c 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -78,6 +78,8 @@ static struct kmem_cache *mnt_cache __ro_after_init;
> >  static DECLARE_RWSEM(namespace_sem);
> >  static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
> >  static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > +static DEFINE_RWLOCK(mnt_ns_tree_lock);
> > +static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by namespace_sem */
> >  
> >  struct mount_kattr {
> >  	unsigned int attr_set;
> > @@ -103,6 +105,109 @@ EXPORT_SYMBOL_GPL(fs_kobj);
> >   */
> >  __cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
> >  
> > +static int mnt_ns_cmp(u64 seq, const struct mnt_namespace *ns)
> > +{
> > +	u64 seq_b = ns->seq;
> > +
> > +	if (seq < seq_b)
> > +		return -1;
> > +	if (seq > seq_b)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +static inline struct mnt_namespace *node_to_mnt_ns(const struct rb_node *node)
> > +{
> > +	if (!node)
> > +		return NULL;
> > +	return rb_entry(node, struct mnt_namespace, mnt_ns_tree_node);
> > +}
> > +
> > +static bool mnt_ns_less(struct rb_node *a, const struct rb_node *b)
> > +{
> > +	struct mnt_namespace *ns_a = node_to_mnt_ns(a);
> > +	struct mnt_namespace *ns_b = node_to_mnt_ns(b);
> > +	u64 seq_a = ns_a->seq;
> > +
> > +	return mnt_ns_cmp(seq_a, ns_b) < 0;
> > +}
> > +
> > +static void mnt_ns_tree_add(struct mnt_namespace *ns)
> > +{
> > +	guard(write_lock)(&mnt_ns_tree_lock);
> > +	rb_add(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_less);
> > +}
> > +
> > +static void mnt_ns_release(struct mnt_namespace *ns)
> > +{
> > +	lockdep_assert_not_held(&mnt_ns_tree_lock);
> > +
> 
> Why is it bad to hold this lock here? AFAICT, put_user_ns just does a
> schedule_work when the counter goes to 0. Granted, I don't see a reason
> why you would want to hold it here, but usually that sort of assertion
> means that it _must_ be forbidden.

I just annotate locking assumptions liberally. There's no reason to take
the lock there and there's no current codepath that needs to hold it so
don't waste cycles and hold it when we don't have to.