Hello, On Thu, Jan 16, 2025 at 02:27:45PM +0100, Sebastian Andrzej Siewior wrote: > > Shouldn't this be freed somewhere? > > There is > char *kn_name __free(kfree) = NULL; > > at the top. This will kfree(kn_name) once it is out of scope (on return > from rdtgroup_pseudo_lock_create()). Ah, makes sense. ... > > > @@ -557,16 +568,18 @@ void kernfs_put(struct kernfs_node *kn) > > > if (!kn || !atomic_dec_and_test(&kn->count)) > > > return; > > > root = kernfs_root(kn); > > > + guard(rcu)(); > > > repeat: > > > /* > > > * Moving/renaming is always done while holding reference. > > > * kn->parent won't change beneath us. > > > */ > > > - parent = kn->parent; > > > + parent = rcu_dereference(kn->parent); > > > > I wonder whether it'd be better to encode the reference count rule (ie. add > > the condition kn->count == 0 to deref_check) in the kn->parent deref > > accessor. This function doesn't need RCU read lock and holding it makes it > > more confusing. > > You are saying that we don't need RCU here because if we drop the last > reference then nobody can rename the node anymore and so parent can't > change. That sounds right. > What about using rcu_dereference_protected() instead? Using > rcu_dereference(x, !atomic_read(&kn->count)) looks odd given that we > established that the counter is 0. Therefore I would suggest > rcu_access_pointer() but the reference drop might qualify as "locked". I think it's usually a better form to encode the whole access rule in a shared accessor for the field so that the deref rules for the field can be understood and enforced from the shared accessor and the shared accessor would use rcu_dereference_protected() internally. > > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > > > index 8502ef68459b9..05f7b30283150 100644 > > > --- a/fs/kernfs/file.c > > > +++ b/fs/kernfs/file.c > > > @@ -911,9 +911,11 @@ static void kernfs_notify_workfn(struct work_struct *work) > > > /* kick fsnotify */ > > > > > > down_read(&root->kernfs_supers_rwsem); > > > + down_read(&root->kernfs_rwsem); > > > > Why is this addition necessary? Hmm... was the code previously broken w.r.t. > > renaming? Can this be RCU? > > I *think* it was broken unless you unsure somehow that this can't be > invoked on nodes which can be renamed. > The ensures that the later obtained kn_name does not freed after a > rename. > This can not be RCU because ilookup() has wait_on_inode() (might sleep). If it was broken, let's separate it out to its own patch. > > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > > > index 1358c21837f1a..db71faba3bb53 100644 > > > --- a/fs/kernfs/mount.c > > > +++ b/fs/kernfs/mount.c > > > @@ -145,8 +145,10 @@ static struct dentry *kernfs_fh_to_parent(struct super_block *sb, > > > static struct dentry *kernfs_get_parent_dentry(struct dentry *child) > > > { > > > struct kernfs_node *kn = kernfs_dentry_node(child); > > > + struct kernfs_root *root = kernfs_root(kn); > > > > > > - return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent)); > > > + guard(rwsem_read)(&root->kernfs_rwsem); > > > + return d_obtain_alias(kernfs_get_inode(child->d_sb, kernfs_rcu_get_parent(kn))); > > > > Ditto. > > kernfs_rcu_get_parent() gets you name from the kn. Can you ensure that > it won't go away during a rename? If so, I would add the matching > comment then. > There is d_obtain_alias() -> __d_obtain_alias() -> d_alloc_anon() which > makes not possible to use RCU. If true, better to put it in its own prep patch, I think. ... > > > @@ -216,6 +219,9 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn, > > > if (!kn->parent) > > > return dentry; > > > > > > + root = kernfs_root(kn); > > > + guard(rwsem_read)(&root->kernfs_rwsem); > > > > Here too, it's a bit confusing that it's adding new locking. Was the code > > broken before? If so, it'd be clearer if the fixes were in their own patch. > > It dereferences name (later in lookup_positive_unlocked()). I don't see > how it is safe against a rename without the lock. > > If you agree that all three are bugs, that existed before, then I will > extract it out of this patch. Not sure whether they're all correct but let's separate them out. ... > > I wonder whether it'd be better to rename kn->parent to something like > > kn->__parent (or maybe some other suffix) to clarify that the field is not > > to be deref'ed directly and kernfs_parent() helper is made available to the > > users. That way, users can benefit from the additional conditions in > > rcu_dereference_check(). > > sparse should yell at people if they deference directly. I have no > problem to rename it to __parent if you say so. Sparse isn't useless but also often ignored. Probably better to be explicit. ... > > e.g. Here, it'd be a lot better if kernfs provided helper can be used so > > that deref condition check can be preserved. > > Something like > | static struct cgroup *cg_get_parent_priv(struct kernfs_node *kn) > | { > | return rcu_dereference_check(kn->parent, kn->flags & KERNFS_ROOT_INVARIANT_PARENT)->priv; > | } Maybe a shorter name? Thanks. -- tejun