On Sat, 17 Feb 2024 at 06:00, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > But I have a really stupid (I know nothing about vfs) question, why do we > need pidfdfs_ino and pid->ino ? Can you explain why pidfdfs_alloc_file() > can't simply use, say, iget_locked(pidfdfs_sb, (unsigned long)pid) ? > > IIUC, if this pid is freed and then another "struct pid" has the same address > we can rely on __wait_on_freeing_inode() ? Heh. Maybe it would work, but we really don't want to expose core kernel pointers to user space as the inode number. So then we'd have to add extra hackery to that (ie we'd have to intercept stat calls, and we'd have to have something else for ->d_dname() etc..). Those are all things that the VFS does support, but ... So I do prefer Christian's new approach, although some of it ends up being a bit unclear. Christian, can you explain why this: spin_lock(&alias->d_lock); dget_dlock(alias); spin_unlock(&alias->d_lock); instead of just 'dget()'? Also, while I found the old __ns_get_path() to be fairly disgusting, I actually think it's simpler and clearer than playing around with the dentry alias list. So my expectation on code sharing was that you'd basically lift the old __ns_get_path(), make *that* the helper, and just pass it an argument that is the pointer to the filesystem "stashed" entry... And yes, using "atomic_long_t" for stashed is a crime against humanity. It's also entirely pointless. There are no actual atomic operations that the code wants except for reading and writing (aka READ_ONCE() and WRITE_ONCE()) and cmpxchg (aka just cmpxchg()). Using "atomic_long_t" buys the code nothing, and only makes things more complicated and requires crazy casts. So I think the nsfs.c code should be changed to do - atomic_long_t stashed; + struct dentry *stashed; and remove the crazy workarounds for using the wrong type. Something like the attached patch. Then, I think the whole "lockref_get_not_dead()" etc games of __ns_get_path() could be extracted out into a helper function that takes that "&ns->stashed" pointer as an argument, and now that helper could also be used for pidfs, except pidfs obviously just does "&pid->stashed" instead. Hmm? Entirely untested patch. Also, the above idea may be broken because of some obvious issue that I didn't think about. Christian? Linus
fs/nsfs.c | 11 ++++------- include/linux/ns_common.h | 2 +- include/linux/proc_ns.h | 2 +- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/nsfs.c b/fs/nsfs.c index 34e1e3e36733..1104db67c9a4 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -38,7 +38,7 @@ static void ns_prune_dentry(struct dentry *dentry) struct inode *inode = d_inode(dentry); if (inode) { struct ns_common *ns = inode->i_private; - atomic_long_set(&ns->stashed, 0); + WRITE_ONCE(ns->stashed, NULL); } } @@ -61,13 +61,11 @@ static int __ns_get_path(struct path *path, struct ns_common *ns) struct vfsmount *mnt = nsfs_mnt; struct dentry *dentry; struct inode *inode; - unsigned long d; rcu_read_lock(); - d = atomic_long_read(&ns->stashed); - if (!d) + dentry = READ_ONCE(ns->stashed); + if (!dentry) goto slow; - dentry = (struct dentry *)d; if (!lockref_get_not_dead(&dentry->d_lockref)) goto slow; rcu_read_unlock(); @@ -94,8 +92,7 @@ static int __ns_get_path(struct path *path, struct ns_common *ns) if (!dentry) return -ENOMEM; dentry->d_fsdata = (void *)ns->ops; - d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry); - if (d) { + if (cmpxchg(&ns->stashed, NULL, dentry)) { d_delete(dentry); /* make sure ->d_prune() does nothing */ dput(dentry); cpu_relax(); diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h index 0f1d024bd958..7d22ea50b098 100644 --- a/include/linux/ns_common.h +++ b/include/linux/ns_common.h @@ -7,7 +7,7 @@ struct proc_ns_operations; struct ns_common { - atomic_long_t stashed; + struct dentry *stashed; const struct proc_ns_operations *ops; unsigned int inum; refcount_t count; diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index 49539bc416ce..5ea470eb4d76 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -66,7 +66,7 @@ static inline void proc_free_inum(unsigned int inum) {} static inline int ns_alloc_inum(struct ns_common *ns) { - atomic_long_set(&ns->stashed, 0); + WRITE_ONCE(ns->stashed, NULL); return proc_alloc_inum(&ns->inum); }