On Sat, Feb 17, 2024 at 09:30:19AM -0800, Linus Torvalds wrote: > On Sat, 17 Feb 2024 at 06:00, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > > > But I have a really stupid (I know nothing about vfs) question, why do we > > need pidfdfs_ino and pid->ino ? Can you explain why pidfdfs_alloc_file() > > can't simply use, say, iget_locked(pidfdfs_sb, (unsigned long)pid) ? > > > > IIUC, if this pid is freed and then another "struct pid" has the same address > > we can rely on __wait_on_freeing_inode() ? > > Heh. Maybe it would work, but we really don't want to expose core > kernel pointers to user space as the inode number. And then also the property that the inode number is unique for the system lifetime is extremely useful for userspace and I would like to retain that property. > > So then we'd have to add extra hackery to that (ie we'd have to > intercept stat calls, and we'd have to have something else for > ->d_dname() etc..). > > Those are all things that the VFS does support, but ... > > So I do prefer Christian's new approach, although some of it ends up > being a bit unclear. > > Christian, can you explain why this: > > spin_lock(&alias->d_lock); > dget_dlock(alias); > spin_unlock(&alias->d_lock); > > instead of just 'dget()'? No reason other than I forgot to switch to dget(). > > Also, while I found the old __ns_get_path() to be fairly disgusting, I > actually think it's simpler and clearer than playing around with the > dentry alias list. So my expectation on code sharing was that you'd It's overall probably also cheaper, I think. > basically lift the old __ns_get_path(), make *that* the helper, and > just pass it an argument that is the pointer to the filesystem > "stashed" entry... > > And yes, using "atomic_long_t" for stashed is a crime against > humanity. It's also entirely pointless. There are no actual atomic > operations that the code wants except for reading and writing (aka > READ_ONCE() and WRITE_ONCE()) and cmpxchg (aka just cmpxchg()). Using > "atomic_long_t" buys the code nothing, and only makes things more > complicated and requires crazy casts. Yup, I had that as a draft and that introduced struct ino_stash which contained a dentry pointer and the inode number using cmpxchg(). But I decided against this because ns_common.h would require to have access to ino_stash definition so we wouldn't just able to hide it in internal.h where it should belong. > > So I think the nsfs.c code should be changed to do > > - atomic_long_t stashed; > + struct dentry *stashed; > > and remove the crazy workarounds for using the wrong type. > > Something like the attached patch. > > Then, I think the whole "lockref_get_not_dead()" etc games of > __ns_get_path() could be extracted out into a helper function that > takes that "&ns->stashed" pointer as an argument, and now that helper > could also be used for pidfs, except pidfs obviously just does > "&pid->stashed" instead. > > Hmm? > > Entirely untested patch. Also, the above idea may be broken because of > some obvious issue that I didn't think about. Christian? Yeah, sure. Works for me. Let me play with something.