On Sun, Feb 18, 2024 at 12:15:02PM +0100, Christian Brauner wrote: > On Sat, Feb 17, 2024 at 09:30:19AM -0800, Linus Torvalds wrote: > > On Sat, 17 Feb 2024 at 06:00, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > > > > > But I have a really stupid (I know nothing about vfs) question, why do we > > > need pidfdfs_ino and pid->ino ? Can you explain why pidfdfs_alloc_file() > > > can't simply use, say, iget_locked(pidfdfs_sb, (unsigned long)pid) ? > > > > > > IIUC, if this pid is freed and then another "struct pid" has the same address > > > we can rely on __wait_on_freeing_inode() ? > > > > Heh. Maybe it would work, but we really don't want to expose core > > kernel pointers to user space as the inode number. > > And then also the property that the inode number is unique for the > system lifetime is extremely useful for userspace and I would like to > retain that property. > > > > > So then we'd have to add extra hackery to that (ie we'd have to > > intercept stat calls, and we'd have to have something else for > > ->d_dname() etc..). > > > > Those are all things that the VFS does support, but ... > > > > So I do prefer Christian's new approach, although some of it ends up > > being a bit unclear. > > > > Christian, can you explain why this: > > > > spin_lock(&alias->d_lock); > > dget_dlock(alias); > > spin_unlock(&alias->d_lock); > > > > instead of just 'dget()'? > > No reason other than I forgot to switch to dget(). > > > > > Also, while I found the old __ns_get_path() to be fairly disgusting, I > > actually think it's simpler and clearer than playing around with the > > dentry alias list. So my expectation on code sharing was that you'd > > It's overall probably also cheaper, I think. > > > basically lift the old __ns_get_path(), make *that* the helper, and > > just pass it an argument that is the pointer to the filesystem > > "stashed" entry... > > > > And yes, using "atomic_long_t" for stashed is a crime against > > humanity. It's also entirely pointless. There are no actual atomic > > operations that the code wants except for reading and writing (aka > > READ_ONCE() and WRITE_ONCE()) and cmpxchg (aka just cmpxchg()). Using > > "atomic_long_t" buys the code nothing, and only makes things more > > complicated and requires crazy casts. > > Yup, I had that as a draft and that introduced struct ino_stash which > contained a dentry pointer and the inode number using cmpxchg(). But I > decided against this because ns_common.h would require to have access to > ino_stash definition so we wouldn't just able to hide it in internal.h > where it should belong. Right, I remember. The annoying thing will be how to cleanly handle this without having to pass too many parameters because we need d_fsdata, the vfsmount, and the inode->i_fop. So let me see if I can get this to something that doesn't look too ugly.