On Wed, Nov 13, 2024 at 02:06:56PM +0100, Erin Shepherd wrote:
> On 13/11/2024 13:09, Christian Brauner wrote:
>
> > Hm, a pidfd comes in two flavours:
> >
> > (1) thread-group leader pidfd: pidfd_open(<pid>, 0)
> > (2) thread pidfd:              pidfd_open(<pid>, PIDFD_THREAD)
> >
> > In your current scheme fid->pid = pid_nr(pid) means that you always
> > encode a pidfs file handle for a thread pidfd no matter if the provided
> > pidfd was a thread-group leader pidfd or a thread pidfd. This is very
> > likely wrong as it means users that use a thread-group pidfd get a
> > thread-specific pid back.
> >
> > I think we need to encode (1) and (2) in the pidfs file handle so users
> > always get back the correct type of pidfd.
> >
> > That very likely means name_to_handle_at() needs to encode this into the
> > pidfs file handle.
>
> I guess a question here is whether a pidfd handle encodes a handle to a pid
> in a specific mode, or just to a pid in general? The thought had occurred
> to me while I was working on this initially, but I felt like perhaps treating
> it as a property of the file descriptor in general was better.
>
> Currently open_by_handle_at always returns a thread-group pidfd (since
> PIDFD_THREAD) isn't set, regardless of what type of pidfd you passed to
> name_to_handle_at. I had thought that PIDFD_THREAD/O_EXCL would have been

I don't think you're returning a thread-group pidfd from
open_by_handle_at() in your scheme. After all you're encoding the tid in
pid_nr() so you'll always find the struct pid for the thread afaict. If
I'm wrong could you please explain how you think this works? I might
just be missing something obvious.

> passed through to f->f_flags on the restored pidfd, but upon checking I see that
> it gets filtered out in do_dentry_open.

It does, but note that __pidfd_prepare() raises it explicitly on the file
afterwards. So it works fine.

>
> I feel like leaving it up to the caller of open_by_handle_at might be better
> (because they are probably better informed about whether they want poll() to
> inform them of thread or process exit) but I could lean either way.

So in order to decode a pidfs file handle you want the caller to have to
specify O_EXCL in the flags argument of open_by_handle_at()? Is that
your idea?

>
> >> +static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
> >> +					 struct fid *gen_fid,
> >> +					 int fh_len, int fh_type)
> >> +{
> >> +	int ret;
> >> +	struct path path;
> >> +	struct pidfd_fid *fid = (struct pidfd_fid *)gen_fid;
> >> +	struct pid *pid;
> >> +
> >> +	if (fh_type != FILEID_INO64_GEN || fh_len < PIDFD_FID_LEN)
> >> +		return NULL;
> >> +
> >> +	pid = find_get_pid_ns(fid->pid, &init_pid_ns);
> >> +	if (!pid || pid->ino != fid->ino || pid_vnr(pid) == 0) {
> >> +		put_pid(pid);
> >> +		return NULL;
> >> +	}
> > I think we can avoid the premature reference bump and do:
> >
> > scoped_guard(rcu) {
> > 	struct pid *pid;
> >
> > 	pid = find_pid_ns(fid->pid, &init_pid_ns);
> > 	if (!pid)
> > 		return NULL;
> >
> > 	/* Did the pid get recycled? */
> > 	if (pid->ino != fid->ino)
> > 		return NULL;
> >
> > 	/* Must be resolvable in the caller's pid namespace. */
> > 	if (pid_vnr(pid) == 0)
> > 		return NULL;
> >
> > 	/* Ok, this is the pid we want. */
> > 	get_pid(pid);
> > }
>
> I can go with that if preferred. I was worried a bit about making the RCU
> critical section too large, but of course I'm sure there are much larger
> sections inside the kernel.

This is perfectly fine. Don't worry about it.

>
> >> +
> >> +	ret = path_from_stashed(&pid->stashed, pidfs_mnt, pid, &path);
> >> +	if (ret < 0)
> >> +		return ERR_PTR(ret);
> >> +
> >> +	mntput(path.mnt);
> >> +	return path.dentry;
> >>  }
>
> Similarly here i should probably refactor this into dentry_from_stashed in
> order to avoid a needless bump-then-drop of path.mnt's reference count

No, what you have now is fine.
I wouldn't add a specific helper for this. In contrast to the pid, the
pidfs mount never goes away.
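
Coming back to the decode checks in pidfs_fh_to_dentry(): the reason the
inode number is compared is easier to see in a toy userspace model. This
is only a sketch under stated assumptions. Every toy_* name is
hypothetical, there is no RCU or reference counting here, and it assumes
each incarnation of a pid gets a fresh inode number that is never reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PIDS 8

/* Toy stand-in for struct pid: a numbered slot that a new task may reuse. */
struct toy_pid {
	int      live; /* slot currently in use */
	uint64_t ino;  /* assumption: unique per incarnation, never reused */
};

/* Toy stand-in for the handle payload: pid number plus inode number. */
struct toy_fid {
	uint32_t pid;
	uint64_t ino;
};

static struct toy_pid toy_table[TOY_MAX_PIDS];
static uint64_t toy_next_ino = 1;

/* Start a "task" in slot nr and return a handle describing it. */
static struct toy_fid toy_spawn(uint32_t nr)
{
	assert(nr < TOY_MAX_PIDS);
	toy_table[nr].live = 1;
	toy_table[nr].ino = toy_next_ino++;
	return (struct toy_fid){ .pid = nr, .ino = toy_table[nr].ino };
}

/* The "task" exits; its number becomes available for reuse. */
static void toy_exit(uint32_t nr)
{
	assert(nr < TOY_MAX_PIDS);
	toy_table[nr].live = 0;
}

/* Decode: look the number up, then reject the handle if it is stale. */
static struct toy_pid *toy_decode(const struct toy_fid *fid)
{
	struct toy_pid *pid;

	if (fid->pid >= TOY_MAX_PIDS)
		return NULL;

	pid = &toy_table[fid->pid];
	if (!pid->live)
		return NULL;

	/* Did the pid get recycled? */
	if (pid->ino != fid->ino)
		return NULL;

	return pid;
}
```

A handle taken before toy_exit() stops decoding once the number is handed
out again by a later toy_spawn(), because the stored inode number no
longer matches the one in the slot.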