On 13/11/2024 13:09, Christian Brauner wrote:
> Hm, a pidfd comes in two flavours:
>
> (1) thread-group leader pidfd: pidfd_open(<pid>, 0)
> (2) thread pidfd:              pidfd_open(<pid>, PIDFD_THREAD)
>
> In your current scheme fid->pid = pid_nr(pid) means that you always
> encode a pidfs file handle for a thread pidfd no matter if the provided
> pidfd was a thread-group leader pidfd or a thread pidfd. This is very
> likely wrong as it means users that use a thread-group pidfd get a
> thread-specific pid back.
>
> I think we need to encode (1) and (2) in the pidfs file handle so users
> always get back the correct type of pidfd.
>
> That very likely means name_to_handle_at() needs to encode this into the
> pidfs file handle.

I guess a question here is whether a pidfd handle encodes a handle to a pid
in a specific mode, or just to a pid in general? The thought had occurred to
me while I was working on this initially, but I felt that treating it as a
property of the file descriptor rather than of the handle was better.

Currently open_by_handle_at() always returns a thread-group pidfd (since
PIDFD_THREAD isn't set), regardless of what type of pidfd you passed to
name_to_handle_at(). I had thought that PIDFD_THREAD/O_EXCL would have been
passed through to f->f_flags on the restored pidfd, but upon checking I see
that it gets filtered out in do_dentry_open().

I feel like leaving it up to the caller of open_by_handle_at() might be
better (because they are probably better informed about whether they want
poll() to inform them of thread or process exit), but I could lean either
way.
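To make the two options concrete, here is a rough userspace model of the
decision (not kernel code, and not what the patch currently does). The
struct layout, the field widths, the flags field, MODEL_FID_THREAD and all
model_* names are made up for illustration; only PIDFD_THREAD == O_EXCL
comes from the existing uapi.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

/* The uapi header defines PIDFD_THREAD as O_EXCL; fall back for old headers. */
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL
#endif

/* Hypothetical flag bit stored in the handle; not part of the posted patch. */
#define MODEL_FID_THREAD 0x1u

/*
 * Userspace model of the handle payload. ino and pid mirror the fields the
 * patch uses; their widths and the flags field are assumptions.
 */
struct model_pidfd_fid {
	uint64_t ino;
	int32_t pid;
	uint32_t flags;
};

/* Option A, encode side: record the flavour of the source pidfd. */
static struct model_pidfd_fid model_encode(uint64_t ino, int32_t pid,
					    unsigned int f_flags)
{
	struct model_pidfd_fid fid;

	assert(pid > 0);	/* a handle never refers to pid 0 */
	memset(&fid, 0, sizeof(fid));
	fid.ino = ino;
	fid.pid = pid;
	if (f_flags & PIDFD_THREAD)
		fid.flags |= MODEL_FID_THREAD;
	return fid;
}

/* Option A, decode side: the handle alone decides the flavour. */
static unsigned int model_decode_flags(const struct model_pidfd_fid *fid)
{
	return (fid->flags & MODEL_FID_THREAD) ? PIDFD_THREAD : 0;
}

/* Option B: ignore the handle and honour what the opener asked for. */
static unsigned int model_caller_flags(const struct model_pidfd_fid *fid,
				       unsigned int open_flags)
{
	(void)fid;
	return open_flags & PIDFD_THREAD;
}
```

With option A a handle taken from a thread pidfd always reopens as a thread
pidfd; with option B the same handle can be reopened either way, depending
on the flags passed to open_by_handle_at().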

>> +static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
>> +					 struct fid *gen_fid,
>> +					 int fh_len, int fh_type)
>> +{
>> +	int ret;
>> +	struct path path;
>> +	struct pidfd_fid *fid = (struct pidfd_fid *)gen_fid;
>> +	struct pid *pid;
>> +
>> +	if (fh_type != FILEID_INO64_GEN || fh_len < PIDFD_FID_LEN)
>> +		return NULL;
>> +
>> +	pid = find_get_pid_ns(fid->pid, &init_pid_ns);
>> +	if (!pid || pid->ino != fid->ino || pid_vnr(pid) == 0) {
>> +		put_pid(pid);
>> +		return NULL;
>> +	}
> I think we can avoid the premature reference bump and do:
>
> scoped_guard(rcu) {
>         struct pid *pid;
>
>         pid = find_pid_ns(fid->pid, &init_pid_ns);
>         if (!pid)
>                 return NULL;
>
>         /* Did the pid get recycled? */
>         if (pid->ino != fid->ino)
>                 return NULL;
>
>         /* Must be resolvable in the caller's pid namespace. */
>         if (pid_vnr(pid) == 0)
>                 return NULL;
>
>         /* Ok, this is the pid we want. */
>         get_pid(pid);
> }

I can go with that if preferred. I was a bit worried about making the RCU
critical section too large, but I'm sure there are much larger sections
elsewhere in the kernel.

>> +
>> +	ret = path_from_stashed(&pid->stashed, pidfs_mnt, pid, &path);
>> +	if (ret < 0)
>> +		return ERR_PTR(ret);
>> +
>> +	mntput(path.mnt);
>> +	return path.dentry;
>> }

Similarly here, I should probably refactor this into dentry_from_stashed in
order to avoid a needless bump-then-drop of path.mnt's reference count.