On 14/11/2024 08:07, Amir Goldstein wrote: > On Wed, Nov 13, 2024 at 7:11 PM Erin Shepherd <erin.shepherd@xxxxxx> wrote: >> On 64-bit platforms, userspace can read the pidfd's inode in order to >> get a never-repeated PID identifier. On 32-bit platforms this identifier >> is not exposed, as inodes are limited to 32 bits. Instead expose the >> identifier via export_fh, which makes it available to userspace via >> name_to_handle_at >> >> In addition we implement fh_to_dentry, which allows userspace to >> recover a pidfd from a PID file handle. > "In addition" is a good indication that a separate patch was a good idea.. > >> We stash the process' PID in the root pid namespace inside the handle, >> and use that to recover the pid (validating that pid->ino matches the >> value in the handle, i.e. that the pid has not been reused). >> >> We use the root namespace in order to ensure that file handles can be >> moved across namespaces; however, we validate that the PID exists in >> the current namespace before returning the inode. >> >> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx> > This patch has changed enough that you should not have kept my RVB. > > BTW, please refrain from using the same subject for the cover letter and > a single patch. They are not the same thing, so if they get the same > name, one of them has an inaccurate description. > ACK to all three. >> Signed-off-by: Erin Shepherd <erin.shepherd@xxxxxx> >> --- >> fs/pidfs.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 61 insertions(+), 1 deletion(-) >> >> diff --git a/fs/pidfs.c b/fs/pidfs.c >> index 80675b6bf88459c22787edaa68db360bdc0d0782..0684a9b8fe71c5205fb153b2714bc9c672045fd5 100644 >> --- a/fs/pidfs.c >> +++ b/fs/pidfs.c >> @@ -1,5 +1,6 @@ >> // SPDX-License-Identifier: GPL-2.0 >> #include <linux/anon_inodes.h> >> +#include <linux/exportfs.h> >> #include <linux/file.h> >> #include <linux/fs.h> >> #include <linux/magic.h> >> @@ -347,11 +348,69 @@ static const struct dentry_operations pidfs_dentry_operations = { >> .d_prune = stashed_dentry_prune, >> }; >> >> +#define PIDFD_FID_LEN 3 >> + >> +struct pidfd_fid { >> + u64 ino; >> + s32 pid; >> +} __packed; >> + >> +static int pidfs_encode_fh(struct inode *inode, u32 *fh, int *max_len, >> + struct inode *parent) >> +{ >> + struct pid *pid = inode->i_private; >> + struct pidfd_fid *fid = (struct pidfd_fid *)fh; >> + >> + if (*max_len < PIDFD_FID_LEN) { >> + *max_len = PIDFD_FID_LEN; >> + return FILEID_INVALID; >> + } >> + >> + fid->ino = pid->ino; >> + fid->pid = pid_nr(pid); >> + *max_len = PIDFD_FID_LEN; >> + return FILEID_INO64_GEN; >> +} >> + >> +static struct dentry *pidfs_fh_to_dentry(struct super_block *sb, >> + struct fid *gen_fid, >> + int fh_len, int fh_type) >> +{ >> + int ret; >> + struct path path; >> + struct pidfd_fid *fid = (struct pidfd_fid *)gen_fid; >> + struct pid *pid; >> + >> + if (fh_type != FILEID_INO64_GEN || fh_len < PIDFD_FID_LEN) >> + return NULL; >> + >> + scoped_guard(rcu) { >> + pid = find_pid_ns(fid->pid, &init_pid_ns); >> + if (!pid || pid->ino != fid->ino || pid_vnr(pid) == 0) >> + return NULL; >> + >> + pid = get_pid(pid); >> + } >> + >> + ret = path_from_stashed(&pid->stashed, pidfs_mnt, pid, &path); >> + if (ret < 0) >> + return ERR_PTR(ret); > How come no need to put_pid() in case of error? This one confused me at first too, but path_from_stashed frees it (via stashed_ops.put_data) on error. You can see the same pattern in pidfs_alloc_file. (It already needs to know how to free it for the case where a stashed dentry already exists) >> + >> + mntput(path.mnt); >> + return path.dentry; >> +} >> + >> +static const struct export_operations pidfs_export_operations = { >> + .encode_fh = pidfs_encode_fh, >> + .fh_to_dentry = pidfs_fh_to_dentry, >> + .flags = EXPORT_OP_UNRESTRICTED_OPEN, >> +}; >> + >> static int pidfs_init_inode(struct inode *inode, void *data) >> { >> inode->i_private = data; >> inode->i_flags |= S_PRIVATE; >> - inode->i_mode |= S_IRWXU; >> + inode->i_mode |= S_IRWXU | S_IRWXG | S_IRWXO; > This change is not explained. > Why is it here? open_by_handle_at eventually passes through the may_open permission check. The existing permissions only permits root to open them (since the owning uid & gid is 0). So, it was necessary to widen them to align with how pidfd_open works. If I stick with this approach (see [1]) I'll ensure to document this change better in the commit message. [1] https://lore.kernel.org/all/6a3ed633-311d-47ff-8a7e-5121d6186139@xxxxxx/