Re: [PATCH v2 3/3] pidfs: implement file handle support

Erin Shepherd <erin.shepherd@xxxxxx> · Thu, 14 Nov 2024 13:42:26 +0100

On 14/11/2024 08:07, Amir Goldstein wrote:
> On Wed, Nov 13, 2024 at 7:11 PM Erin Shepherd <erin.shepherd@xxxxxx> wrote:
>> On 64-bit platforms, userspace can read the pidfd's inode in order to
>> get a never-repeated PID identifier. On 32-bit platforms this identifier
>> is not exposed, as inodes are limited to 32 bits. Instead expose the
>> identifier via export_fh, which makes it available to userspace via
>> name_to_handle_at
>>
>> In addition we implement fh_to_dentry, which allows userspace to
>> recover a pidfd from a PID file handle.
> "In addition" is a good indication that a separate patch was a good idea..
>
>> We stash the process' PID in the root pid namespace inside the handle,
>> and use that to recover the pid (validating that pid->ino matches the
>> value in the handle, i.e. that the pid has not been reused).
>>
>> We use the root namespace in order to ensure that file handles can be
>> moved across namespaces; however, we validate that the PID exists in
>> the current namespace before returning the inode.
>>
>> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
> This patch has changed enough that you should not have kept my RVB.
>
> BTW, please refrain from using the same subject for the cover letter and
> a single patch. They are not the same thing, so if they get the same
> name, one of them has an inaccurate description.
>
ACK to all three.

>> Signed-off-by: Erin Shepherd <erin.shepherd@xxxxxx>
>> ---
>>  fs/pidfs.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 61 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/pidfs.c b/fs/pidfs.c
>> index 80675b6bf88459c22787edaa68db360bdc0d0782..0684a9b8fe71c5205fb153b2714bc9c672045fd5 100644
>> --- a/fs/pidfs.c
>> +++ b/fs/pidfs.c
>> @@ -1,5 +1,6 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include <linux/anon_inodes.h>
>> +#include <linux/exportfs.h>
>>  #include <linux/file.h>
>>  #include <linux/fs.h>
>>  #include <linux/magic.h>
>> @@ -347,11 +348,69 @@ static const struct dentry_operations pidfs_dentry_operations = {
>>         .d_prune        = stashed_dentry_prune,
>>  };
>>
>> +#define PIDFD_FID_LEN 3
>> +
>> +struct pidfd_fid {
>> +       u64 ino;
>> +       s32 pid;
>> +} __packed;
>> +
>> +static int pidfs_encode_fh(struct inode *inode, u32 *fh, int *max_len,
>> +                          struct inode *parent)
>> +{
>> +       struct pid *pid = inode->i_private;
>> +       struct pidfd_fid *fid = (struct pidfd_fid *)fh;
>> +
>> +       if (*max_len < PIDFD_FID_LEN) {
>> +               *max_len = PIDFD_FID_LEN;
>> +               return FILEID_INVALID;
>> +       }
>> +
>> +       fid->ino = pid->ino;
>> +       fid->pid = pid_nr(pid);
>> +       *max_len = PIDFD_FID_LEN;
>> +       return FILEID_INO64_GEN;
>> +}
>> +
>> +static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
>> +                                        struct fid *gen_fid,
>> +                                        int fh_len, int fh_type)
>> +{
>> +       int ret;
>> +       struct path path;
>> +       struct pidfd_fid *fid = (struct pidfd_fid *)gen_fid;
>> +       struct pid *pid;
>> +
>> +       if (fh_type != FILEID_INO64_GEN || fh_len < PIDFD_FID_LEN)
>> +               return NULL;
>> +
>> +       scoped_guard(rcu) {
>> +               pid = find_pid_ns(fid->pid, &init_pid_ns);
>> +               if (!pid || pid->ino != fid->ino || pid_vnr(pid) == 0)
>> +                       return NULL;
>> +
>> +               pid = get_pid(pid);
>> +       }
>> +
>> +       ret = path_from_stashed(&pid->stashed, pidfs_mnt, pid, &path);
>> +       if (ret < 0)
>> +               return ERR_PTR(ret);
> How come no need to put_pid() in case of error?

This one confused me at first too, but path_from_stashed frees it (via
stashed_ops.put_data) on error. You can see the same pattern in
pidfs_alloc_file.

(It already needs to know how to free it for the case where a stashed
dentry already exists)

>> +
>> +       mntput(path.mnt);
>> +       return path.dentry;
>> +}
>> +
>> +static const struct export_operations pidfs_export_operations = {
>> +       .encode_fh = pidfs_encode_fh,
>> +       .fh_to_dentry = pidfs_fh_to_dentry,
>> +       .flags = EXPORT_OP_UNRESTRICTED_OPEN,
>> +};
>> +
>>  static int pidfs_init_inode(struct inode *inode, void *data)
>>  {
>>         inode->i_private = data;
>>         inode->i_flags |= S_PRIVATE;
>> -       inode->i_mode |= S_IRWXU;
>> +       inode->i_mode |= S_IRWXU | S_IRWXG | S_IRWXO;
> This change is not explained.
> Why is it here?

open_by_handle_at eventually passes through the may_open permission check.
The existing permissions only permits root to open them (since the owning
uid & gid is 0). So, it was necessary to widen them to align with how
pidfd_open works.

If I stick with this approach (see [1]) I'll ensure to document this change
better in the commit message.

[1] https://lore.kernel.org/all/6a3ed633-311d-47ff-8a7e-5121d6186139@xxxxxx/