Re: [PATCH 0/4] pidfs: implement file handle support

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 12 Nov 2024 14:10:06 +0100

On Fri, Nov 01, 2024 at 01:54:48PM +0000, Erin Shepherd wrote:
> Since the introduction of pidfs, we have had 64-bit process identifiers 
> that will not be reused for the entire uptime of the system. This greatly 
> facilitates process tracking in userspace.
> 
> There are two limitations at present:
> 
>  * These identifiers are currently only exposed to processes on 64-bit 
>    systems. On 32-bit systems, inode space is also limited to 32 bits and 
>    therefore is subject to the same reuse issues.
>  * There is no way to go from one of these unique identifiers to a pid or 
>    pidfd.
> 
> Patch 1 & 2 in this stack implements fh_export for pidfs. This means 
> userspace  can retrieve a unique process identifier even on 32-bit systems 
> via name_to_handle_at.
> 
> Patch 3 & 4 in this stack implement fh_to_dentry for pidfs. This means 
> userspace can convert back from a file handle to the corresponding pidfd. 
> To support us going from a file handle to a pidfd, we have to store a pid 
> inside the file handle. To ensure file handles are invariant and can move 
> between pid namespaces, we stash a pid from the initial namespace inside 
> the file handle.
> 
> I'm not quite sure if stashing an initial-namespace pid inside the file 
> handle is the right approach here; if not, I think that patch 1 & 2 are 
> useful on their own.

Sorry for the delayed reply (I'm recovering from a lengthy illness.).

I like the idea in general. I think this is really useful. A few of my
thoughts but I need input from Amir and Jeff:

* In the last patch of the series you already implement decoding of
  pidfd file handles by adding a .fh_to_dentry export_operations method.

  There are a few things to consider because of how open_by_handle_at()
  works.

  - open_by_handle_at() needs to be restricted so it only creates pidfds
    from pidfs file handles that resolve to a struct pid that is
    reachable in the caller's pid namespace. In other words, it should
    mirror pidfd_open().

    Put another way, open_by_handle_at() must not be usable to open
    arbitrary pids to prevent a container from constructing a pidfd file
    handle for a process that lives outside it's pid namespace
    hierarchy.

    With this restriction in place open_by_handle_at() can be available
    to let unprivileged processes open pidfd file handles.

    Related to that, I don't think we need to make open_by_handle_at()
    open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply
    because any process in the initial pid namespace can open any other
    process via pidfd_open() anyway because pid namespaces are
    hierarchical.

    IOW, CAP_DAC_READ_SEARCH must not override the restriction that the
    provided pidfs file handle must be reachable from the caller's pid
    namespace.

  - open_by_handle_at() uses may_decode_fh() to determine whether it's
    possible to decode a file handle as an unprivileged user. The
    current checks don't make sense for pidfs. Conceptually, I think
    there don't need to place any restrictions based on global
    CAP_DAC_READ_SEARCH, owning user namespace of the superblock or
    mount on pidfs file handles.

    The only restriction that matters is that the requested pidfs file
    handle is reachable from the caller's pid namespace.

  - A pidfd always has exactly a single inode and a single dentry.
    There's no aliases.

  - Generally, in my naive opinion, I think that decoding pidfs file
    handles should be a lot simpler than decoding regular path based
    file handles. Because there should be no need to verify any
    ancestors, or reconnect paths. Pidfs also doesn't have directory
    inodes, only regular inodes. In other words, any dentry is
    acceptable.

    Essentially, the only thing we need is for exportfs_decode_fh_raw()
    to verify that the provided pidfs file handle is resolvable in the
    caller's pid namespace. If so we're done. The challenge is how to
    nicely plumb this into the code without it sticking out like a sore
    thumb.

  - Pidfs should not be exportable via NFS. It doesn't make sense.