Re: [PATCH 0/4] pidfs: implement file handle support

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 12 Nov 2024 08:57:20 -0500

On Tue, 2024-11-12 at 14:10 +0100, Christian Brauner wrote:
> On Fri, Nov 01, 2024 at 01:54:48PM +0000, Erin Shepherd wrote:
> > Since the introduction of pidfs, we have had 64-bit process identifiers 
> > that will not be reused for the entire uptime of the system. This greatly 
> > facilitates process tracking in userspace.
> > 
> > There are two limitations at present:
> > 
> >  * These identifiers are currently only exposed to processes on 64-bit 
> >    systems. On 32-bit systems, inode space is also limited to 32 bits and 
> >    therefore is subject to the same reuse issues.

We should really just move to storing 64-bit inode numbers internally
on 32-bit machines. That would at least make statx() give you all 64
bits on 32-bit host.

> >  * There is no way to go from one of these unique identifiers to a pid or 
> >    pidfd.
> > 
> > Patch 1 & 2 in this stack implements fh_export for pidfs. This means 
> > userspace  can retrieve a unique process identifier even on 32-bit systems 
> > via name_to_handle_at.
> > 
> > Patch 3 & 4 in this stack implement fh_to_dentry for pidfs. This means 
> > userspace can convert back from a file handle to the corresponding pidfd. 
> > To support us going from a file handle to a pidfd, we have to store a pid 
> > inside the file handle. To ensure file handles are invariant and can move 
> > between pid namespaces, we stash a pid from the initial namespace inside 
> > the file handle.
> > 
> > I'm not quite sure if stashing an initial-namespace pid inside the file 
> > handle is the right approach here; if not, I think that patch 1 & 2 are 
> > useful on their own.

Hmm... I guess pid namespaces don't have a convenient 64-bit ID like
mount namespaces do? In that case, stashing the pid from init_ns is
probably the next best thing.

> 
> Sorry for the delayed reply (I'm recovering from a lengthy illness.).
> 
> I like the idea in general. I think this is really useful. A few of my
> thoughts but I need input from Amir and Jeff:
> 
> * In the last patch of the series you already implement decoding of
>   pidfd file handles by adding a .fh_to_dentry export_operations method.
> 
>   There are a few things to consider because of how open_by_handle_at()
>   works.
> 
>   - open_by_handle_at() needs to be restricted so it only creates pidfds
>     from pidfs file handles that resolve to a struct pid that is
>     reachable in the caller's pid namespace. In other words, it should
>     mirror pidfd_open().
> 
>     Put another way, open_by_handle_at() must not be usable to open
>     arbitrary pids to prevent a container from constructing a pidfd file
>     handle for a process that lives outside it's pid namespace
>     hierarchy.
> 
>     With this restriction in place open_by_handle_at() can be available
>     to let unprivileged processes open pidfd file handles.
> 
>     Related to that, I don't think we need to make open_by_handle_at()
>     open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply
>     because any process in the initial pid namespace can open any other
>     process via pidfd_open() anyway because pid namespaces are
>     hierarchical.
> 
>     IOW, CAP_DAC_READ_SEARCH must not override the restriction that the
>     provided pidfs file handle must be reachable from the caller's pid
>     namespace.
> 
>   - open_by_handle_at() uses may_decode_fh() to determine whether it's
>     possible to decode a file handle as an unprivileged user. The
>     current checks don't make sense for pidfs. Conceptually, I think
>     there don't need to place any restrictions based on global
>     CAP_DAC_READ_SEARCH, owning user namespace of the superblock or
>     mount on pidfs file handles.
> 
>     The only restriction that matters is that the requested pidfs file
>     handle is reachable from the caller's pid namespace.
>
>   - A pidfd always has exactly a single inode and a single dentry.
>     There's no aliases.
> 
>   - Generally, in my naive opinion, I think that decoding pidfs file
>     handles should be a lot simpler than decoding regular path based
>     file handles. Because there should be no need to verify any
>     ancestors, or reconnect paths. Pidfs also doesn't have directory
>     inodes, only regular inodes. In other words, any dentry is
>     acceptable.
> 
>     Essentially, the only thing we need is for exportfs_decode_fh_raw()
>     to verify that the provided pidfs file handle is resolvable in the
>     caller's pid namespace. If so we're done. The challenge is how to
>     nicely plumb this into the code without it sticking out like a sore
>     thumb.
> 
>   - Pidfs should not be exportable via NFS. It doesn't make sense.

I haven't looked over the patchset yet, but those restrictions all
sound pretty reasonable to me. Special casing the may_decode_fh
permission checks may be the tricky bit. I'm not sure what that should
look like, tbqh.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>