Re: [PATCH 0/4] pidfs: implement file handle support

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Tue, 12 Nov 2024 16:40:11 -0800

On Wed, Nov 13, 2024 at 12:03:08AM +0100, Erin Shepherd wrote:
> 
> On 12/11/2024 14:10, Christian Brauner wrote:
> > Sorry for the delayed reply (I'm recovering from a lengthy illness.).
> No worries on my part, and I hope you're feeling better!
> > I like the idea in general. I think this is really useful. A few of my
> > thoughts but I need input from Amir and Jeff:
> >
> > * In the last patch of the series you already implement decoding of
> >   pidfd file handles by adding a .fh_to_dentry export_operations method.
> >
> >   There are a few things to consider because of how open_by_handle_at()
> >   works.
> >
> >   - open_by_handle_at() needs to be restricted so it only creates pidfds
> >     from pidfs file handles that resolve to a struct pid that is
> >     reachable in the caller's pid namespace. In other words, it should
> >     mirror pidfd_open().
> >
> >     Put another way, open_by_handle_at() must not be usable to open
> >     arbitrary pids to prevent a container from constructing a pidfd file
> >     handle for a process that lives outside it's pid namespace
> >     hierarchy.
> >
> >     With this restriction in place open_by_handle_at() can be available
> >     to let unprivileged processes open pidfd file handles.
> >
> >     Related to that, I don't think we need to make open_by_handle_at()
> >     open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply
> >     because any process in the initial pid namespace can open any other
> >     process via pidfd_open() anyway because pid namespaces are
> >     hierarchical.
> >
> >     IOW, CAP_DAC_READ_SEARCH must not override the restriction that the
> >     provided pidfs file handle must be reachable from the caller's pid
> >     namespace.
> 
> The pid_vnr(pid) == 0 check catches this case -- we return an error to the
> caller if there isn't a pid mapping in the caller's namespace
> 
> Perhaps I should have called this out explicitly.
> 
> >   - open_by_handle_at() uses may_decode_fh() to determine whether it's
> >     possible to decode a file handle as an unprivileged user. The
> >     current checks don't make sense for pidfs. Conceptually, I think
> >     there don't need to place any restrictions based on global
> >     CAP_DAC_READ_SEARCH, owning user namespace of the superblock or
> >     mount on pidfs file handles.
> >
> >     The only restriction that matters is that the requested pidfs file
> >     handle is reachable from the caller's pid namespace.
> 
> I wonder if this could be handled through an addition to export_operations'
> flags member?
> 
> >   - A pidfd always has exactly a single inode and a single dentry.
> >     There's no aliases.
> >
> >   - Generally, in my naive opinion, I think that decoding pidfs file
> >     handles should be a lot simpler than decoding regular path based
> >     file handles. Because there should be no need to verify any
> >     ancestors, or reconnect paths. Pidfs also doesn't have directory
> >     inodes, only regular inodes. In other words, any dentry is
> >     acceptable.
> >
> >     Essentially, the only thing we need is for exportfs_decode_fh_raw()
> >     to verify that the provided pidfs file handle is resolvable in the
> >     caller's pid namespace. If so we're done. The challenge is how to
> >     nicely plumb this into the code without it sticking out like a sore
> >     thumb.
> 
> Theoretically you should be able to use PIDFD_SELF as well (assuming that
> makes its way into mainline this release :-)) but I am a bit concerned about
> potentially polluting the open_by_handle_at logic with pidfd specificities.
> 
> >   - Pidfs should not be exportable via NFS. It doesn't make sense.
> 
> Hmm, I guess I might have made that possible, though I'm certainly not
> familiar enough with the internals of nfsd to be able to test if I've done
> so.

AFAIK check_export() in fs/nfsd/export.c spells this it out:

	/* There are two requirements on a filesystem to be exportable.
	 * 1:  We must be able to identify the filesystem from a number.
	 *       either a device number (so FS_REQUIRES_DEV needed)
	 *       or an FSID number (so NFSEXP_FSID or ->uuid is needed).
	 * 2:  We must be able to find an inode from a filehandle.
	 *       This means that s_export_op must be set.
	 * 3: We must not currently be on an idmapped mount.
	 */

Granted I've been wrong on account of stale docs before. :$

Though it would be kinda funny if you *could* mess with another
machine's processes over NFS.

--D

> I guess probably this case calls for another export_ops flag? Not like we're
> short on them
> 
> Thanks,
>     - Erin
> 
>