On Wed, Nov 13, 2024 at 12:03:08AM +0100, Erin Shepherd wrote: > > On 12/11/2024 14:10, Christian Brauner wrote: > > Sorry for the delayed reply (I'm recovering from a lengthy illness.). > No worries on my part, and I hope you're feeling better! > > I like the idea in general. I think this is really useful. A few of my > > thoughts but I need input from Amir and Jeff: > > > > * In the last patch of the series you already implement decoding of > > pidfd file handles by adding a .fh_to_dentry export_operations method. > > > > There are a few things to consider because of how open_by_handle_at() > > works. > > > > - open_by_handle_at() needs to be restricted so it only creates pidfds > > from pidfs file handles that resolve to a struct pid that is > > reachable in the caller's pid namespace. In other words, it should > > mirror pidfd_open(). > > > > Put another way, open_by_handle_at() must not be usable to open > > arbitrary pids to prevent a container from constructing a pidfd file > > handle for a process that lives outside it's pid namespace > > hierarchy. > > > > With this restriction in place open_by_handle_at() can be available > > to let unprivileged processes open pidfd file handles. > > > > Related to that, I don't think we need to make open_by_handle_at() > > open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply > > because any process in the initial pid namespace can open any other > > process via pidfd_open() anyway because pid namespaces are > > hierarchical. > > > > IOW, CAP_DAC_READ_SEARCH must not override the restriction that the > > provided pidfs file handle must be reachable from the caller's pid > > namespace. > > The pid_vnr(pid) == 0 check catches this case -- we return an error to the > caller if there isn't a pid mapping in the caller's namespace > > Perhaps I should have called this out explicitly. > > > - open_by_handle_at() uses may_decode_fh() to determine whether it's > > possible to decode a file handle as an unprivileged user. The > > current checks don't make sense for pidfs. Conceptually, I think > > there don't need to place any restrictions based on global > > CAP_DAC_READ_SEARCH, owning user namespace of the superblock or > > mount on pidfs file handles. > > > > The only restriction that matters is that the requested pidfs file > > handle is reachable from the caller's pid namespace. > > I wonder if this could be handled through an addition to export_operations' > flags member? > > > - A pidfd always has exactly a single inode and a single dentry. > > There's no aliases. > > > > - Generally, in my naive opinion, I think that decoding pidfs file > > handles should be a lot simpler than decoding regular path based > > file handles. Because there should be no need to verify any > > ancestors, or reconnect paths. Pidfs also doesn't have directory > > inodes, only regular inodes. In other words, any dentry is > > acceptable. > > > > Essentially, the only thing we need is for exportfs_decode_fh_raw() > > to verify that the provided pidfs file handle is resolvable in the > > caller's pid namespace. If so we're done. The challenge is how to > > nicely plumb this into the code without it sticking out like a sore > > thumb. > > Theoretically you should be able to use PIDFD_SELF as well (assuming that > makes its way into mainline this release :-)) but I am a bit concerned about > potentially polluting the open_by_handle_at logic with pidfd specificities. > > > - Pidfs should not be exportable via NFS. It doesn't make sense. > > Hmm, I guess I might have made that possible, though I'm certainly not > familiar enough with the internals of nfsd to be able to test if I've done > so. AFAIK check_export() in fs/nfsd/export.c spells this it out: /* There are two requirements on a filesystem to be exportable. * 1: We must be able to identify the filesystem from a number. * either a device number (so FS_REQUIRES_DEV needed) * or an FSID number (so NFSEXP_FSID or ->uuid is needed). * 2: We must be able to find an inode from a filehandle. * This means that s_export_op must be set. * 3: We must not currently be on an idmapped mount. */ Granted I've been wrong on account of stale docs before. :$ Though it would be kinda funny if you *could* mess with another machine's processes over NFS. --D > I guess probably this case calls for another export_ops flag? Not like we're > short on them > > Thanks, > - Erin > >