On 12/11/2024 14:10, Christian Brauner wrote: > Sorry for the delayed reply (I'm recovering from a lengthy illness.). No worries on my part, and I hope you're feeling better! > I like the idea in general. I think this is really useful. A few of my > thoughts but I need input from Amir and Jeff: > > * In the last patch of the series you already implement decoding of > pidfd file handles by adding a .fh_to_dentry export_operations method. > > There are a few things to consider because of how open_by_handle_at() > works. > > - open_by_handle_at() needs to be restricted so it only creates pidfds > from pidfs file handles that resolve to a struct pid that is > reachable in the caller's pid namespace. In other words, it should > mirror pidfd_open(). > > Put another way, open_by_handle_at() must not be usable to open > arbitrary pids to prevent a container from constructing a pidfd file > handle for a process that lives outside it's pid namespace > hierarchy. > > With this restriction in place open_by_handle_at() can be available > to let unprivileged processes open pidfd file handles. > > Related to that, I don't think we need to make open_by_handle_at() > open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply > because any process in the initial pid namespace can open any other > process via pidfd_open() anyway because pid namespaces are > hierarchical. > > IOW, CAP_DAC_READ_SEARCH must not override the restriction that the > provided pidfs file handle must be reachable from the caller's pid > namespace. The pid_vnr(pid) == 0 check catches this case -- we return an error to the caller if there isn't a pid mapping in the caller's namespace Perhaps I should have called this out explicitly. > - open_by_handle_at() uses may_decode_fh() to determine whether it's > possible to decode a file handle as an unprivileged user. The > current checks don't make sense for pidfs. Conceptually, I think > there don't need to place any restrictions based on global > CAP_DAC_READ_SEARCH, owning user namespace of the superblock or > mount on pidfs file handles. > > The only restriction that matters is that the requested pidfs file > handle is reachable from the caller's pid namespace. I wonder if this could be handled through an addition to export_operations' flags member? > - A pidfd always has exactly a single inode and a single dentry. > There's no aliases. > > - Generally, in my naive opinion, I think that decoding pidfs file > handles should be a lot simpler than decoding regular path based > file handles. Because there should be no need to verify any > ancestors, or reconnect paths. Pidfs also doesn't have directory > inodes, only regular inodes. In other words, any dentry is > acceptable. > > Essentially, the only thing we need is for exportfs_decode_fh_raw() > to verify that the provided pidfs file handle is resolvable in the > caller's pid namespace. If so we're done. The challenge is how to > nicely plumb this into the code without it sticking out like a sore > thumb. Theoretically you should be able to use PIDFD_SELF as well (assuming that makes its way into mainline this release :-)) but I am a bit concerned about potentially polluting the open_by_handle_at logic with pidfd specificities. > - Pidfs should not be exportable via NFS. It doesn't make sense. Hmm, I guess I might have made that possible, though I'm certainly not familiar enough with the internals of nfsd to be able to test if I've done so. I guess probably this case calls for another export_ops flag? Not like we're short on them Thanks, - Erin