On Fri, Nov 01, 2024 at 01:54:48PM +0000, Erin Shepherd wrote: > Since the introduction of pidfs, we have had 64-bit process identifiers > that will not be reused for the entire uptime of the system. This greatly > facilitates process tracking in userspace. > > There are two limitations at present: > > * These identifiers are currently only exposed to processes on 64-bit > systems. On 32-bit systems, inode space is also limited to 32 bits and > therefore is subject to the same reuse issues. > * There is no way to go from one of these unique identifiers to a pid or > pidfd. > > Patch 1 & 2 in this stack implements fh_export for pidfs. This means > userspace can retrieve a unique process identifier even on 32-bit systems > via name_to_handle_at. > > Patch 3 & 4 in this stack implement fh_to_dentry for pidfs. This means > userspace can convert back from a file handle to the corresponding pidfd. > To support us going from a file handle to a pidfd, we have to store a pid > inside the file handle. To ensure file handles are invariant and can move > between pid namespaces, we stash a pid from the initial namespace inside > the file handle. > > I'm not quite sure if stashing an initial-namespace pid inside the file > handle is the right approach here; if not, I think that patch 1 & 2 are > useful on their own. Sorry for the delayed reply (I'm recovering from a lengthy illness.). I like the idea in general. I think this is really useful. A few of my thoughts but I need input from Amir and Jeff: * In the last patch of the series you already implement decoding of pidfd file handles by adding a .fh_to_dentry export_operations method. There are a few things to consider because of how open_by_handle_at() works. - open_by_handle_at() needs to be restricted so it only creates pidfds from pidfs file handles that resolve to a struct pid that is reachable in the caller's pid namespace. In other words, it should mirror pidfd_open(). Put another way, open_by_handle_at() must not be usable to open arbitrary pids to prevent a container from constructing a pidfd file handle for a process that lives outside it's pid namespace hierarchy. With this restriction in place open_by_handle_at() can be available to let unprivileged processes open pidfd file handles. Related to that, I don't think we need to make open_by_handle_at() open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply because any process in the initial pid namespace can open any other process via pidfd_open() anyway because pid namespaces are hierarchical. IOW, CAP_DAC_READ_SEARCH must not override the restriction that the provided pidfs file handle must be reachable from the caller's pid namespace. - open_by_handle_at() uses may_decode_fh() to determine whether it's possible to decode a file handle as an unprivileged user. The current checks don't make sense for pidfs. Conceptually, I think there don't need to place any restrictions based on global CAP_DAC_READ_SEARCH, owning user namespace of the superblock or mount on pidfs file handles. The only restriction that matters is that the requested pidfs file handle is reachable from the caller's pid namespace. - A pidfd always has exactly a single inode and a single dentry. There's no aliases. - Generally, in my naive opinion, I think that decoding pidfs file handles should be a lot simpler than decoding regular path based file handles. Because there should be no need to verify any ancestors, or reconnect paths. Pidfs also doesn't have directory inodes, only regular inodes. In other words, any dentry is acceptable. Essentially, the only thing we need is for exportfs_decode_fh_raw() to verify that the provided pidfs file handle is resolvable in the caller's pid namespace. If so we're done. The challenge is how to nicely plumb this into the code without it sticking out like a sore thumb. - Pidfs should not be exportable via NFS. It doesn't make sense.