On Sat, Mar 08, 2025 at 12:10:12AM +0000, Pratyush Yadav wrote:
> Hi Christian,
>
> Thanks for the review!

No worries, I'm not trying to be polemic. It's just that this whole
proposed concept is pretty lightweight in terms of thinking about
possible implications.

> > This use-case is covered with systemd's fdstore and it's available to
> > unprivileged userspace. Stashing arbitrary file descriptors in the
> > kernel in this way isn't a good idea.
>
> For one, it can't be arbitrary FDs, but only explicitly enabled ones.
> Beyond that, while not intended, there is no way to stop userspace from
> using it as a stash. Stashing FDs is a needed operation for this to
> work, and there is no way to guarantee in advance that userspace will
> actually use it for KHO, and not just stash it to grab back later.

As written it can't ever function as a generic file descriptor store.

It only allows fully privileged processes to stash file descriptors,
which makes it useless for generic userspace. A generic fdstore should
have a model that makes it usable unprivileged; it probably should also
be multi-instance and work easily with namespaces. This doesn't, and
hitching it on devtmpfs and character devices is guaranteed to not work
well with such use-cases.

It also has big time security issues and implications. Any file you
stash in there will have the credentials of the opener attached to it.
So if someone stashes anything in there you need permission mechanisms
that ensure that Joe Random can't via FDBOX_GET_FD pull out a file for,
e.g., someone else's cgroup and happily migrate processes under the
opener's credentials or mess around with some random executing binary.

So you need a model of who is allowed to pull out what file descriptors
from a file descriptor stash. What are the semantics for that? What's
the security model for that? What are possible corner cases?

For systemd's userspace fdstore that's covered by policy: it can
implement quite easily what fds it accepts.
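
To make the userspace side concrete, here is a minimal sketch of an fd
being handed to a store over a Unix socket and checked against a policy
on receipt. It is not systemd's implementation (services use
sd_pid_notify_with_fds() with FDSTORE=1 for that); accept_fd() and
stash_roundtrip() are hypothetical names, and "regular files only" is
just an assumed example policy. It needs Python 3.9+ on a Unix system.

```python
import os
import socket
import stat
import tempfile


def accept_fd(fd):
    """Hypothetical store policy: only regular files may be stashed."""
    return stat.S_ISREG(os.fstat(fd).st_mode)


def stash_roundtrip(path):
    """Pass an fd for `path` over a Unix socket and apply the policy."""
    client, store = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    fd = os.open(path, os.O_RDWR)
    try:
        os.write(fd, b"abc")  # advance the file offset to 3
        # send_fds() transfers the descriptor as SCM_RIGHTS ancillary data.
        socket.send_fds(client, [b"stash"], [fd])
        msg, fds, _flags, _addr = socket.recv_fds(store, 16, 1)
        received = fds[0]
        try:
            accepted = accept_fd(received)
            # The receiver holds the same open file description, so the
            # offset advanced by the sender is visible here.
            offset = os.lseek(received, 0, os.SEEK_CUR)
        finally:
            os.close(received)
        return msg, accepted, offset
    finally:
        os.close(fd)
        client.close()
        store.close()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(stash_roundtrip(tmp.name))
```

The received descriptor refers to the same open file as the sender's,
which is why whoever pulls it out acts with whatever the opener set up.
In userspace the store can simply refuse what it doesn't like at this
point.
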
For the kernel it's a lot more complicated.

If someone puts file descriptors in there for a bunch of files that
were opened in different mount namespaces then this will pin said mount
namespaces. Normally, if the last process in a mount namespace exits,
the mount namespace is cleaned up, but not anymore: the mount namespace
would stay pinned. Not wrong, but the implications of this need to be
spelled out.

What if someone puts a file descriptor from devtmpfs or for /dev/fdbox
into an fdbox? Even if that's blocked, what happens if someone creates a
detached bind-mount of a /dev/fdbox mount and mounts it into a different
mount namespace and then puts a file descriptor for that mount namespace
into the fdbox? Tons of other scenarios come to mind. And that's
ignoring what happens when networking is brought into the mix as well.

It's not done by just letting the kernel stash some files and getting
them out later somehow and then seeing whether it's somehow useful in
the future for other stuff. A generic globally usable fdstore is not
happening without a clear and detailed analysis of what the semantics
are going to be.

So either that work is done right from the start, or stashing files
goes out the window and instead the KHO part is implemented in a way
where, during a KHO dump, relevant userspace is notified that it must
now serialize its state into the serialization stash. And no files are
actually kept in there at all.