On 12/7/23 3:58 PM, Christian Brauner wrote:
> [adjusting Cc as that's really a separate topic]
>
> On Thu, Nov 30, 2023 at 08:43:18PM +0100, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>>>>> I'd like to offer a userspace API which allows safe stashing of
>>>>> unreachable file descriptors on a service thread.
>
> Fwiw, systemd has a concept called the fdstore:
>
> https://systemd.io/FILE_DESCRIPTOR_STORE
>
> "The file descriptor store [...] allows services to upload during
> runtime additional fds to the service manager that it shall keep on its
> behalf. File descriptors are passed back to the service on subsequent
> activations, the same way as any socket activation fds are passed.
>
> [...]
>
> The primary use-case of this logic is to permit services to restart
> seamlessly (for example to update them to a newer version), without
> losing execution context, dropping pinned resources, terminating
> established connections or even just momentarily losing connectivity. In
> fact, as the file descriptors can be uploaded freely at any time during
> the service runtime, this can even be used to implement services that
> robustly handle abnormal termination and can recover from that without
> losing pinned resources."
>
>>
>>>> By "safe" here do you mean not accessible via pidfd_getfd()?
>>
>> No, unreachable by close/close_range/dup2/dup3. I expect we can do an
>> intra-process transfer using /proc, but I'm hoping for something nicer.
>
> File descriptors are reachable for all processes/threads that share a
> file descriptor table. Changing that means breaking core userspace
> assumptions about how file descriptors work. That's not going to happen
> as far as I'm concerned.
>
> We may consider additional security_* hooks in close*() and dup*(). That
> would allow you to utilize Landlock or BPF LSM to prevent file
> descriptors from being closed or duplicated. pidfd_getfd() is already
> blockable via security_file_receive().
>
> In general, messing with fds in that way is really not a good idea.
>
> If you need something that awkward, then you should go all the way and
> look at io_uring which basically has a separate fd-like handle called
> "fixed files".
>
> Fixed file indexes are separate file-descriptor like handles that can
> only be used from io_uring calls but not with the regular system call
> interface.
>
> IOW, you can refer to a file using an io_uring fixed index. The index to
> use can be chosen by userspace and can't be used with any regular
> fd-based system calls.
>
> The io_uring fd itself can be made a fixed file itself
>
> The only thing missing would be to turn an io_uring fixed file back into
> a regular file descriptor. That could probably be done by using
> receive_fd() and then installing that fd back into the caller's file
> descriptor table. But that would require an io_uring patch.

FWIW, since it was very trivial, I posted an rfc/test patch for just
that with a test case. It's here:

https://lore.kernel.org/io-uring/df0e24ff-f3a0-4818-8282-2a4e03b7b5a6@xxxxxxxxx/

--
Jens Axboe
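
For readers unfamiliar with the fdstore idea quoted above, here is a
minimal sketch of the underlying mechanism: handing a file descriptor to
another holder over an AF_UNIX socket (SCM_RIGHTS), closing the local
copy, and getting it back later. This is not systemd's API (a real
service would use sd_pid_notify_with_fds() with FDSTORE=1), and it is
unrelated to the io_uring patch. The names send_fd, recv_fd, demo and
"state-pipe" are made up for illustration, and both ends live in one
process here only to keep the example self-contained; it assumes
Python 3.9+ on a Unix system.

```python
import os
import socket


def send_fd(chan, fd, name):
    """Pass `fd` over the Unix socket `chan` as SCM_RIGHTS ancillary data."""
    socket.send_fds(chan, [name.encode()], [fd])


def recv_fd(chan):
    """Receive one (name, fd) pair previously sent with send_fd()."""
    data, fds, _flags, _addr = socket.recv_fds(chan, 1024, 1)
    return data.decode(), fds[0]


def demo():
    # Hypothetical "service" and "holder" ends; in real use these would
    # belong to two different processes.
    service, holder = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    # Some state worth keeping: a pipe with data already buffered in it.
    r, w = os.pipe()
    os.write(w, b"pinned state")
    os.close(w)

    # "Upload": hand the read end to the holder, then drop our own copy.
    # The open file stays alive because the in-flight message references it.
    send_fd(service, r, "state-pipe")
    os.close(r)
    name, held = recv_fd(holder)

    # "Subsequent activation": the holder passes the descriptor back.
    send_fd(holder, held, name)
    os.close(held)
    name_back, restored = recv_fd(service)

    payload = os.read(restored, 64)
    os.close(restored)
    service.close()
    holder.close()
    return name_back, payload
```

Calling demo() returns the name together with the bytes that were
buffered in the pipe before the descriptor was handed off, which shows
that the open file survived while no copy of it was held by the sender.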