, On Sun, Dec 22, 2019 at 4:48 AM Christian Brauner <christian.brauner@xxxxxxxxxx> wrote: > > On Fri, Dec 20, 2019 at 11:28:13PM +0000, Sargun Dhillon wrote: > > This syscall allows for the retrieval of file descriptors from other > > processes, based on their pidfd. This is possible using ptrace, and > > injection of parasitic code along with using SCM_RIGHTS to move > > file descriptors between a tracee and a tracer. Unfortunately, ptrace > > comes with a high cost of requiring the process to be stopped, and > > breaks debuggers. This does not require stopping the process under > > manipulation. > > > > One reason to use this is to allow sandboxers to take actions on file > > descriptors on the behalf of another process. For example, this can be > > combined with seccomp-bpf's user notification to do on-demand fd > > extraction and take privileged actions. For example, it can be used > > to bind a socket to a privileged port. > > > > /* prototype */ > > /* > > * pidfd_getfd_options is an extensible struct which can have options > > * added to it. If options is NULL, size, and it will be ignored be > > * ignored, otherwise, size should be set to sizeof(*options). If > > * option is newer than the current kernel version, E2BIG will be > > * returned. > > */ > > struct pidfd_getfd_options {}; > > long pidfd_getfd(int pidfd, int fd, unsigned int flags, > > struct pidfd_getfd_options *options, size_t size); That's embarrassing. This was supposed to read: long pidfd_getfd(int pidfd, int fd, struct pidfd_get_options *options, size_t size); > > The prototype advertises a flags argument but the actual > > +SYSCALL_DEFINE4(pidfd_getfd, int, pidfd, int, fd, > + struct pidfd_getfd_options __user *, options, size_t, usize) > > does not have a flags argument... > > I think having a flags argument makes a lot of sense. > > I'm not sure what to think about the struct. I agree with Aleksa that > having an empty struct is not a great idea. From a design perspective it > seems very out of place. If we do a struct at all putting at least a > single reserved field in there might makes more sense. > > In general, I think we need to have a _concrete_ reason why putting a > struct versioned by size as arguments for this syscall. > That means we need to have at least a concrete example for a new feature > for this syscall where a flag would not convey enough information. I can think of at least two reasons we need flags: * Clearing cgroup flags * Closing the process under manipulation's FD when we fetch it. The original reason for wanting to have two places where we can put flags was to have a different field for fd flags vs. call flags. I'm not sure there's any flags you'd want to set. Given this, if we want to go down the route of a syscall, we should just leave it as a __u64 flags, and drop the pointer to the struct, if we're not worried about that. > > And I'm not sure that there is a good one... I guess one thing I can > think of is that a caller might want dup-like semantics, i.e. a caller > might want to say: > > pidfd_getfd(<pidfd>, <fd-to-get>, <fd-number-to-want>, <flags>, ...) > > such that after pidfd_getfd() returns <fd-to-get> corresponds to > <fd-number-to-want> in the caller. But that can also be achieved via: > int fd = pidfd_getfd(<pidfd>, <fd-to-get>, <flags>, ...) > int final_fd = dup3(fd, <newfd>, O_CLOEXEC) > > > > > /* testing */ > > Ran self-test suite on x86_64 > > +1 >