On Sat, Mar 30, 2019 at 12:53:57PM +0100, Jürg Billeter wrote:
> On Fri, 2019-03-29 at 16:54 +0100, Christian Brauner wrote:
> > diff --git a/include/uapi/linux/wait.h b/include/uapi/linux/wait.h
> > index ac49a220cf2a..d6c7c0701997 100644
> > --- a/include/uapi/linux/wait.h
> > +++ b/include/uapi/linux/wait.h
> > @@ -18,5 +18,7 @@
> >  #define P_PID 1
> >  #define P_PGID 2
> >
> > +/* Get a file descriptor for /proc/<pid> of the corresponding pidfd */
> > +#define PIDFD_GET_PROCFD _IOR('p', 1, int)
> >
> >  #endif /* _UAPI_LINUX_WAIT_H */
>
> This is missing an entry in Documentation/ioctl/ioctl-number.txt and is
> actually conflicting with existing entries.

Thanks. Yes, Jann mentioned this too.

>
> However, I'd actually prefer a syscall to allow strict whitelisting via
> seccomp and avoid the other ioctl disadvantages that Daniel has already
> mentioned.

You can filter ioctls with seccomp.

I have compromised quite a bit now and I think what we have is perfectly
fine: a single clean syscall, pidfd_open(), that lets you get pidfds for
threads and thread-group leaders independent of procfs, and a clean,
simple fd->fd conversion ioctl() that is a property of the f_ops of the
pidfd to get an fd to /proc/<pid> for metadata access.

Btw, this being a part of the pidfd f_ops seems strikingly elegant to
me, because it nicely expresses the notion that the metadata is
implicitly part of the pidfd. But I might just be dumb.

I do not see the need to add another syscall that is conditional on
CONFIG_PROC_FS and only does a pidfd to /proc/<pid>-fd conversion.
That's almost the definition of what an ioctl() is most suited for.

I get the opposition to multiplexers, but consider what would happen if
we were to oppose all of them. Let's leave ioctls out and just look at a
few widely used multiplexer syscalls:

1. seccomp()           - number of supported commands:  4
2. prctl()             - number of supported commands: 45
3. keyctl()            - number of supported commands: 25
4. bpf()               - number of supported commands: 18
5. proposed fsconfig() - number of supported commands:  8

Total number of required syscalls: 100

That means for bpf() alone Linux would have had to gain *18* additional
single syscalls, and for the new mount api 8 additional syscalls would
need to be pulled just for configuring a mount context.

That all hinges on the argument that "syscalls are cheap" and that
running out of syscall numbers is not a real problem because there is a
patchset that lifts this restriction _eventually_. That patchset hasn't
been merged yet and I have not even seen it sent out yet. So we're still
short of syscall numbers. _Even_ if this patchset had landed, adding 26
syscalls for two apis seems excessive.

So unless Linus jumps in here (Cc'ed) and says that the pidfd to
/proc/<pid>-fd conversion warrants yet another syscall, what we have
here is perfectly acceptable.

Again, as I've said before, I don't see the point in sending piles of
syscalls when it is not really justified, and I find none of the
arguments against the implementation we have here right now very
convincing.

Christian
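
For illustration of the remark above that ioctls can be filtered with
seccomp, a filter along the following lines could whitelist a single
ioctl command. This is only a rough sketch under several assumptions,
not part of the patchset: PIDFD_GET_PROCFD is copied from the quoted
diff (it is not in released uapi headers), the names pidfd_ioctl_filter
and install_pidfd_ioctl_filter() are made up for this example, the
args[1] load assumes a little-endian machine, and the seccomp_data.arch
check that a real filter needs is left out for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/ioctl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Assumption: value taken from the quoted patch, not from released headers. */
#ifndef PIDFD_GET_PROCFD
#define PIDFD_GET_PROCFD _IOR('p', 1, int)
#endif

/*
 * Classic BPF program (hypothetical name): ioctl() is allowed only when
 * its command argument is PIDFD_GET_PROCFD, any other ioctl() fails with
 * EPERM, and every other syscall is allowed.
 */
static struct sock_filter pidfd_ioctl_filter[] = {
	/* [0] A = syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [1] not ioctl(): jump forward 3 instructions to ALLOW */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_ioctl, 0, 3),
	/* [2] A = low 32 bits of args[1], the ioctl command (little-endian) */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[1])),
	/* [3] command matches: jump forward 1 instruction to ALLOW */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, PIDFD_GET_PROCFD, 1, 0),
	/* [4] any other ioctl command: fail with EPERM */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	/* [5] allow */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int install_pidfd_ioctl_filter(void)
{
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(pidfd_ioctl_filter) /
					sizeof(pidfd_ioctl_filter[0])),
		.filter = pidfd_ioctl_filter,
	};

	/* Required so that an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
		return -1;

	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A process would call install_pidfd_ioctl_filter() once before dropping
into the code it wants to confine. The filter only ever sees the raw
command number, so two drivers that share the same ioctl type and number
cannot be told apart by it, which is one reason the conflicting
ioctl-number.txt entry mentioned above matters.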