On Sun, Mar 31, 2019 at 3:59 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 30, 2019 at 9:47 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >
> > Sure, given a pidfd_clone() syscall, as long as the parent of the
> > process is giving you a pidfd for it and you don't have to deal with
> > grandchildren created by fork() calls outside your control, that
> > works.
>
> Don't do pidfd_clone() and pidfd_wait().
>
> Both of those existing system calls already get a "flags" argument.
> Just make a WPIDFD (for waitid) and CLONE_PIDFD (for clone) bit, and
> make the existing system calls just take/return a pidfd.

clone is out of flags, so there will have to be a new system call. I am
not sure about the waitid bit. Are you suggesting it takes a pidfd and
waits using it?

I was thinking we could make the pidfd itself pollable and readable for
exit status. At pidfd_open time, you pass the flag, and only if you're
the parent do you get a readable instance; otherwise, everyone gets a
pollable one (e.g. for an indirect child as a reaper), and it fails for
threads. Then the pidfd that clone2 returns can also be polled and read
from.

The main pain point is that currently, when I ptrace a process from a
thread, I need to use waitpid (waitid throws away ptrace-critical
information), and since ptrace works on a thread-by-thread basis, only
the attached thread can do the waitpid. This means I cannot do anything
else from the attached thread concurrently. waitfd was supposed to solve
this (back in 2009) but it never made it in, and clone4 from Josh
Triplett did something similar (returned exit status over the clonefd).

FreeBSD's process descriptors are also pollable (which is where all this
work was originally inspired from), and it would help with adoption if
the semantics were similar. Besides that, it would help libraries host
their own set of children without affecting the entire process's waiting
logic or mucking with the SIGCHLD handler (you wouldn't need signals).
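To make the "pollable pidfd" idea concrete, here is a minimal sketch of
the poll-then-reap pattern. It is not the interface proposed above: it
assumes pidfd_open() and poll() support on pidfds as they later landed
in Linux 5.3, reached here through Python 3.9's os.pidfd_open(). In that
design the pidfd only becomes readable when the process exits, and the
status still has to be collected with waitpid(). The helper names
(spawn_child, wait_for_exit) are hypothetical, and the code falls back
to a plain blocking waitpid() where pidfd_open() is unavailable.

```python
import os
import select


def spawn_child(exit_code):
    """Fork a child that exits immediately with exit_code; return its PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit without running parent cleanup
    return pid


def _decode(status):
    """Turn a waitpid() status into an exit code (negative signal number if killed)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def wait_for_exit(pid, timeout_ms=5000):
    """Wait for child `pid` by polling a pidfd, then reap it and return its exit code.

    Assumption: os.pidfd_open() needs Python 3.9+ and Linux 5.3+. If it is
    missing or fails, fall back to a blocking waitpid().
    """
    try:
        pidfd = os.pidfd_open(pid)
    except (AttributeError, OSError):
        _, status = os.waitpid(pid, 0)
        return _decode(status)

    try:
        poller = select.poll()
        # The pidfd is reported readable once the process has terminated.
        poller.register(pidfd, select.POLLIN)
        if not poller.poll(timeout_ms):
            raise TimeoutError("child %d did not exit in time" % pid)
    finally:
        os.close(pidfd)

    # Polling only signals termination; the status still comes from waitpid().
    _, status = os.waitpid(pid, 0)
    return _decode(status)
```

Because the descriptor can sit in any poll/epoll loop, a library could
watch its own children this way without installing a SIGCHLD handler;
for example, wait_for_exit(spawn_child(7)) returns 7.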
>
> Side note: we could (should?) also make the default maxpid just be
> larger. It needs to fit in an 'int', but MAXINT instead of 65535 would
> likely already make a lot of these attacks harder.
>
> There was some really old legacy reason why we actually limited it to
> 65535 originally. It was old and crufty even back when..
>
> Linus