On Mon, Mar 25, 2019 at 09:54:58PM +0000, Jonathan Kowalski wrote:
> On Mon, Mar 25, 2019 at 9:43 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Mar 25, 2019 at 10:19:26PM +0100, Jann Horn wrote:
> > > On Mon, Mar 25, 2019 at 10:11 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > But often you don't just want to wait for a single thing to happen;
> > > you want to wait for many things at once, and react as soon as any one
> > > of them happens. This is why the kernel has epoll and all the other
> > > "wait for event on FD" APIs. If waiting for a process isn't possible
> > > with fd-based APIs like epoll, users of this API have to spin up
> > > useless helper threads.
> >
> > This is true. I almost forgot about the polling requirement, sorry. So then a
> > new syscall it is.. About what to wait for, that can be a separate parameter
> > to pidfd_wait then.
> >
>
> This introduces a time window where the process changes state between
> "get pidfd" and "setup waitfd", it would be simpler if the pidfd
> itself supported .poll and on termination the exit status was made
> readable from the file descriptor.

It is much cleaner to add a new pidfd_wait syscall for this, as discussed in
the other thread. Adding .poll directly to the pidfd would seem to complicate
the blocking configuration. Do we block until the task is a zombie or until
it is dead? That is not easy to specify. Also, if we need to return other
types of information from the pidfd, not just exit state, then it is not
clear that blocking on a pidfd purely for exit status makes sense.

With a separate pidfd_wait syscall, you pass it a pidfd, specify what to
block on (EXIT_DEAD or EXIT_ZOMBIE or both), and get back a wait fd that can
be read or blocked on, and that returns all the needed information on
unblock.
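
To illustrate the "wait fd you can put in an epoll set" usage pattern, here
is a rough userspace sketch. It does NOT use pidfd_wait, which is only a
proposal at this point and whose final signature is not settled. Instead it
emulates the wait fd with the read end of a pipe that hits EOF when the child
exits. The helper names spawn_child and wait_first_exit are made up for this
example, and the sketch assumes Linux (select.epoll).

```python
import os
import select
import time


def spawn_child(exit_code, delay):
    """Fork a child that exits with exit_code after delay seconds.

    Returns (pid, fd). The fd is a stand-in for the proposed wait fd:
    it is the read end of a pipe whose only write end lives in the
    child, so it reports hangup/EOF once the child has exited.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end; the kernel closes it on exit.
        os.close(read_end)
        time.sleep(delay)
        os._exit(exit_code)
    # Parent: drop the write end so the child holds the last reference.
    os.close(write_end)
    return pid, read_end


def wait_first_exit(children, timeout=5.0):
    """Block in epoll until any child exits.

    children maps wait fd -> pid. Returns (pid, exit_code) for the
    first child whose fd became ready.
    """
    ep = select.epoll()
    try:
        for fd in children:
            ep.register(fd, select.EPOLLIN)
        events = ep.poll(timeout)
    finally:
        ep.close()
    if not events:
        raise TimeoutError("no child exited in time")
    fd, _mask = events[0]
    pid = children[fd]
    # The emulation still needs a classic waitpid() to fetch the status;
    # the proposal is for the wait fd itself to deliver that information.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    fast_pid, fast_fd = spawn_child(7, 0.1)
    slow_pid, slow_fd = spawn_child(3, 1.0)
    print(wait_first_exit({fast_fd: fast_pid, slow_fd: slow_pid}))
    os.waitpid(slow_pid, 0)  # reap the remaining child
```

The point is only that the event loop handles a process exit like any other
fd event, with no helper thread. The pipe trick works only for children the
caller forked itself, which is one of the limitations a real fd-based wait
would remove.
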

> Further, in the clone4 patchset, there was a mechanism to autoreap
> such a process so that it does not interfere with waiting a process
> does normally. How do you intend to handle this case if anyone except
> the parent is wanting to *wait* on the process (a second process,
> without reaping, so as to not disrupt any waiting in the parent), and
> for things like libraries that finally can manage their own set of
> process when pidfd_clone is added, by excluding this process from the
> process's normal wait handling logic.

The pidfd_wait logic being discussed does not depend on or affect the
autoreap behavior. This wait is not one of the traditional wait family of
calls. It is just for getting notified about, and reading, the exit state of
a task. Once I post the code, it will be clear. And I'll CC you.

thanks,

 - Joel