On Sat, Mar 14, 2015 at 08:47:21PM +0100, Oleg Nesterov wrote:
> On 03/14, Oleg Nesterov wrote:
> >
> > On 03/14, Josh Triplett wrote:
> > >
> > > On Sat, Mar 14, 2015 at 11:38:29AM -0700, Thiago Macieira wrote:
> > > > On Saturday 14 March 2015 15:32:35 Oleg Nesterov wrote:
> > > > > It is not clear to me what do_wait() should do with ->autoreap child, even
> > > > > ignoring ptrace.
> > > > >
> > > > > Just suppose that real_parent has a single "autoreap" child. Should
> > > > > wait(NULL) hang then?
> > > >
> > > > It should ignore the child that is set to autoreap. wait(NULL) should return
> > > > -ECHILD, indicating there are no children waiting to be reaped.
> > >
> > > Right. And I don't think the current code does this. I think we need
> > > to change wait_consider_task to early-return for ->autoreap just as it
> > > does for task_state == EXIT_DEAD.
> >
> > No. This EXIT_DEAD is absolutely different. And this is another indication
> > that you might use it wrongly ;)
> >
> > What we actually want is BUG_ON(task_state == EXIT_DEAD) here. We do not
> > want the EXIT_DEAD tasks in ->children/ptraced lists. These EXIT_DEAD tasks
> > complicate the exit/wait/reparent paths.
> >
> > However, currently this is TODO. The main problem is the locking in
> > wait_task_zombie(), we can set EXIT_DEAD and remove the task from list
> > under read_lock().
>
> Let me clarify in case I confused you.
>
> The EXIT_DEAD check in do_wait() paths doesn't mean "autoreap". It means
> that this thread/process (depending on ptrace) was already reaped. It was
> reaped by our sub-thread, or it was reaped because we ignore SIGCHLD, or
> other reasons. This doesn't matter.
>
> In short, EXIT_DEAD means: we have to keep this thread on lists until the
> task which set this state calls release_task().

That much I already understood from reading through the code, since
exit_notify doesn't set task_state to EXIT_DEAD until the task is
actually completely dead.
When wait_consider_task sees p->task_state == EXIT_DEAD, that task
isn't eligible for waiting at all.

What I was proposing was that a task that isn't yet dead, but that is
going to be autoreaped, is not eligible for waiting either. The whole
wait* family of system calls should pretend it doesn't exist at all,
because returning an autoreaped task from a wait* call introduces a
race condition if the parent tries to *do* anything with the returned
PID: the task may already have been autoreaped by the time the parent
acts on it. If you launch a process with CLONE_FD, you need to manage
it exclusively with that fd, not with the wait* family of system calls.

That also implies that the child-stop and child-continued mechanisms
(do_notify_parent_cldstop, WSTOPPED, WCONTINUED) should ignore the task
too. In the future there could be a flag to clone4 that lets you get
stop and continue notifications through the file descriptor.

- Josh Triplett
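
To make the proposed rule concrete, here is a rough userspace model of
the eligibility check. It is only a sketch and not kernel code: the
model_* names, the struct layout, and the "autoreap" field are all
made up for illustration, and the real wait_consider_task does far more
(ptrace, thread groups, locking). It only shows the intended outcome:
an autoreap child is skipped the same way an EXIT_DEAD one is, so a
parent whose only child is autoreap gets -ECHILD instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel state; NOT the real definitions. */
enum model_exit_state { MODEL_RUNNING, MODEL_EXIT_ZOMBIE, MODEL_EXIT_DEAD };

struct model_task {
	int pid;
	enum model_exit_state exit_state;
	bool autoreap;	/* hypothetical: child was created with CLONE_FD */
};

/* Proposed rule: EXIT_DEAD and autoreap tasks are both invisible to wait*. */
static bool model_wait_eligible(const struct model_task *p)
{
	if (p->exit_state == MODEL_EXIT_DEAD)
		return false;	/* already reaped by someone else */
	if (p->autoreap)
		return false;	/* managed only through its fd */
	return true;
}

/*
 * Returns the pid of the first eligible zombie, 0 if eligible children
 * exist but none has exited yet (a real wait would block here), or
 * -ECHILD if there is no eligible child at all.
 */
static int model_wait(const struct model_task *children, size_t n)
{
	bool have_eligible = false;

	assert(children != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!model_wait_eligible(&children[i]))
			continue;
		have_eligible = true;
		if (children[i].exit_state == MODEL_EXIT_ZOMBIE)
			return children[i].pid;
	}
	return have_eligible ? 0 : -ECHILD;
}
```

With a single autoreap child, model_wait() returns -ECHILD whether or
not that child has exited. With one autoreap zombie and one ordinary
zombie, only the ordinary child's pid is ever returned.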