On Wed, Nov 30, 2022 at 12:37:15PM -0600, Eric W. Biederman wrote:
> Frederic Weisbecker <frederic@xxxxxxxxxx> writes:
> Two questions.
>
> 1) Is there any chance you need the exit_task_rcu_stop() and
> exit_tasks_rcu_start() around schedule in the part of this code that
> calls kernel_wait4.

Indeed it could be relaxed there too if necessary.

> 2) I keep thinking zap_pid_ns_processes() should be changed so that
> after it sends SIGKILL to all of the relevant processes to not wait,
> and instead have wait_consider_task simply not allow the
> init process of the pid namespace to be reaped.
>
> Am I right in thinking that such a change were to be made it would
> make remove the deadlock without having to have any special code?
>
> It is just tricky enough to do that I don't want to discourage your
> simpler change but this looks like a case that makes the pain of
> changing zap_pid_ns_processes worthwhile in the practice.

So you mean it still reaps those that were EXIT_ZOMBIE before ignoring
SIGCHLD (the kernel_wait4() call) but it doesn't sleep anymore on those
that autoreap (or get reaped by a parent outside that namespace) after
ignoring SIGCHLD? Namely it doesn't do the schedule() loop I'm working
around here and proceeds with exit_notify() and notifies its parent?

And then in this case the responsibility of sleeping, until the
init_process of the namespace is the last task in the namespace, goes
to the parent while waiting for that init_process, right?

But what if the init_process of the given namespace autoreaps? Should
it then wait itself until the namespace is empty? And then aren't we
back to the initial issue?

Thanks.