On 12/20, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@xxxxxxxxxx> writes: > > > On 11/16, Eric W. Biederman wrote: > >> > >> @@ -216,22 +216,15 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns) > >> > >> /* > >> * sys_wait4() above can't reap the TASK_DEAD children. > >> - * Make sure they all go away, see __unhash_process(). > >> + * Make sure they all go away, see free_pid(). > >> */ > >> for (;;) { > >> - bool need_wait = false; > >> - > >> - read_lock(&tasklist_lock); > >> - if (!list_empty(¤t->children)) { > >> - __set_current_state(TASK_UNINTERRUPTIBLE); > >> - need_wait = true; > >> - } > >> - read_unlock(&tasklist_lock); > >> - > >> - if (!need_wait) > >> + set_current_state(TASK_UNINTERRUPTIBLE); > >> + if (pid_ns->nr_hashed == 1) > >> break; > >> schedule(); > >> } > > > > I agree, the patch itself looks fine. > > > > But, with all other changes I do not understand this part at all. > > > > A task from the parent namespace can do setns + fork at any time > > (until nr_hashed >= 0). So ->nr_hashed can be incremented again > > after zap_pid_ns_processes() returns. XXX: this creates the new pid P in this ns. Please see below... > I want to talk about how alloc_pid and free_pid prevent nr_hashed > from increasing once the last processes has exited the pid namespace > but that doesn't apply here. Not sure I understand, but it seems you agree this can happen. > > Or, we can sleep in TASK_UNINTERRUPTIBLE "forever" if this happens > > after kill-them-all. > > Sleeping forever should be prevented by this chunk in free_pid: Note that I said "forever", not forever ;) > > switch(--ns->nr_hashed) { > case 1: > /* When all that is left in the pid namespace > * is the reaper wake up the reaper. The reaper > * may be sleeping in zap_pid_ns_processes(). > */ > wake_up_process(ns->child_reaper); > > > I admit it continues to be true that if an injected process or a > debugged process does not exit we can block waiting for all of the > processes to be reaped indefinitely. Yes, I meant until the injected process exits. > > Could you explain why do we need to wait at all? I can be easily > > wrong, but at first glance the original reason for this wait has > > gone away? > > It is very nice to know that when you do waitpid for the init process of > a pid namespace that there are no other processes in the pid namespace. OK, and I agree. But my point was, at least this _looks_ strange, because ns->nr_hashed == 1 is not stable. And in fact I think this is not strange, but simply wrong. Please consider the XXX case above. Suppose that free_pid(P) happens after ns->child_reaper exits and thus this pointer points to nowhere. Suppose also that there is another injected pid so nr_hashed == 2. In this case wake_up_process(ns->child_reaper) means use-after-free, no? Oleg. _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers