On 24/02/11 23:55 -0800, Sukadev Bhattiprolu wrote: > > From b32a8c440687955975180e909620eba50cb146c3 Mon Sep 17 00:00:00 2001 > From: Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx> > Date: Tue, 22 Feb 2011 10:44:55 -0800 > Subject: [PATCH 1/1] Fix ghost task bug > > Ghost tasks used to be marked as "detached" in do_ghost_task() but > this resulted in a crash which was fixed as discussed in: > https://lists.linux-foundation.org/pipermail/containers/2010-December/026076.html. > > But that fix incorrectly attempted to mark a task as detached from user > space. It is not possible to "detach" a task from user space, which > means we must detach the thread in the kernel while still avoiding > the above crash. > > The original crash occured because the container-init did not wait for > the detached children. When the detached children exited, they would > access a freed proc_mnt. Eric Biederman fixed this crash as discussed in > the above link. > > While Eric's patch fixed the crash it could still leave the container > init hanging indefinitely due to the following race: > > container-int: ghost task > ----------------- ------------ > do_ghost_task() > > zap_pid_ns_processes() > - send SIGKILL > - do_wait() > - at least one child exists > - so wait for child to exit > wake up for the SIGKILL > set ->exit_signal = -1 > exit without notifying parent > > This leaves the container init waiting indefinitely. > > To fix this hang, we have the children of container-init issue an extra wake > up call in exit_checkpoint(). Note that in exit_checkpoint() we do not know > if it is the ghost task that is exiting. Since this wake up applies to any > other task, we should further make sure that the parent is itself not exiting > (which could cause __wake_up_parent() to access invalid pointers in the > parent's task structure). > > See this thread for more discussion: > > https://lists.linux-foundation.org/pipermail/containers/2011-February/026459.html > > Signed-off-by: Sukadev Bhattiprolu (sukadev@xxxxxxxxxx) > Cc: Louis Rilling <Louis.Rilling@xxxxxxxxxxx> > --- > kernel/checkpoint/restart.c | 16 ++++++++++++++++ > 1 files changed, 16 insertions(+), 0 deletions(-) > > diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c > index b0ea8ec..8ecc052 100644 > --- a/kernel/checkpoint/restart.c > +++ b/kernel/checkpoint/restart.c > @@ -972,6 +972,7 @@ static int do_ghost_task(void) > if (ret < 0) > ckpt_err(ctx, ret, "ghost restart failed\n"); > > + current->exit_signal = -1; Setting ->exit_signal outside of tasklist_lock makes me nervous. All other places that change ->exit_signal hold write_lock_irq(&tasklist_lock), and eligibile_child() (for an instance of a reader being another task) holds read_lock(&tasklist_lock). But maybe this does not matter for ghost tasks. > restore_debug_exit(ctx); > ckpt_ctx_put(ctx); > do_exit(0); > @@ -1465,7 +1466,22 @@ void exit_checkpoint(struct task_struct *tsk) > /* restarting zombies will activate next task in restart */ > if (tsk->flags & PF_RESTARTING) { > BUG_ON(ctx->active_pid == -1); > + > + /* > + * if we are a "ghost" task, that was terminated by the > + * container-init (from zap_pid_ns_processes()), we should > + * wake up the parent since we are now a detached process. > + */ > + read_lock_irq(&tasklist_lock); read_lock() is enough. tasklist_lock is never taken for write from IRQs or softIRQs. > + if (tsk->exit_state == EXIT_DEAD && !tsk->parent->exit_state) { > + ckpt_debug("[%d, %s]: exit_checkpoint(): notifying " > + "parent\n", tsk->pid, tsk->comm); > + __wake_up_parent(tsk, tsk->parent); > + } > + read_unlock_irq(&tasklist_lock); Looking at this closer, I wonder if this wakeup logic should be called from do_ghost_task(), right after setting ->exit_signal. This way there would be no need for a tricky condition to recognize ghost tasks, and (I think) this is closer to the other cases changing ->exit_signal (reparent_leader() and exit_notify()). Opinions? Thanks, Louis > + > restore_task_done(ctx); > + > } > > ckpt_ctx_put(ctx); > -- > 1.6.6.1 > > _______________________________________________ > Containers mailing list > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx > https://lists.linux-foundation.org/mailman/listinfo/containers -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes
Attachment:
signature.asc
Description: Digital signature
_______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers