On 28/02/11 17:10 -0500, Oren Laadan wrote: > > > On 02/28/2011 10:09 AM, Louis Rilling wrote: > > On 28/02/11 9:43 -0500, Oren Laadan wrote: > >> > >> > >> On 02/26/2011 08:54 AM, Louis Rilling wrote: > >>> On Fri, Feb 25, 2011 at 11:01:32AM -0800, Sukadev Bhattiprolu wrote: > >>>> Louis Rilling [Louis.Rilling@xxxxxxxxxxx] wrote: > >>>> | On 24/02/11 23:55 -0800, Sukadev Bhattiprolu wrote: > >>>> | > > >>>> | > diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c > >>>> | > index b0ea8ec..8ecc052 100644 > >>>> | > --- a/kernel/checkpoint/restart.c > >>>> | > +++ b/kernel/checkpoint/restart.c > >>>> | > @@ -972,6 +972,7 @@ static int do_ghost_task(void) > >>>> | > if (ret < 0) > >>>> | > ckpt_err(ctx, ret, "ghost restart failed\n"); > >>>> | > > >>>> | > + current->exit_signal = -1; > >>>> | > >>>> | Setting ->exit_signal outside of tasklist_lock makes me nervous. All other > >>>> | places that change ->exit_signal hold write_lock_irq(&tasklist_lock), and > >>>> | eligibile_child() (for an instance of a reader being another task) holds > >>>> | read_lock(&tasklist_lock). But maybe this does not matter for ghost tasks. > >>>> | > >>>> > >>>> Yes, an earlier version had the write_lock(&tasklist_lock). Will add it > >>>> back. > >>>> > >>>> | > restore_debug_exit(ctx); > >>>> | > ckpt_ctx_put(ctx); > >>>> | > do_exit(0); > >>>> | > @@ -1465,7 +1466,22 @@ void exit_checkpoint(struct task_struct *tsk) > >>>> | > /* restarting zombies will activate next task in restart */ > >>>> | > if (tsk->flags & PF_RESTARTING) { > >>>> | > BUG_ON(ctx->active_pid == -1); > >>>> | > + > >>>> | > + /* > >>>> | > + * if we are a "ghost" task, that was terminated by the > >>>> | > + * container-init (from zap_pid_ns_processes()), we should > >>>> | > + * wake up the parent since we are now a detached process. > >>>> | > + */ > >>>> | > + read_lock_irq(&tasklist_lock); > >>>> | > >>>> | read_lock() is enough. tasklist_lock is never taken for write from IRQs or > >>>> | softIRQs. > >>>> | > >>>> | > + if (tsk->exit_state == EXIT_DEAD && !tsk->parent->exit_state) { > >>>> | > + ckpt_debug("[%d, %s]: exit_checkpoint(): notifying " > >>>> | > + "parent\n", tsk->pid, tsk->comm); > >>>> | > + __wake_up_parent(tsk, tsk->parent); > >>>> | > + } > >>>> | > + read_unlock_irq(&tasklist_lock); > >>>> | > >>>> | Looking at this closer, I wonder if this wakeup logic should be called from > >>>> | do_ghost_task(), right after setting ->exit_signal. This way there would be no > >>>> | need for a tricky condition to recognize ghost tasks, and (I think) this is > >>>> | closer to the other cases changing ->exit_signal (reparent_leader() and > >>>> | exit_notify()). > >>>> > >>>> Yes, we tried the following in the earlier version. > >>>> > >>>> void ghost_auto_reapable() > >>>> { > >>>> write_lock(&tasklist_lock); > >>>> current->exit_signal = -1; > >>>> __wake_up_parent(current, current->parent); > >>>> write_unlock(&tasklist_lock); > >>>> } > >>>> > >>>> And called this from do_ghost_task(). But with this, the parent could > >>>> wake up, find that it still has an eligible child (the ghost) to wait > >>>> for, and go back to waiting before the ghost enters the EXIT_DEAD state. > >>>> And so we would lose the wake up. > >>>> > >>>> (zap_pid_ns_processes() passes the __WALL so the ghost would be considered > >>>> an eligible child). > >>> > >>> I think I see now. The point is that ->exit_signal = -1 is only meant to work > >>> correctly for sub-threads, which the parent does not need to wait for. IOW, the > >>> notion of detached task is only implemented for sub-threads. > >>> > >>> IIUC, setting ->exit_signal to -1 is only used here to let exit_notify() set > >>> ->exit_state to EXIT_DEAD, right? Otherwise, setting ->exit_signal to 0 and > >>> letting do_notify_parent() proceed for ghost tasks would have be sufficient I > >>> guess (provided that the confusion between ghost tasks and zombies could be > >>> easily avoided in do_notify_parent()). > >>> > >>> Then I agree that the proposed patch looks like a reasonably simple approach. > >>> > >>> Thanks for the explanation, > >>> > >> > >> Louis, Suka: > >> > >> One subtlety with the method is that if a process get reparented > >> (for whatever reason) then the ->exit_signal field is reset to > >> SIGCHLD. Fortunately, that should not affect us because our ghost > >> tasks never become orphaned. > > > > AFAICS, this is not true for detached tasks: neither in reparent_leader(), nor > > in exit_notify(). Unless I missed another place? > > Doh.. you are totally right, I missed that. > Sometime it feels really good to be wrong :) ;) > > > > >> > >> However, I can't avoid thinking that maybe there is a better way > >> to do this altogether ? > >> > >> The requirement is simple: ghost tasks are temporary tasks whose > >> role is to keep certain pid's alive for the duration of the restart > >> and then exit without a trace before the restarted tasks resume > >> execution. > >> > >> The reason I opted for the ->exit_signal = -1 is because it makes > >> sure that the parent need not explicitly collect the child ghost. > >> > >> If I had set it to 0, then it would not send a signal, but still > >> would require a wait() to collect it (right ?). > > > > Right. > > > >> > >> Can you think of a way to achieve this functionality without the > >> subtleties that we have observed so far ? even at the cost of a > >> minor change to, say, wait() logic or what not ? > > > > I wonder if things would be easier by providing a mean to distinguish ghost > > tasks from truely restarted tasks in do_notify_parent(). > > > > Assuming this, there might be a kernel-patch-free way, although I can't say if > > this fits userspace restart constraints: > > > > Setting the signal handler of ghost tasks' parent to SIG_IGN, and ->exit_signal > > Yes, I thought about it. But those parent are part of the restart, > and changing their signal handlers will is tricky and will require > complex sync logic at the end of restart to ensure that all tasks > restore their handlers after the ghosts are gone and released, but > before any task gets to userspace ... I don't want to go there. Yes, I can imagine well the added complexity in userspace. > > > of ghost tasks to SIGCHLD, will make them autoreap. This requires that the > > parent does not need to synchronize with other children until all ghost tasks > > have exited, and that the parent remains alive too. > > > > So, why not adding some flag PF_RESTART_GHOST or TIF_RESTART_GHOST? > > Actually, we already have a way to distinguish them: > > if ((tsk->flags & PF_RESTARTING) && task_detached(tsk)) > > One problem with this is that we only set exit_signal to -1 after > the ghost was born, so if the parent is already waiting I think > that will be racy, as per suka's comment above. Yes. > > So looking at the code again, we could add one condition in exit.c > at wait_consider_task(), after the test of p->exit_state == EXIT_DEAD, > to also test: > > inline static bool is_ghost_task(p) > { > return (p->flags & (PF_EXITING|PF_RESTARTING) == > PF_EXITING|PF_RESTARTING) && task_detached(p) > } > > if (p->flags & is_ghost_task(p)) > return 0; > > Or something along the lines (e.g. used EXIT_ZOMBIE comparison instead > of PF_EXITING). While requiring a kernel patch, it is relatively short, > clean and easy to review. Adding the check in do_wait() is racy IMHO. What does prevent do_ghost_task() from setting ->exit_signal = -1 after the parent sleeps in do_wait()? Otherwise, why not making ghost tasks simply sub-threads of the "parent"? They would autoreap and nobody would wait or have to wait for them. I would feel so much better if we had not to change ->exit_signal in do_ghost_task()... Opinion? Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes
Attachment:
signature.asc
Description: Digital signature
_______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers