On Tue, 20 Jul 2010, John Paul Walters wrote:

> On Tue, Jul 20, 2010 at 7:12 PM, Oren Laadan <orenl@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi John
> >
> > In your program, it is a thread of the root task (of the hierarchy)
> > that is missed. Indeed the previous patch was incomplete - it did
> > fix the non-root-threads case but spoiled the root-threads case.
> > That was silly... well, can you try this little patch:
> >
> > Thanks for following up, was very helpful !
> >
> > Oren.
>
> Hi Oren,
>
> I'm still unable to fully restart the application with your patch, but
> the result is now different. If I attempt to restart using --pidns
> and -F, both threads are created and frozen. However, as soon as I
> thaw them I get a segfault. If I attempt to restart them without the
> --pidns option, I get a message from restart indicating that it's
> about to call sys_restart and restart hangs. I also have the
> following in my syslog:

Hi John,

I assume the log below is for the --no-pidns case, right ?
Can you also post the output of 'restart -vd ...' ?

(Unfortunately I won't have a chance to try it until the weekend)

Thanks,

Oren.

>
> [ 1482.348060] [3753:3753:c/r:walk_task_subtree:633] total 2 ret 1
> [ 1482.348060] [3753:3753:c/r:prepare_descendants:1148] nr 2/2
> [ 1482.348060] [3753:3753:c/r:do_restore_coord:1320] restore prepare: 2
> [ 1541.864073] [err -512][pos 419][E @ do_ghost_task:973]ghost restart failed
> [ 1541.864343] [err -512][pos 419][E @ do_restore_task:1084]task restart failed
> [ 1541.864346] [3755:3755:c/r:clear_task_ctx:852] task 3755 clear checkpoint_ctx
> [ 1541.864349] [3755:3755:c/r:do_restart:1444] restart err -4, exiting
> [ 1541.864352] [3755:3755:c/r:do_restart:1451] sys_restart returns -4
> [ 1541.864366] [3757:3757:c/r:wait_checkpoint_ctx:938] wait_checkpoint_ctx: failed (-512)
> [ 1541.864368] [3757:3757:c/r:do_restart:1444] restart err -4, exiting
> [ 1541.864371] [3757:3757:c/r:do_restart:1451] sys_restart returns -4
> [ 1541.864689] [3753:3753:c/r:wait_all_tasks_finish:1173] final sync kflags 0x1a (ret 0)
> [ 1541.864692] [3753:3753:c/r:do_restore_coord:1325] restore finish: 0
> [ 1541.864694] [3753:3753:c/r:do_restore_coord:1331] restore deferqueue: 0
> [ 1541.864698] [err -512][pos 419][E @ ckpt_read_obj_type:426]Expecting to read type 9001
> [ 1541.864700] [3753:3753:c/r:do_restore_coord:1336] restore tail: -512
> [ 1541.864703] [err -512][pos 419][E @ do_restore_coord:1350]restart failed (coordinator)
> [ 1541.864706] [3753:3753:c/r:walk_task_subtree:633] total 0 ret 0
> [ 1541.864709] [3753:3753:c/r:clear_task_ctx:852] task 3753 clear checkpoint_ctx
> [ 1541.864715] [3753:3753:c/r:do_restart:1451] sys_restart returns -4
> [ 1541.864718] [3753:3753:c/r:restore_debug_free:144] 3 tasks registered, nr_tasks was 0 nr_total 1
> [ 1541.864721] [3753:3753:c/r:restore_debug_free:147] active pid was 0, ctx->errno -512
> [ 1541.864723] [3753:3753:c/r:restore_debug_free:149] kflags 26 uflags 0 oflags 1
> [ 1541.864726] [3753:3753:c/r:restore_debug_free:151] task[0] to run 3755
> [ 1541.864728] [3753:3753:c/r:restore_debug_free:151] task[1] to run 3757
> [ 1541.864731] [3753:3753:c/r:restore_debug_free:176] pid 3753 type Coord state Failed
> [ 1541.864735] [3753:3753:c/r:restore_debug_free:176] pid 3755 type Root state Failed
> [ 1541.864737] [3753:3753:c/r:restore_debug_free:176] pid 3756 type Ghost state Failed
>
> thanks,
> JP
>
> >
> > ---
> > diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
> > index 171c867..3288af0 100644
> > --- a/kernel/checkpoint/sys.c
> > +++ b/kernel/checkpoint/sys.c
> > @@ -605,13 +605,13 @@ int walk_task_subtree(struct task_struct *root,
> >  			continue;
> >  		}
> >
> > +		/* if not last thread - proceed with thread */
> > +		task = next_thread(task);
> > +		if (!thread_group_leader(task))
> > +			continue;
> > +
> >  		/* by definition, skip siblings of root */
> >  		while (task != root) {
> > -			/* if not last thread - proceed with thread */
> > -			task = next_thread(task);
> > -			if (!thread_group_leader(task))
> > -				break;
> > -
> >  			/* if has sibling - proceed with sibling */
> >  			if (!list_is_last(&task->sibling, &parent->children)) {
> >  				task = list_entry(task->sibling.next,
> > ---
> >
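
A side note for anyone reading the log above: the two error values that keep showing up are -512 and -4. In the kernel's include/linux/errno.h, 512 is ERESTARTSYS, an internal code that is not supposed to reach userspace, and 4 is EINTR. So the "[err -512]" lines followed by "sys_restart returns -4" would be consistent with the restarting tasks being interrupted while waiting, though that is only a reading of the numbers and not a diagnosis. Below is a rough userspace sketch for pulling the code out of such a line and naming it. The helper names ckpt_err_name() and ckpt_parse_err() and the CKPT_* macros are made up for illustration; they are not part of the checkpoint/restart tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Values as defined in the kernel's include/linux/errno.h and
 * asm-generic errno headers. ERESTARTSYS is kernel-internal, so the
 * userspace <errno.h> does not provide it; both are spelled out here
 * under illustrative (assumed) names.
 */
#define CKPT_EINTR        4    /* interrupted system call */
#define CKPT_ERESTARTSYS  512  /* restart the system call (kernel-internal) */

/* Hypothetical helper: map a kernel-style return value to a symbolic name. */
static const char *ckpt_err_name(int err)
{
    switch (err) {
    case 0:
        return "OK";
    case -CKPT_EINTR:
        return "EINTR";
    case -CKPT_ERESTARTSYS:
        return "ERESTARTSYS";
    default:
        return "unknown";
    }
}

/*
 * Hypothetical helper: find an "[err N]" tag in a log line.
 * Returns 1 and stores N in *err when the tag is present, 0 otherwise.
 */
static int ckpt_parse_err(const char *line, int *err)
{
    const char *tag;

    assert(line != NULL && err != NULL);

    tag = strstr(line, "[err ");
    if (tag == NULL)
        return 0;

    /* sscanf reports the number of converted fields; we expect one. */
    return sscanf(tag, "[err %d]", err) == 1;
}
```

Feeding it the "ghost restart failed" line stores -512, which ckpt_err_name() reports as ERESTARTSYS. Lines without an "[err N]" tag, such as the "sys_restart returns -4" ones, are left alone by the parser.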
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers