On Tue, Jul 20, 2010 at 7:12 PM, Oren Laadan <orenl@xxxxxxxxxxxxxxx> wrote:
>
> Hi John
>
> In your program, it is a thread of the root task (of the hierarchy)
> that is missed. Indeed the previous patch was incomplete - it did
> fix the non-root-threads case but spoiled the root-threads case.
> That was silly... well, can you try this little patch:
>
> Thanks for following up, was very helpful !
>
> Oren.

Hi Oren,

I'm still unable to fully restart the application with your patch, but
the result is now different. If I attempt to restart using --pidns and
-F, both threads are created and frozen. However, as soon as I thaw
them I get a segfault.

If I attempt to restart them without the --pidns option, I get a
message from restart indicating that it's about to call sys_restart
and restart hangs. I also have the following in my syslog:

[ 1482.348060] [3753:3753:c/r:walk_task_subtree:633] total 2 ret 1
[ 1482.348060] [3753:3753:c/r:prepare_descendants:1148] nr 2/2
[ 1482.348060] [3753:3753:c/r:do_restore_coord:1320] restore prepare: 2
[ 1541.864073] [err -512][pos 419][E @ do_ghost_task:973]ghost restart failed
[ 1541.864343] [err -512][pos 419][E @ do_restore_task:1084]task restart failed
[ 1541.864346] [3755:3755:c/r:clear_task_ctx:852] task 3755 clear checkpoint_ctx
[ 1541.864349] [3755:3755:c/r:do_restart:1444] restart err -4, exiting
[ 1541.864352] [3755:3755:c/r:do_restart:1451] sys_restart returns -4
[ 1541.864366] [3757:3757:c/r:wait_checkpoint_ctx:938] wait_checkpoint_ctx: failed (-512)
[ 1541.864368] [3757:3757:c/r:do_restart:1444] restart err -4, exiting
[ 1541.864371] [3757:3757:c/r:do_restart:1451] sys_restart returns -4
[ 1541.864689] [3753:3753:c/r:wait_all_tasks_finish:1173] final sync kflags 0x1a (ret 0)
[ 1541.864692] [3753:3753:c/r:do_restore_coord:1325] restore finish: 0
[ 1541.864694] [3753:3753:c/r:do_restore_coord:1331] restore deferqueue: 0
[ 1541.864698] [err -512][pos 419][E @ ckpt_read_obj_type:426]Expecting to read type 9001
[ 1541.864700] [3753:3753:c/r:do_restore_coord:1336] restore tail: -512
[ 1541.864703] [err -512][pos 419][E @ do_restore_coord:1350]restart failed (coordinator)
[ 1541.864706] [3753:3753:c/r:walk_task_subtree:633] total 0 ret 0
[ 1541.864709] [3753:3753:c/r:clear_task_ctx:852] task 3753 clear checkpoint_ctx
[ 1541.864715] [3753:3753:c/r:do_restart:1451] sys_restart returns -4
[ 1541.864718] [3753:3753:c/r:restore_debug_free:144] 3 tasks registered, nr_tasks was 0 nr_total 1
[ 1541.864721] [3753:3753:c/r:restore_debug_free:147] active pid was 0, ctx->errno -512
[ 1541.864723] [3753:3753:c/r:restore_debug_free:149] kflags 26 uflags 0 oflags 1
[ 1541.864726] [3753:3753:c/r:restore_debug_free:151] task[0] to run 3755
[ 1541.864728] [3753:3753:c/r:restore_debug_free:151] task[1] to run 3757
[ 1541.864731] [3753:3753:c/r:restore_debug_free:176] pid 3753 type Coord state Failed
[ 1541.864735] [3753:3753:c/r:restore_debug_free:176] pid 3755 type Root state Failed
[ 1541.864737] [3753:3753:c/r:restore_debug_free:176] pid 3756 type Ghost state Failed

thanks,
JP

>
> ---
> diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
> index 171c867..3288af0 100644
> --- a/kernel/checkpoint/sys.c
> +++ b/kernel/checkpoint/sys.c
> @@ -605,13 +605,13 @@ int walk_task_subtree(struct task_struct *root,
>  			continue;
>  		}
>
> +		/* if not last thread - proceed with thread */
> +		task = next_thread(task);
> +		if (!thread_group_leader(task))
> +			continue;
> +
>  		/* by definition, skip siblings of root */
>  		while (task != root) {
> -			/* if not last thread - proceed with thread */
> -			task = next_thread(task);
> -			if (!thread_group_leader(task))
> -				break;
> -
>  			/* if has sibling - proceed with sibling */
>  			if (!list_is_last(&task->sibling, &parent->children)) {
>  				task = list_entry(task->sibling.next,
> ---
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
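
For anyone reading the syslog above: two numeric error codes recur, -512 and -4. In the Linux kernel, 512 is the kernel-internal ERESTARTSYS (defined in include/linux/errno.h and not exported to userspace), and 4 is EINTR. The helper below is only a minimal sketch for decoding such values while reading the log. The name ckpt_err_name and the constant KERNEL_ERESTARTSYS are hypothetical and are not part of the checkpoint/restart patches or tools.

```c
#include <errno.h>
#include <string.h>

/* Assumption: mirrors the kernel-internal value in include/linux/errno.h;
 * userspace <errno.h> does not define ERESTARTSYS. */
#define KERNEL_ERESTARTSYS 512

/* Hypothetical helper: map an error code as printed in the c/r debug log
 * (negative, e.g. -512 or -4) to a symbolic name or description. */
static const char *ckpt_err_name(int err)
{
    int code = err < 0 ? -err : err;

    if (code == KERNEL_ERESTARTSYS)
        return "ERESTARTSYS";
    if (code == EINTR)
        return "EINTR";
    /* Fall back to the C library's description for other codes. */
    return strerror(code);
}
```

Calling ckpt_err_name(-512) gives "ERESTARTSYS" and ckpt_err_name(-4) gives "EINTR", which matches the pairing seen in the log: the context records -512 while sys_restart returns -4.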