Hi John,

Please try the following patch. It should be applied _instead_ of the
patch I sent on 7/20. The previous patch was still insufficient when the
root task has not only threads but also a child (here, the child was a
"ghost" task used temporarily during restart).

I believe this patch correctly addresses the problem. I tested it against
your program both with and without --pidns. I'll wait for your
confirmation before pushing the fix to cpt-v22-dev.

Thanks!

Oren.

---
diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
index 171c867..c5517c2 100644
--- a/kernel/checkpoint/sys.c
+++ b/kernel/checkpoint/sys.c
@@ -625,8 +625,11 @@ int walk_task_subtree(struct task_struct *root,
 		}
 
 		/* if we arrive at root again -- done */
-		if (task == root)
-			break;
+		if (task == root) {
+			/* if not last thread - proceed with thread */
+			task = root = next_thread(task);
+			if (thread_group_leader(task))
+				break;
 	}
 
 	read_unlock(&tasklist_lock);
---

On Thu, 22 Jul 2010, John Paul Walters wrote:

> Hi Oren,
>
> Thanks for the patch. For the --pidns case, that seems to have solved
> the problem. In the case of --no-pidns, restart still hangs as
> described before. Should this work in the --no-pidns case, or is
> it expected to fail in this case?
>
> JP
>
> On Wed, Jul 21, 2010 at 9:04 PM, Oren Laadan <orenl@xxxxxxxxxxxxxxx> wrote:
> > Hi John,
> >
> > This is a bit embarrassing; the behavior sounds too familiar.
> > Please try the following patch:
> >
> > --
> > diff --git a/arch/x86/kernel/checkpoint.c b/arch/x86/kernel/checkpoint.c
> > index 3fb9deb..b770f70 100644
> > --- a/arch/x86/kernel/checkpoint.c
> > +++ b/arch/x86/kernel/checkpoint.c
> > @@ -104,7 +104,7 @@ int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
> >  	h->gdt_entry_tls_entries = GDT_ENTRY_TLS_ENTRIES;
> >  	h->sizeof_tls_array = tls_size;
> >  	h->sysenter_return = (__u64) (unsigned long)
> > -		task_thread_info(current)->sysenter_return;
> > +		task_thread_info(t)->sysenter_return;
> >  
> >  	/* For simplicity dump the entire array */
> >  	memcpy(h + 1, t->thread.tls_array, tls_size);
> > --
> >
> > On Wed, 21 Jul 2010, John Paul Walters wrote:
> >
> >> >>
> >> >> Hi Oren,
> >> >>
> >> >> I'm still unable to fully restart the application with your patch, but
> >> >> the result is now different. If I attempt to restart using --pidns
> >> >> and -F, both threads are created and frozen. However, as soon as I
> >> >> thaw them I get a segfault. If I attempt to restart them without the
> >> >> --pidns option, I get a message from restart indicating that it's
> >> >> about to call sys_restart, and then restart hangs. I also have the
> >> >> following in my syslog:
> >> >
> >> > Hi John,
> >> >
> >> > I assume the log below is for the --no-pidns case, right?
> >> > Can you also post the output of 'restart -vd ...'?
> >> > (Unfortunately I won't have a chance to try it until the weekend.)
> >> >
> >>
> >> Hi Oren,
> >>
> >> That's correct; the original log was for the --no-pidns case. Below
> >> I've included the restart log up to the point where it hangs at
> >> sys_restart. Thanks again for all of your help.
> >>
> >> best,
> >> JP
> >>
> >> ./restart -v -d --no-pidns < checkpoint_out
> >> <4124>number of tasks: 2
> >> <4124>number of vpids: 0
> >> <4124>total tasks (including ghosts): 3
> >> <4124>pid 3583: thread tgid 3582
> >> <4124>pid 3583: creator set to 3582
> >> <4124>pid 1: propagate session 3582
> >> <4124>pid 1: creator set to 3582
> >> <4124>pid 1: set session
> >> <4124>pid 1: moving up to 3582
> >> <4124>====== TASKS
> >> <4124> [0] pid 3582 ppid 3349 sid 0 creator 0
> >> <4124> [1] pid 3583 ppid 3349 sid 0 creator 3582 prev 1 T
> >> <4124> [2] pid 1 ppid 3582 sid 3582 creator 3582 next 3583 S G
> >> <4124>............
> >> <4124>task[0].vidx = -1
> >> <4124>task[1].vidx = -1
> >> <4124>subtree (existing pidns)
> >> <4124>forking child vpid 3582 flags 0x1
> >> <4124>task 3582 forking with flags 11 numpids 1
> >> <4124>task 3582 pid[0]=0
> >> <4124>forked child vpid 4126 (asked 3582)
> >> <4126>root task pid 4126
> >> <4126>pid 3582: pid 4126 sid 3386 parent 4124
> >> <4126>pid 3582: fork child 1 with session
> >> <4126>forking child vpid 1 flags 0x12
> >> <4126>task 1 forking with flags 11 numpids 1
> >> <4126>task 1 pid[0]=0
> >> <4126>forked child vpid 4127 (asked 1)
> >> <4126>pid 3582: fork child 3583 without session
> >> <4126>forking child vpid 3583 flags 0x4
> >> <4126>task 3583 forking with flags 10911 numpids 1
> >> <4126>task 3583 pid[0]=0
> >> <4126>forked child vpid 4128 (asked 3583)
> >> <4126>about to call sys_restart(), flags 0
> >> <4125>====== PIDS ARRAY
> >> <4125>[0] pid 3582 ppid 1 sid 1 pgid 3582
> >> <4125>[1] pid 3583 ppid 1 sid 1 pgid 3582
> >> <4125>............
> >> <4125>c/r swap old 3582 new 4126
> >> <4128>pid 3583: pid 4128 sid 3386 parent 4124
> >> <4128>about to call sys_restart(), flags 0
> >> <4125>c/r swap old 3583 new 4128
> >> <4127>pid 1: pid 4127 sid 3386 parent 4126
> >> <4125>c/r swap old 1 new 4127
> >> <4125>====== PIDS ARRAY (swaped)
> >> <4125>[0] pid 4126 ppid 1 sid 4127 pgid 4126
> >> <4125>[1] pid 4128 ppid 1 sid 4127 pgid 4126
> >> <4125>............
> >> <4125>c/r read input 16384
> >> <4127>about to call sys_restart(), flags 0x4
> >> <4125>c/r read input 16384
> >> <4125>c/r read input 16384
> >> <4125>c/r read input 16384
> >> <4125>c/r read input 16384
> >>
> >> > Thanks,
> >> >
> >> > Oren.
> >> >
> >> >>
> >> >> [ 1482.348060] [3753:3753:c/r:walk_task_subtree:633] total 2 ret 1
> >> >> [ 1482.348060] [3753:3753:c/r:prepare_descendants:1148] nr 2/2
> >> >> [ 1482.348060] [3753:3753:c/r:do_restore_coord:1320] restore prepare: 2
> >> >> [ 1541.864073] [err -512][pos 419][E @ do_ghost_task:973]ghost restart failed
> >> >> [ 1541.864343] [err -512][pos 419][E @ do_restore_task:1084]task restart failed
> >> >> [ 1541.864346] [3755:3755:c/r:clear_task_ctx:852] task 3755 clear checkpoint_ctx
> >> >> [ 1541.864349] [3755:3755:c/r:do_restart:1444] restart err -4, exiting
> >> >> [ 1541.864352] [3755:3755:c/r:do_restart:1451] sys_restart returns -4
> >> >> [ 1541.864366] [3757:3757:c/r:wait_checkpoint_ctx:938] wait_checkpoint_ctx: failed (-512)
> >> >> [ 1541.864368] [3757:3757:c/r:do_restart:1444] restart err -4, exiting
> >> >> [ 1541.864371] [3757:3757:c/r:do_restart:1451] sys_restart returns -4
> >> >> [ 1541.864689] [3753:3753:c/r:wait_all_tasks_finish:1173] final sync kflags 0x1a (ret 0)
> >> >> [ 1541.864692] [3753:3753:c/r:do_restore_coord:1325] restore finish: 0
> >> >> [ 1541.864694] [3753:3753:c/r:do_restore_coord:1331] restore deferqueue: 0
> >> >> [ 1541.864698] [err -512][pos 419][E @ ckpt_read_obj_type:426]Expecting to read type 9001
> >> >> [ 1541.864700] [3753:3753:c/r:do_restore_coord:1336] restore tail: -512
> >> >> [ 1541.864703] [err -512][pos 419][E @ do_restore_coord:1350]restart failed (coordinator)
> >> >> [ 1541.864706] [3753:3753:c/r:walk_task_subtree:633] total 0 ret 0
> >> >> [ 1541.864709] [3753:3753:c/r:clear_task_ctx:852] task 3753 clear checkpoint_ctx
> >> >> [ 1541.864715] [3753:3753:c/r:do_restart:1451] sys_restart returns -4
> >> >> [ 1541.864718] [3753:3753:c/r:restore_debug_free:144] 3 tasks registered, nr_tasks was 0 nr_total 1
> >> >> [ 1541.864721] [3753:3753:c/r:restore_debug_free:147] active pid was 0, ctx->errno -512
> >> >> [ 1541.864723] [3753:3753:c/r:restore_debug_free:149] kflags 26 uflags 0 oflags 1
> >> >> [ 1541.864726] [3753:3753:c/r:restore_debug_free:151] task[0] to run 3755
> >> >> [ 1541.864728] [3753:3753:c/r:restore_debug_free:151] task[1] to run 3757
> >> >> [ 1541.864731] [3753:3753:c/r:restore_debug_free:176] pid 3753 type Coord state Failed
> >> >> [ 1541.864735] [3753:3753:c/r:restore_debug_free:176] pid 3755 type Root state Failed
> >> >> [ 1541.864737] [3753:3753:c/r:restore_debug_free:176] pid 3756 type Ghost state Failed
> >> >>
> >> >> thanks,
> >> >> JP
> >> >> >
> >> >> > ---
> >> >> > diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
> >> >> > index 171c867..3288af0 100644
> >> >> > --- a/kernel/checkpoint/sys.c
> >> >> > +++ b/kernel/checkpoint/sys.c
> >> >> > @@ -605,13 +605,13 @@ int walk_task_subtree(struct task_struct *root,
> >> >> >  			continue;
> >> >> >  		}
> >> >> >  
> >> >> > +		/* if not last thread - proceed with thread */
> >> >> > +		task = next_thread(task);
> >> >> > +		if (!thread_group_leader(task))
> >> >> > +			continue;
> >> >> > +
> >> >> >  		/* by definition, skip siblings of root */
> >> >> >  		while (task != root) {
> >> >> > -			/* if not last thread - proceed with thread */
> >> >> > -			task = next_thread(task);
> >> >> > -			if (!thread_group_leader(task))
> >> >> > -				break;
> >> >> > -
> >> >> >  			/* if has sibling - proceed with sibling */
> >> >> >  			if (!list_is_last(&task->sibling, &parent->children)) {
> >> >> >  				task = list_entry(task->sibling.next,
> >> >> > ---
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers