From: Dave Anderson <anderson@xxxxxxxxxx>
Subject: Re: [PATCH] runq: search current task's runqueue explicitly
Date: Thu, 05 Jan 2012 15:32:12 -0500 (EST)

> ----- Original Message -----
>> Currently, the runq sub-command does not take into account that the
>> CFS runqueue's current task has been removed from the CFS runqueue.
>> Due to this, the remaining tasks that follow the current task on the
>> CFS runqueue are not displayed. This patch fixes this by making the
>> runq sub-command search the current task's runqueue explicitly.
>>
>> Note that a CFS runqueue exists for each task group, and so does the
>> CFS runqueue's current task, so the above search needs to be done
>> recursively.
>>
>> Test
>> ====
>>
>> On a vmcore I made 7 task groups:
>>
>>   root group --- A --- AA --- AAA
>>                  |      +---- AAB
>>                  |
>>                  +---- AB --- ABA
>>                         +---- ABB
>>
>> and then I ran three CPU-bound tasks, each doing exactly
>>
>>   int main(void) { for (;;) continue; return 0; }
>>
>> in each task group, including the root group: 24 tasks in total. For
>> readability, I annotated each task name with the name of the group it
>> belongs to. For example, loop.ABA belongs to task group ABA.
>>
>> Look at the CPU 0 column below. [before] lacks 8 tasks, while [after]
>> successfully shows all tasks on the runqueue, identical to the result
>> of [sched debug], which is expected to output the correct result.
>>
>> I'll send this vmcore later.
>>
>> [before]
>>
>> crash> runq | cat
>> CPU 0 RUNQUEUE: ffff88000a215f80
>>   CURRENT: PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
>>   RT PRIO_ARRAY: ffff88000a216098
>>      [no tasks queued]
>>   CFS RB_ROOT: ffff88000a216010
>>      [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
>>
>> <cut>
>>
>> [after]
>>
>> crash_fix> runq
>> CPU 0 RUNQUEUE: ffff88000a215f80
>>   CURRENT: PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
>>   RT PRIO_ARRAY: ffff88000a216098
>>      [no tasks queued]
>>   CFS RB_ROOT: ffff88000a216010
>>      [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
>>      [120] PID: 28271  TASK: ffff8800787a8b40  COMMAND: "loop.ABB"
>>      [120] PID: 28272  TASK: ffff880037afd580  COMMAND: "loop.ABB"
>>      [120] PID: 28245  TASK: ffff8800785e8b00  COMMAND: "loop.AB"
>>      [120] PID: 28246  TASK: ffff880078628ac0  COMMAND: "loop.AB"
>>      [120] PID: 28241  TASK: ffff880078616b40  COMMAND: "loop.AA"
>>      [120] PID: 28239  TASK: ffff8800785774c0  COMMAND: "loop.AA"
>>      [120] PID: 28240  TASK: ffff880078617580  COMMAND: "loop.AA"
>>      [120] PID: 28232  TASK: ffff880079b5d4c0  COMMAND: "loop.A"
>> <cut>
>>
>> [sched debug]
>>
>> crash> runq -d
>> CPU 0
>>      [120] PID: 28232  TASK: ffff880079b5d4c0  COMMAND: "loop.A"
>>      [120] PID: 28239  TASK: ffff8800785774c0  COMMAND: "loop.AA"
>>      [120] PID: 28240  TASK: ffff880078617580  COMMAND: "loop.AA"
>>      [120] PID: 28241  TASK: ffff880078616b40  COMMAND: "loop.AA"
>>      [120] PID: 28245  TASK: ffff8800785e8b00  COMMAND: "loop.AB"
>>      [120] PID: 28246  TASK: ffff880078628ac0  COMMAND: "loop.AB"
>>      [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
>>      [120] PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
>>      [120] PID: 28271  TASK: ffff8800787a8b40  COMMAND: "loop.ABB"
>>      [120] PID: 28272  TASK: ffff880037afd580  COMMAND: "loop.ABB"
>> <cut>
>>
>> Diff stat
>> =========
>>
>>  defs.h |    1 +
>>  task.c |   37 +++++++++++++++++--------------------
>>  2 files changed, 18 insertions(+), 20 deletions(-)
>>
>> Thanks.
>> HATAYAMA, Daisuke
>
> Hello Daisuke,
>
> Good catch! Plus your re-worked patch cleans things up nicely.
>
> And "runq -d" paid off quickly, didn't it? ;-)
>
> One minor problem: while testing your patch on a variety of kernels,
> several "runq" commands failed because the test kernels were not
> configured with CONFIG_FAIR_GROUP_SCHED:
>
>     struct sched_entity {
>             struct load_weight load;    /* for load-balancing */
>             struct rb_node run_node;
>             struct list_head group_node;
>             unsigned int on_rq;
>
>             u64 exec_start;
>             u64 sum_exec_runtime;
>             u64 vruntime;
>             u64 prev_sum_exec_runtime;
>
>             u64 nr_migrations;
>
>     #ifdef CONFIG_SCHEDSTATS
>             struct sched_statistics statistics;
>     #endif
>
>     #ifdef CONFIG_FAIR_GROUP_SCHED
>             struct sched_entity *parent;
>             /* rq on which this entity is (to be) queued: */
>             struct cfs_rq *cfs_rq;
>             /* rq "owned" by this entity/group: */
>             struct cfs_rq *my_q;
>     #endif
>     };
>
> so they failed like so:
>
>     CPU 0 RUNQUEUE: ffffffff825f7520
>       CURRENT: PID: 3790   TASK: ffff88000c8f2cf0  COMMAND: "bash"
>       RT PRIO_ARRAY: ffffffff825f75e8
>          [no tasks queued]
>       CFS RB_ROOT: ffffffff825f75a0
>     runq: invalid structure member offset: sched_entity_my_q
>           FILE: task.c  LINE: 7035  FUNCTION: dump_tasks_in_cfs_rq()
>
> where line 7035 is where the first possible recursion is done:
>
>     7021 static int
>     7022 dump_tasks_in_cfs_rq(ulong cfs_rq)
>     7023 {
>     7024         struct task_context *tc;
>     7025         struct rb_root *root;
>     7026         struct rb_node *node;
>     7027         ulong my_q, leftmost, curr, curr_my_q;
>     7028         int total;
>     7029
>     7030         total = 0;
>     7031
>     7032         readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr, sizeof(ulong),
>     7033             "curr", FAULT_ON_ERROR);
>     7034         if (curr) {
>     7035                 readmem(curr + OFFSET(sched_entity_my_q), KVADDR, &curr_my_q,
>     7036                     sizeof(ulong), "curr->my_q", FAULT_ON_ERROR);
>     7037                 if (curr_my_q)
>     7038                         total += dump_tasks_in_cfs_rq(curr_my_q);
>     7039         }
>     7040
>     7041         readmem(cfs_rq + OFFSET(cfs_rq_rb_leftmost), KVADDR, &leftmost,
>     7042             sizeof(ulong), "rb_leftmost", FAULT_ON_ERROR);
>     7043         root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
>     7044
>     7045         for (node = rb_first(root); leftmost && node; node = rb_next(node)) {
>     7046                 if (VALID_MEMBER(sched_entity_my_q)) {
>     7047                         readmem((ulong)node - OFFSET(sched_entity_run_node)
>     7048                             + OFFSET(sched_entity_my_q), KVADDR, &my_q,
>     7049                             sizeof(ulong), "my_q", FAULT_ON_ERROR);
>     7050                         if (my_q) {
>     7051                                 total += dump_tasks_in_cfs_rq(my_q);
>     7052                                 continue;
>     7053                         }
>     7054                 }
>
> I fixed it by imposing a VALID_MEMBER(sched_entity_my_q) check, similar
> to the one done at the second recursive call at line 7046 above:
>
>         if (VALID_MEMBER(sched_entity_my_q)) {
>                 readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr,
>                     sizeof(ulong), "curr", FAULT_ON_ERROR);
>                 if (curr) {
>                         readmem(curr + OFFSET(sched_entity_my_q), KVADDR,
>                             &curr_my_q, sizeof(ulong), "curr->my_q",
>                             FAULT_ON_ERROR);
>                         if (curr_my_q)
>                                 total += dump_tasks_in_cfs_rq(curr_my_q);
>                 }
>         }
>
> and that worked OK.
>
> I also added "sched_entity_my_q" to dump_offset_table() for "help -o".
>
> If you are OK with the changes above, the patch is queued for crash-6.0.3.

I missed the case where the fair group scheduler is disabled. I'm of
course OK with the changes. Thanks for the fix.

Thanks.
HATAYAMA, Daisuke

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility