----- Original Message -----

> Currently, the runq sub-command doesn't take into account that the
> CFS runqueue's current task has been removed from the CFS runqueue.
> Due to this, the remaining CFS runqueues that follow the current
> task's are not displayed.  This patch fixes this by making the runq
> sub-command search the current task's runqueue explicitly.
>
> Note that a CFS runqueue exists for each task group, and so does a
> CFS runqueue's current task, so the above search needs to be done
> recursively.
>
> Test
> ====
>
> On the vmcore, I made 7 task groups:
>
>     root group --- A --- AA --- AAA
>                    +      +- AAB
>                    |
>                    +- AB --- ABA
>                           +- ABB
>
> and then in each task group, including the root group, I ran three
> CPU-bound tasks, each of which is exactly
>
>     int main(void) { for (;;) continue; return 0; }
>
> so 24 tasks in total.  For readability, I annotated each task name
> with the name of the group it belongs to.  For example, loop.ABA
> belongs to task group ABA.
>
> Look at the CPU 0 column below.  [before] lacks 8 tasks, while
> [after] successfully shows all of the tasks on the runqueue, which is
> identical to the result of [sched debug], which is expected to output
> the correct listing.
>
> I'll send this vmcore later.
>
> [before]
>
> crash> runq | cat
> CPU 0 RUNQUEUE: ffff88000a215f80
>   CURRENT: PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
>   RT PRIO_ARRAY: ffff88000a216098
>      [no tasks queued]
>   CFS RB_ROOT: ffff88000a216010
>      [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
>
> <cut>
>
> [after]
>
> crash_fix> runq
> CPU 0 RUNQUEUE: ffff88000a215f80
>   CURRENT: PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
>   RT PRIO_ARRAY: ffff88000a216098
>      [no tasks queued]
>   CFS RB_ROOT: ffff88000a216010
>      [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
>      [120] PID: 28271  TASK: ffff8800787a8b40  COMMAND: "loop.ABB"
>      [120] PID: 28272  TASK: ffff880037afd580  COMMAND: "loop.ABB"
>      [120] PID: 28245  TASK: ffff8800785e8b00  COMMAND: "loop.AB"
>      [120] PID: 28246  TASK: ffff880078628ac0  COMMAND: "loop.AB"
>      [120] PID: 28241  TASK: ffff880078616b40  COMMAND: "loop.AA"
>      [120] PID: 28239  TASK: ffff8800785774c0  COMMAND: "loop.AA"
>      [120] PID: 28240  TASK: ffff880078617580  COMMAND: "loop.AA"
>      [120] PID: 28232  TASK: ffff880079b5d4c0  COMMAND: "loop.A"
>
> <cut>
>
> [sched debug]
>
> crash> runq -d
> CPU 0
>    [120] PID: 28232  TASK: ffff880079b5d4c0  COMMAND: "loop.A"
>    [120] PID: 28239  TASK: ffff8800785774c0  COMMAND: "loop.AA"
>    [120] PID: 28240  TASK: ffff880078617580  COMMAND: "loop.AA"
>    [120] PID: 28241  TASK: ffff880078616b40  COMMAND: "loop.AA"
>    [120] PID: 28245  TASK: ffff8800785e8b00  COMMAND: "loop.AB"
>    [120] PID: 28246  TASK: ffff880078628ac0  COMMAND: "loop.AB"
>    [120] PID: 28262  TASK: ffff880037cc40c0  COMMAND: "loop.ABA"
>    [120] PID: 28263  TASK: ffff880037aaa040  COMMAND: "loop.ABA"
>    [120] PID: 28271  TASK: ffff8800787a8b40  COMMAND: "loop.ABB"
>    [120] PID: 28272  TASK: ffff880037afd580  COMMAND: "loop.ABB"
>
> <cut>
>
> Diff stat
> =========
>
>  defs.h |    1 +
>  task.c |   37 +++++++++++++++++--------------------
>  2 files changed, 18 insertions(+), 20 deletions(-)
>
> Thanks.
> HATAYAMA, Daisuke

Hello Daisuke,

Good catch!  Plus your re-worked patch cleans things up nicely.  And
"runq -d" paid off quickly, didn't it?  ;-)
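For anyone following the thread, here is a toy model of why the walk
has to recurse.  To be clear, this is illustrative code only -- the
structures and the count_queued_tasks() function are simplified
stand-ins invented for this example (a plain array in place of the
rb-tree), not the kernel's or crash's actual sources; the real
sched_entity layout is quoted below:

        /*
         * Toy model: each task group has its own cfs_rq, and the
         * entity currently running on a cfs_rq ("curr") is taken off
         * the rb-tree while it runs.  If curr is a group entity, its
         * my_q points at that group's own cfs_rq -- which has its own
         * curr, hence the recursion.
         */
        struct cfs_rq;

        struct sched_entity {
                struct cfs_rq *my_q;    /* non-NULL for a group entity */
                const char *comm;       /* task name for a task entity */
        };

        struct cfs_rq {
                struct sched_entity *curr;       /* running, off-tree  */
                struct sched_entity *queued[8];  /* stand-in rb-tree   */
                int nr_queued;
        };

        static int
        count_queued_tasks(struct cfs_rq *rq)
        {
                int i, total = 0;

                /*
                 * The step the old code missed: curr is not on the
                 * rb-tree, so a group runqueue hanging off it was
                 * never reached by the tree walk below.  (A task
                 * entity running as curr is reported separately, on
                 * runq's CURRENT line, so it is not counted here.)
                 */
                if (rq->curr && rq->curr->my_q)
                        total += count_queued_tasks(rq->curr->my_q);

                for (i = 0; i < rq->nr_queued; i++) {
                        struct sched_entity *se = rq->queued[i];

                        if (se->my_q)   /* descend into a group's cfs_rq */
                                total += count_queued_tasks(se->my_q);
                        else            /* a real task */
                                total++;
                }
                return total;
        }

With the curr recursion in place, every group's cfs_rq reachable from
the CPU 0 runqueue gets visited, which is why [after] picks up the 8
tasks that [before] dropped.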
One minor problem: while testing your patch on a variety of kernels,
several "runq" commands failed because the test kernels were not
configured with CONFIG_FAIR_GROUP_SCHED:

struct sched_entity {
        struct load_weight      load;           /* for load-balancing */
        struct rb_node          run_node;
        struct list_head        group_node;
        unsigned int            on_rq;

        u64                     exec_start;
        u64                     sum_exec_runtime;
        u64                     vruntime;
        u64                     prev_sum_exec_runtime;

        u64                     nr_migrations;

#ifdef CONFIG_SCHEDSTATS
        struct sched_statistics statistics;
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
        struct sched_entity     *parent;
        /* rq on which this entity is (to be) queued: */
        struct cfs_rq           *cfs_rq;
        /* rq "owned" by this entity/group: */
        struct cfs_rq           *my_q;
#endif
};

so they failed like so:

CPU 0 RUNQUEUE: ffffffff825f7520
  CURRENT: PID: 3790   TASK: ffff88000c8f2cf0  COMMAND: "bash"
  RT PRIO_ARRAY: ffffffff825f75e8
     [no tasks queued]
  CFS RB_ROOT: ffffffff825f75a0
runq: invalid structure member offset: sched_entity_my_q
      FILE: task.c  LINE: 7035  FUNCTION: dump_tasks_in_cfs_rq()

where line 7035 is where the first possible recursion is done:

7021 static int
7022 dump_tasks_in_cfs_rq(ulong cfs_rq)
7023 {
7024         struct task_context *tc;
7025         struct rb_root *root;
7026         struct rb_node *node;
7027         ulong my_q, leftmost, curr, curr_my_q;
7028         int total;
7029
7030         total = 0;
7031
7032         readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr, sizeof(ulong),
7033                 "curr", FAULT_ON_ERROR);
7034         if (curr) {
7035                 readmem(curr + OFFSET(sched_entity_my_q), KVADDR, &curr_my_q,
7036                         sizeof(ulong), "curr->my_q", FAULT_ON_ERROR);
7037                 if (curr_my_q)
7038                         total += dump_tasks_in_cfs_rq(curr_my_q);
7039         }
7040
7041         readmem(cfs_rq + OFFSET(cfs_rq_rb_leftmost), KVADDR, &leftmost,
7042                 sizeof(ulong), "rb_leftmost", FAULT_ON_ERROR);
7043         root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
7044
7045         for (node = rb_first(root); leftmost && node; node = rb_next(node)) {
7046                 if (VALID_MEMBER(sched_entity_my_q)) {
7047                         readmem((ulong)node - OFFSET(sched_entity_run_node)
7048                                 + OFFSET(sched_entity_my_q), KVADDR, &my_q,
7049                                 sizeof(ulong), "my_q", FAULT_ON_ERROR);
7050                         if (my_q) {
7051                                 total += dump_tasks_in_cfs_rq(my_q);
7052                                 continue;
7053                         }
7054                 }

I fixed it by imposing a VALID_MEMBER(sched_entity_my_q) check, similar
to the one already guarding the second recursive call at line 7046
above:

        if (VALID_MEMBER(sched_entity_my_q)) {
                readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr,
                        sizeof(ulong), "curr", FAULT_ON_ERROR);
                if (curr) {
                        readmem(curr + OFFSET(sched_entity_my_q), KVADDR,
                                &curr_my_q, sizeof(ulong), "curr->my_q",
                                FAULT_ON_ERROR);
                        if (curr_my_q)
                                total += dump_tasks_in_cfs_rq(curr_my_q);
                }
        }

and that worked OK.

I also added "sched_entity_my_q" to dump_offset_table() for "help -o".

If you are OK with the changes above, the patch is queued for
crash-6.0.3.

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility