----- Original Message -----
> Hello Dave,
>
> In runq command, when dumping cfs and rt runqueues,
> it seems that we get the wrong nr_running values of rq
> and cfs_rq.
>
> Please refer to the attached patch.
>
> Thanks
> Zhang Yanfei

Hello Zhang,

I understand what you are trying to accomplish with this patch, but none
of my test dumpfiles can actually verify it, because they show no
difference in output with or without your patch.  What failure mode did
you see in your testing?  I presume that it just showed "[no tasks queued]"
for the RT runqueue when there were actually tasks queued there?

The reason I ask is that I'm thinking a better solution would be to
simplify dump_CFS_runqueues() by *not* accessing and using rq_nr_running,
cfs_rq_nr_running or cfs_rq_h_nr_running.  Those counters are only read
to determine the "active" argument to pass to dump_RT_prio_array(), which
returns immediately if it is FALSE.  However, if we get rid of the "active"
argument and simply allow dump_RT_prio_array() to check its queues every
time, it still works just fine.  For example, I tested my set of sample
dumpfiles with this patch:

diff -u -r1.205 task.c
--- task.c	12 Jul 2012 20:04:00 -0000	1.205
+++ task.c	22 Aug 2012 15:33:32 -0000
@@ -7636,7 +7636,7 @@
 			OFFSET(cfs_rq_tasks_timeline));
 	}
 
-	dump_RT_prio_array(nr_running != cfs_rq_nr_running,
+	dump_RT_prio_array(TRUE,
 		runq + OFFSET(rq_rt) + OFFSET(rt_rq_active),
 		&runqbuf[OFFSET(rq_rt) + OFFSET(rt_rq_active)]);

and the output is identical to that produced both with and without your
patch.

So the question is whether dump_CFS_runqueues() should be needlessly
complicated by all of the "nr_running" references?  In fact, it also seems
possible that a crash could happen at a point in the scheduler code where
those counters are not valid/current/trustworthy.  So unless you can
convince me otherwise, I'd prefer to just remove the "nr_running" business
completely.
That being said -- and for your future reference -- when creating patches
such as yours, please consider the following:

When adding entries to the offset_table, always put them at the end of the
structure so that the offsets of the currently-existing members do not
change.  This allows older extension modules to still have valid OFFSET()
values without having to be recompiled:

--- a/defs.h
+++ b/defs.h
@@ -1576,6 +1576,7 @@ struct offset_table {                    /* stash of commonly-used offsets */
 	long rq_nr_running;
 	long cfs_rq_rb_leftmost;
 	long cfs_rq_nr_running;
+	long cfs_rq_h_nr_running;
 	long cfs_rq_tasks_timeline;
 	long task_struct_se;
 	long sched_entity_run_node;

But, as you have done, you can still display the new entry from "help -o"
near its related cfs_rq_xxx offsets.

Then, during initialization, there's no need to make the preliminary
MEMBER_EXISTS() call in this case -- just call MEMBER_OFFSET_INIT()
regardless.  If it fails, the offset will remain -1 (INVALID):

--- a/task.c
+++ b/task.c
@@ -7566,6 +7566,9 @@ dump_CFS_runqueues(void)
 	MEMBER_OFFSET_INIT(sched_entity_on_rq, "sched_entity", "on_rq");
 	MEMBER_OFFSET_INIT(cfs_rq_rb_leftmost, "cfs_rq", "rb_leftmost");
 	MEMBER_OFFSET_INIT(cfs_rq_nr_running, "cfs_rq", "nr_running");
+	if (MEMBER_EXISTS("cfs_rq", "h_nr_running"))
+		MEMBER_OFFSET_INIT(cfs_rq_h_nr_running,
+			"cfs_rq", "h_nr_running");
 	MEMBER_OFFSET_INIT(cfs_rq_tasks_timeline, "cfs_rq", "tasks_timeline");
 	MEMBER_OFFSET_INIT(cfs_rq_curr, "cfs_rq", "curr");

And then after initialization, instead of using MEMBER_EXISTS(), you can
use "if (VALID_MEMBER(cfs_rq_h_nr_running))", which simply accesses the
offset_table -- instead of involving a gdb call every time:

+	if (MEMBER_EXISTS("cfs_rq", "h_nr_running")) {
+		cfs_rq_nr_running = ULONG(runqbuf +
+			OFFSET(rq_cfs) +
+			OFFSET(cfs_rq_h_nr_running));
+	} else {
+		cfs_rq_nr_running = ULONG(runqbuf +
+			OFFSET(rq_cfs) +
+			OFFSET(cfs_rq_nr_running));
+	}
 	root = (struct rb_root *)(runq + OFFSET(rq_cfs) +
 		OFFSET(cfs_rq_tasks_timeline));
 	}

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility