----- Original Message -----
> Hello Dave,
>
> Sorry about not testing the patch fully enough. And I think we
> should make a discussion about the first patch. I have done some
> tests with the patch, and I attached it. So could you please test
> it in your box again.

Hello Zhang,

I applied only your new patch1 (the old patch2 no longer applies after
this new patch1), and I see this:

  $ make warn
  ...
  cc -c -g -DX86_64 -DGDB_7_3_1 task.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector
  task.c: In function ‘dump_CFS_runqueues’:
  task.c:7693:6: warning: variable 'tot' set but not used [-Wunused-but-set-variable]
  ...

And I still (always) see the same problem with a live kernel:

  crash> set
      PID: 25998
  COMMAND: "crash"
     TASK: ffff88020fd9dc40 [THREAD_INFO: ffff88017b6d2000]
      CPU: 2
    STATE: TASK_RUNNING (ACTIVE)
  crash> runq
  CPU 0 RUNQUEUE: ffff88021e213cc0
    CURRENT: PID: 0 TASK: ffffffff81c13420 COMMAND: "swapper/0"
    RT PRIO_ARRAY: ffff88021e213e28
       [no tasks queued]
    CFS RB_ROOT: ffff88021e213d58
       GROUP CFS RB_ROOT: ffff88020ec3b800runq: invalid kernel virtual address: 48 type: "cgroup dentry"
  crash>

I also still see numerous instances of the error above with some (but
not all) of my "snapshot" dumpfiles, where your dump_task_group_name()
function is encountering (and trying to use) a NULL cgroup address here:

  static void dump_task_group_name(ulong group)
  {
          ulong cgroup, dentry, name;
          char *dentry_buf;
          int len;
          char tmp_buf[100];

          readmem(group + OFFSET(task_group_css) + OFFSET(cgroup_subsys_state_cgroup),
                  KVADDR, &cgroup, sizeof(ulong), "task_group css cgroup",
                  FAULT_ON_ERROR);
          readmem(cgroup + OFFSET(cgroup_dentry), KVADDR, &dentry, sizeof(ulong),
                  "cgroup dentry", FAULT_ON_ERROR);

Here are the examples, where it always happens on the "crash" process
while it's performing the snapshot file creation:

2.6.38.2-9.fc15 snapshot:

  crash> runq
  CPU 0 RUNQUEUE: ffff88003fc13840
    CURRENT: PID: 1180 TASK: ffff88003bea2e40 COMMAND: "crash"
    RT PRIO_ARRAY: ffff88003fc13988
       [no tasks queued]
    CFS RB_ROOT: ffff88003fc138d8
       GROUP CFS RB_ROOT: ffff880037ef1b00runq: invalid kernel virtual address: 38 type: "cgroup dentry"
  crash>

2.6.40.4-5.fc15 snapshot:

  crash> runq
  ...
  CPU 1 RUNQUEUE: ffff88003fc92540
    CURRENT: PID: 1341 TASK: ffff880037409730 COMMAND: "crash"
    RT PRIO_ARRAY: ffff88003fc92690
       [no tasks queued]
    CFS RB_ROOT: ffff88003fc925d8
       GROUP CFS RB_ROOT: ffff880037592f00runq: invalid kernel virtual address: 38 type: "cgroup dentry"
  crash>

3.5.1-1.fc17 snapshot:

  crash> runq
  ...
  CPU 1 RUNQUEUE: ffff88003ed13800
    CURRENT: PID: 31736 TASK: ffff88007c46ae20 COMMAND: "crash"
    RT PRIO_ARRAY: ffff88003ed13968
       [no tasks queued]
    CFS RB_ROOT: ffff88003ed13898
       GROUP CFS RB_ROOT: ffff88003deb3000runq: invalid kernel virtual address: 48 type: "cgroup dentry"
  crash>

3.1.7-1.fc16 snapshot:

  crash> runq
  ...
  CPU 2 RUNQUEUE: ffff88003e253180
    CURRENT: PID: 1495 TASK: ffff880037a60000 COMMAND: "crash"
    RT PRIO_ARRAY: ffff88003e2532d0
       [no tasks queued]
    CFS RB_ROOT: ffff88003e253218
       GROUP CFS RB_ROOT: ffff8800277f8500runq: invalid kernel virtual address: 38 type: "cgroup dentry"
  crash>

3.2.6-3.fc16 snapshot:

  crash> runq
  ...
  CPU 0 RUNQUEUE: ffff88003fc13780
    CURRENT: PID: 1383 TASK: ffff88003c932e40 COMMAND: "crash"
    RT PRIO_ARRAY: ffff88003fc13910
       [no tasks queued]
    CFS RB_ROOT: ffff88003fc13820
       GROUP CFS RB_ROOT: ffff88003a432c00runq: invalid kernel virtual address: 38 type: "cgroup dentry"
  crash>

But I also saw the error above on this 3.2.1-0.8.el7.x86_64 kernel
that actually crashed:

  crash> runq
  ...
  CPU 3 RUNQUEUE: ffff8804271d43c0
    CURRENT: PID: 11615 TASK: ffff88020c50a670 COMMAND: "runtest.sh"
    RT PRIO_ARRAY: ffff8804271d4590
       [no tasks queued]
    CFS RB_ROOT: ffff8804271d44a0
       GROUP CFS RB_ROOT: ffff88041e0d2760runq: invalid kernel virtual address: 38 type: "cgroup dentry"
  crash>

> will be fixed in patch2 later.
With respect to your patch2:

  +#define MAX_THROTTLED_RQ 100
  +struct throttled_rq {
  +        ulong rq;
  +        int depth;
  +        int prio;
  +};
  +static struct throttled_rq throttled_rt_rq_array[MAX_THROTTLED_RQ];
  +static struct throttled_rq throttled_cfs_rq_array[MAX_THROTTLED_RQ];

Can you please dynamically allocate the throttled_rt_rq_array and
throttled_cfs_rq_array arrays with GETBUF(), perhaps in the
task_group_offset_init() function? They are only needed when "runq"
is executed, and then only if the kernel version supports them.
You can FREEBUF() them at the bottom of dump_CFS_runqueues(), and if
the command fails prematurely, they will be FREEBUF()'d automatically
by restore_sanity().

But this leads to the larger question of showing the task_group data.
Consider that the current "runq" command does what it says it does:

  crash> help runq

  NAME
    runq - run queue

  SYNOPSIS
    runq [-t]

  DESCRIPTION
    With no argument, this command displays the tasks on the run queues
    of each cpu.

      -t  Display the timestamp information of each cpu's runqueue, which
          is the rq.clock, rq.most_recent_timestamp or rq.timestamp_last_tick
          value, whichever applies; following each cpu timestamp is the
          last_run or timestamp value of the active task on that cpu,
          whichever applies, along with the task identification.
  ...

Now, your patch adds significant complexity to the runq handling code
and makes its future maintenance harder. I'm wondering whether your
patch can be modified such that the task_group info would only be
displayed via a new flag, let's say "runq -g".

It seems that there has been considerable churn in the kernel code in
this area, and it worries me that this patch could unnecessarily break
the simple display of the queued tasks.

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility