Re: [PATCH] runq: make tasks in throttled cfs_rqs/rt_rqs displayed

On 2012/11/08 03:15, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>>
>> OK. I rewrote the patch, and it tested OK on my box.
>>
>> Thanks
>> Zhang
> 
> My tests weren't so successful this time, and I also have some questions
> about the runq -g output.
> 
> I tested your latest patches on a sample set of 70 dumpfiles whose
> kernels all use CFS runqueues.  In 7 of the 70 "runq -g" tests,
> the command caused the crash session to fail like so:
> 

<snip>

> 
> In a quick debugging session of your free_task_group_info_array()
> I printed out the addresses being FREEBUF()'d, and I noted that 
> there were numerous instances of the same address being freed twice:
> 
>  static void
>  free_task_group_info_array(void)
>  {
>          int i;
>  
>          for (i = 0; i < tgi_p; i++) {
>                  if (tgi_array[i]->name)
>                          FREEBUF(tgi_array[i]->name);
>                  FREEBUF(tgi_array[i]);
>          }
>          tgi_p = 0;
>          FREEBUF(tgi_array);
>  }
>  
> I put one of the failing vmlinux/vmcore pairs here for you
> to debug:
>   
>   http://people.redhat.com/anderson/zhangyanfei
> 

This is strange. In my test on the vmcore you provided, 'runq -g' ran fine
the first time, but caused the crash session to fail the next time it was run.
From the debug information above and from my own tests, I noticed that it
always failed in the same place, when FREEBUF'ing a name. So I checked the
function get_task_group_name and changed the way it returns the name buffer.
Now the command works correctly on that vmcore.
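For the record, the failure mode is the classic double-free pattern: if
get_task_group_name hands two tgi_array entries the same buffer, the loop in
free_task_group_info_array() FREEBUFs that address twice. Below is a minimal
sketch of the fixed contract, using plain malloc/free in place of crash's
GETBUF/FREEBUF (the function name is real; this body is illustrative, not the
actual crash code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative stand-in: each call must return its OWN freshly
 * allocated copy of the group's dentry name, so the caller can
 * free every tgi_array[i]->name exactly once.  Returning a shared
 * buffer instead is what produces duplicate frees of one address.
 */
static char *get_task_group_name(const char *dentry_name)
{
	size_t len = strlen(dentry_name);
	char *buf = malloc(len + 1);	/* fresh buffer per call */

	if (!buf)
		return NULL;
	memcpy(buf, dentry_name, len + 1);
	return buf;
}
```

With this contract, two groups that happen to share a name still get distinct
buffers, and the free loop is safe.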

> 
> Secondly, another question I have is the meaning of the command's output.
> 
> First, consider this "runq" output:
> 
>  crash> runq
>  CPU 0 RUNQUEUE: ffff8800090436c0
>    CURRENT: PID: 588    TASK: ffff88007e4877a0  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff8800090437c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff880009043740
>       [118] PID: 2110   TASK: ffff88007d470860  COMMAND: "check-cdrom.sh"
>       [118] PID: 2109   TASK: ffff88007f1247a0  COMMAND: "check-cdrom.sh"
>       [118] PID: 2114   TASK: ffff88007f20e080  COMMAND: "udevd"
>  
>  CPU 1 RUNQUEUE: ffff88000905b6c0
>    CURRENT: PID: 2113   TASK: ffff88007e8ac140  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff88000905b7c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff88000905b740
>       [118] PID: 2092   TASK: ffff88007d7a4760  COMMAND: "MAKEDEV"
>       [118] PID: 1983   TASK: ffff88007e59f140  COMMAND: "udevd"
>       [118] PID: 2064   TASK: ffff88007e40f7a0  COMMAND: "udevd"
>       [115] PID: 2111   TASK: ffff88007e4278a0  COMMAND: "kthreadd"
>  crash>
> 
> In the above case, the per-cpu "rq" structure addresses are shown as:
> 
>  CPU 0 RUNQUEUE: ffff8800090436c0
>  CPU 1 RUNQUEUE: ffff88000905b6c0
> 
> And embedded in each of the rq structures above are these two rb_root
> structures:
> 
>    CFS RB_ROOT: ffff880009043740  (embedded in rq @ffff8800090436c0)
>    CFS RB_ROOT: ffff88000905b740  (embedded in rq @ffff88000905b6c0)
> 
> And starting at those rb_root structures, the tree of tasks are dumped.
> 
> Now, your "runq -g" option doesn't show any "starting point" structure
> address, but rather they just show "CPU 0" and "CPU 1":
>  
>  crash> runq -g
>  CPU 0
>    CURRENT: PID: 588    TASK: ffff88007e4877a0  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff8800090437c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff880009093548
>       [118] PID: 2110   TASK: ffff88007d470860  COMMAND: "check-cdrom.sh"
>       [118] PID: 2109   TASK: ffff88007f1247a0  COMMAND: "check-cdrom.sh"
>       [118] PID: 2114   TASK: ffff88007f20e080  COMMAND: "udevd"
>  
>  CPU 1
>    CURRENT: PID: 2113   TASK: ffff88007e8ac140  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff88000905b7c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff880009093548
>       [118] PID: 2092   TASK: ffff88007d7a4760  COMMAND: "MAKEDEV"
>       [118] PID: 1983   TASK: ffff88007e59f140  COMMAND: "udevd"
>       [118] PID: 2064   TASK: ffff88007e40f7a0  COMMAND: "udevd"
>       [115] PID: 2111   TASK: ffff88007e4278a0  COMMAND: "kthreadd"
>  crash> 
>  
> I would think that there might be a useful address of a per-cpu 
> structure that could be shown there as well?

OK, I have added this.

> 
> And secondly, I'm confused as to why the "CFS RB_ROOT" address for
> all cpus is the same address -- for example, above they are both at
> ffff880009093548.  How can the two rb trees have the same rb_root?

My oversight, sorry. Fixed.
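To make the relationship explicit: task_group.cfs_rq is an array of per-cpu
cfs_rq pointers, and the rb_root shown as "CFS RB_ROOT" is embedded inside
each per-cpu cfs_rq, so the printed address must differ per cpu. A simplified
model of the corrected lookup (plain C types standing in for the kernel
structures, not the actual crash internals):

```c
#include <assert.h>
#include <stddef.h>

struct rb_root { void *rb_node; };

struct cfs_rq {
	unsigned int nr_running;
	struct rb_root tasks_timeline;	/* the "CFS RB_ROOT" address */
};

struct task_group {
	struct cfs_rq **cfs_rq;		/* per-cpu array of cfs_rq pointers */
};

/*
 * Index the per-cpu array first, then take the address of the
 * rb_root embedded in THAT cpu's cfs_rq.  Reusing one cfs_rq for
 * every cpu is what produced identical RB_ROOT addresses.
 */
static struct rb_root *cfs_root_for_cpu(struct task_group *tg, int cpu)
{
	struct cfs_rq *cfs_rq_p = tg->cfs_rq[cpu];

	return &cfs_rq_p->tasks_timeline;
}
```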

Thanks
Zhang
From 923b6bf30502bd3cfecf4c0f4d41fdc5618825d1 Mon Sep 17 00:00:00 2001
From: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
Date: Fri, 9 Nov 2012 11:19:08 +0800
Subject: [PATCH 1/2] add -g option for runq v5

Signed-off-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
---
 defs.h    |   18 ++
 kernel.c  |    3 +
 symbols.c |   36 ++++
 task.c    |  681 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 724 insertions(+), 14 deletions(-)

diff --git a/defs.h b/defs.h
index 319584f..ce23f58 100755
--- a/defs.h
+++ b/defs.h
@@ -1792,6 +1792,23 @@ struct offset_table {                    /* stash of commonly-used offsets */
 	long sched_rt_entity_my_q;
 	long neigh_table_hash_shift;
 	long neigh_table_nht_ptr;
+	long task_group_parent;
+	long task_group_css;
+	long cgroup_subsys_state_cgroup;
+	long cgroup_dentry;
+	long task_group_rt_rq;
+	long rt_rq_tg;
+	long rt_rq_rq;
+	long task_group_cfs_rq;
+	long cfs_rq_tg;
+	long task_group_siblings;
+	long task_group_children;
+	long task_group_cfs_bandwidth;
+	long cfs_rq_throttled;
+	long task_group_rt_bandwidth;
+	long rt_rq_rt_throttled;
+	long rt_rq_highest_prio;
+	long rt_rq_rt_nr_running;
 };
 
 struct size_table {         /* stash of commonly-used sizes */
@@ -1927,6 +1944,7 @@ struct size_table {         /* stash of commonly-used sizes */
 	long log;
 	long log_level;
 	long rt_rq;
+	long task_group;
 };
 
 struct array_table {
diff --git a/kernel.c b/kernel.c
index 45da48e..76441e9 100755
--- a/kernel.c
+++ b/kernel.c
@@ -308,6 +308,9 @@ kernel_init()
 	STRUCT_SIZE_INIT(prio_array, "prio_array"); 
 
 	MEMBER_OFFSET_INIT(rq_cfs, "rq", "cfs");
+	MEMBER_OFFSET_INIT(task_group_cfs_rq, "task_group", "cfs_rq");
+	MEMBER_OFFSET_INIT(task_group_rt_rq, "task_group", "rt_rq");
+	MEMBER_OFFSET_INIT(task_group_parent, "task_group", "parent");
 
        /*
         *  In 2.4, smp_send_stop() sets smp_num_cpus back to 1
diff --git a/symbols.c b/symbols.c
index 1f09c9f..d3f93e1 100755
--- a/symbols.c
+++ b/symbols.c
@@ -8820,6 +8820,40 @@ dump_offset_table(char *spec, ulong makestruct)
 		OFFSET(log_flags_level));
 	fprintf(fp, "          sched_rt_entity_my_q: %ld\n",
 		OFFSET(sched_rt_entity_my_q));
+	fprintf(fp, "             task_group_parent: %ld\n",
+		OFFSET(task_group_parent));
+	fprintf(fp, "                task_group_css: %ld\n",
+		OFFSET(task_group_css));
+	fprintf(fp, "    cgroup_subsys_state_cgroup: %ld\n",
+		OFFSET(cgroup_subsys_state_cgroup));
+	fprintf(fp, "                 cgroup_dentry: %ld\n",
+		OFFSET(cgroup_dentry));
+	fprintf(fp, "              task_group_rt_rq: %ld\n",
+		OFFSET(task_group_rt_rq));
+	fprintf(fp, "                      rt_rq_tg: %ld\n",
+		OFFSET(rt_rq_tg));
+	fprintf(fp, "                      rt_rq_rq: %ld\n",
+		OFFSET(rt_rq_rq));
+	fprintf(fp, "             task_group_cfs_rq: %ld\n",
+		OFFSET(task_group_cfs_rq));
+	fprintf(fp, "                     cfs_rq_tg: %ld\n",
+		OFFSET(cfs_rq_tg));
+	fprintf(fp, "           task_group_siblings: %ld\n",
+		OFFSET(task_group_siblings));
+	fprintf(fp, "           task_group_children: %ld\n",
+		OFFSET(task_group_children));
+	fprintf(fp, "      task_group_cfs_bandwidth: %ld\n",
+		OFFSET(task_group_cfs_bandwidth));
+	fprintf(fp, "              cfs_rq_throttled: %ld\n",
+		OFFSET(cfs_rq_throttled));
+	fprintf(fp, "       task_group_rt_bandwidth: %ld\n",
+		OFFSET(task_group_rt_bandwidth));
+	fprintf(fp, "            rt_rq_rt_throttled: %ld\n",
+		OFFSET(rt_rq_rt_throttled));
+	fprintf(fp, "            rt_rq_highest_prio: %ld\n",
+		OFFSET(rt_rq_highest_prio));
+	fprintf(fp, "           rt_rq_rt_nr_running: %ld\n",
+		OFFSET(rt_rq_rt_nr_running));
 
 	fprintf(fp, "\n                    size_table:\n");
 	fprintf(fp, "                          page: %ld\n", SIZE(page));
@@ -9037,6 +9071,8 @@ dump_offset_table(char *spec, ulong makestruct)
 		SIZE(log_level));
 	fprintf(fp, "                         rt_rq: %ld\n",
 		SIZE(rt_rq));
+	fprintf(fp, "                    task_group: %ld\n",
+		SIZE(task_group));
 
         fprintf(fp, "\n                   array_table:\n");
 	/*
diff --git a/task.c b/task.c
index f8c6325..6c7eb91 100755
--- a/task.c
+++ b/task.c
@@ -64,10 +64,27 @@ static struct rb_node *rb_parent(struct rb_node *, struct rb_node *);
 static struct rb_node *rb_right(struct rb_node *, struct rb_node *);
 static struct rb_node *rb_left(struct rb_node *, struct rb_node *);
 static void dump_task_runq_entry(struct task_context *);
+static void print_group_header_fair(int, ulong, void *);
+static void print_parent_task_group_fair(void *, int);
+static int dump_tasks_in_lower_dequeued_cfs_rq(int, ulong, int);
 static int dump_tasks_in_cfs_rq(ulong);
+static int dump_tasks_in_task_group_cfs_rq(int, ulong, int);
 static void dump_on_rq_tasks(void);
+static void cfs_rq_offset_init(void);
+static void task_group_offset_init(void);
 static void dump_CFS_runqueues(void);
+static void print_group_header_rt(ulong, void *);
+static void print_parent_task_group_rt(void *, int);
+static int dump_tasks_in_lower_dequeued_rt_rq(int, ulong, int);
 static void dump_RT_prio_array(int, ulong, char *);
+static void dump_tasks_in_task_group_rt_rq(int, ulong, int);
+static char *get_task_group_name(ulong);
+static void sort_task_group_info_array(void);
+static void print_task_group_info_array(void);
+static void reuse_task_group_info_array(void);
+static void free_task_group_info_array(void);
+static void fill_task_group_info_array(int, ulong, char *, int);
+static void dump_tasks_by_task_group(void);
 static void task_struct_member(struct task_context *,unsigned int, struct reference *);
 static void signal_reference(struct task_context *, ulong, struct reference *);
 static void do_sig_thread_group(ulong);
@@ -7028,8 +7045,9 @@ cmd_runq(void)
         int c;
 	int sched_debug = 0;
 	int dump_timestamp_flag = 0;
+	int dump_task_group_flag = 0;
 
-        while ((c = getopt(argcnt, args, "dt")) != EOF) {
+        while ((c = getopt(argcnt, args, "dtg")) != EOF) {
                 switch(c)
                 {
 		case 'd':
@@ -7038,6 +7056,13 @@ cmd_runq(void)
 		case 't':
 			dump_timestamp_flag = 1;
 			break;
+		case 'g':
+			if (INVALID_MEMBER(task_group_cfs_rq) ||
+			    INVALID_MEMBER(task_group_rt_rq) ||
+			    INVALID_MEMBER(task_group_parent))
+				option_not_supported(c);
+			dump_task_group_flag = 1;
+			break;
                 default:
                         argerrs++;
                         break;
@@ -7053,12 +7078,16 @@ cmd_runq(void)
                 return;
         }
 
-
 	if (sched_debug) {
 		dump_on_rq_tasks();
 		return;
 	}
 
+	if (dump_task_group_flag) {
+		dump_tasks_by_task_group();
+		return;
+	}
+
 	dump_runq();
 }
 
@@ -7421,6 +7450,80 @@ rb_next(struct rb_node *node)
         return parent;
 }
 
+#define MAX_GROUP_NUM 200
+struct task_group_info {
+	int use;
+	int depth;
+	char *name;
+	ulong task_group;
+	struct task_group_info *parent;
+};
+
+static struct task_group_info **tgi_array;
+static int tgi_p = 0;
+
+static void
+sort_task_group_info_array(void)
+{
+	int i, j;
+	struct task_group_info *tmp;
+
+	for (i = 0; i < tgi_p - 1; i++) {
+		for (j = 0; j < tgi_p - i - 1; j++) {
+			if (tgi_array[j]->depth > tgi_array[j+1]->depth) {
+				tmp = tgi_array[j+1];
+				tgi_array[j+1] = tgi_array[j];
+				tgi_array[j] = tmp;
+			}
+		}
+	}
+}
+
+static void
+print_task_group_info_array(void)
+{
+	int i;
+
+	for (i = 0; i < tgi_p; i++) {
+		fprintf(fp, "%d : use=%d, depth=%d, group=%lx, ", i,
+			tgi_array[i]->use, tgi_array[i]->depth,
+			tgi_array[i]->task_group);
+		fprintf(fp, "name=%s, ",
+			tgi_array[i]->name ? tgi_array[i]->name : "NULL");
+		if (tgi_array[i]->parent)
+			fprintf(fp, "parent=%lx",
+				tgi_array[i]->parent->task_group);
+		fprintf(fp, "\n");
+	}
+}
+
+static void
+free_task_group_info_array(void)
+{
+	int i;
+
+	for (i = 0; i < tgi_p; i++) {
+		if (tgi_array[i]->name)
+			FREEBUF(tgi_array[i]->name);
+		FREEBUF(tgi_array[i]);
+	}
+	tgi_p = 0;
+	FREEBUF(tgi_array);
+}
+
+static void
+reuse_task_group_info_array(void)
+{
+	int i;
+
+	for (i = 0; i < tgi_p; i++) {
+		if (tgi_array[i]->depth == 0)
+			tgi_array[i]->use = 0;
+		else
+			tgi_array[i]->use = 1;
+	}
+}
+
 static void
 dump_task_runq_entry(struct task_context *tc)
 {
@@ -7428,11 +7531,98 @@ dump_task_runq_entry(struct task_context *tc)
 
 	readmem(tc->task + OFFSET(task_struct_prio), KVADDR, 
 		&prio, sizeof(int), "task prio", FAULT_ON_ERROR);
-	fprintf(fp, "     [%3d] ", prio);
+	fprintf(fp, "[%3d] ", prio);
 	fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
 		tc->pid, tc->task, tc->comm);
 }
 
+static void
+print_group_header_fair(int depth, ulong cfs_rq, void *t)
+{
+	int throttled;
+	struct rb_root *root;
+	struct task_group_info *tgi = (struct task_group_info *)t;
+
+	root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
+	INDENT(2 + 3 * depth);
+	fprintf(fp, "GROUP CFS RB_ROOT: %lx", (ulong)root);
+	if (tgi->name)
+		fprintf(fp, " <%s>", tgi->name);
+
+	if (VALID_MEMBER(task_group_cfs_bandwidth)) {
+		readmem(cfs_rq + OFFSET(cfs_rq_throttled), KVADDR,
+			&throttled, sizeof(int), "cfs_rq throttled",
+			FAULT_ON_ERROR);
+		if (throttled)
+			fprintf(fp, " (THROTTLED)");
+	}
+	fprintf(fp, "\n");
+}
+
+static void
+print_parent_task_group_fair(void *t, int cpu)
+{
+	struct task_group_info *tgi;
+	ulong cfs_rq_c, cfs_rq_p;
+
+	tgi = ((struct task_group_info *)t)->parent;
+	if (tgi && tgi->use)
+		print_parent_task_group_fair(tgi, cpu);
+	else
+		return;
+
+	readmem(tgi->task_group + OFFSET(task_group_cfs_rq),
+		KVADDR, &cfs_rq_c, sizeof(ulong),
+		"task_group cfs_rq", FAULT_ON_ERROR);
+	readmem(cfs_rq_c + cpu * sizeof(ulong), KVADDR, &cfs_rq_p,
+		sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
+
+	print_group_header_fair(tgi->depth, cfs_rq_p, tgi);
+	tgi->use = 0;
+}
+
+static int
+dump_tasks_in_lower_dequeued_cfs_rq(int depth, ulong cfs_rq, int cpu)
+{
+	int i, total, nr_running;
+	ulong group, cfs_rq_c, cfs_rq_p;
+
+	total = 0;
+	for (i = 0; i < tgi_p; i++) {
+		if (tgi_array[i]->use == 0 || tgi_array[i]->depth - depth != 1)
+			continue;
+
+		readmem(cfs_rq + OFFSET(cfs_rq_tg), KVADDR, &group,
+			sizeof(ulong), "cfs_rq tg", FAULT_ON_ERROR);
+		if (group != tgi_array[i]->parent->task_group)
+			continue;
+
+		readmem(tgi_array[i]->task_group + OFFSET(task_group_cfs_rq),
+			KVADDR, &cfs_rq_c, sizeof(ulong), "task_group cfs_rq",
+			FAULT_ON_ERROR);
+		readmem(cfs_rq_c + cpu * sizeof(ulong), KVADDR, &cfs_rq_p,
+			sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
+		if (cfs_rq == cfs_rq_p)
+			continue;
+
+		readmem(cfs_rq_p + OFFSET(cfs_rq_nr_running), KVADDR,
+			&nr_running, sizeof(int), "cfs_rq nr_running",
+			FAULT_ON_ERROR);
+		if (nr_running == 0) {
+			total += dump_tasks_in_lower_dequeued_cfs_rq(depth + 1,
+				cfs_rq_p, cpu);
+			continue;
+		}
+
+		print_parent_task_group_fair(tgi_array[i], cpu);
+
+		total++;
+		total += dump_tasks_in_task_group_cfs_rq(depth + 1, cfs_rq_p, cpu);
+	}
+
+	return total;
+}
+
 static int
 dump_tasks_in_cfs_rq(ulong cfs_rq)
 {
@@ -7475,9 +7665,10 @@ dump_tasks_in_cfs_rq(ulong cfs_rq)
 				     OFFSET(sched_entity_run_node));
 		if (!tc)
 			continue;
-		if (hq_enter((ulong)tc))
+		if (hq_enter((ulong)tc)) {
+			INDENT(5);
 			dump_task_runq_entry(tc);
-		else {
+		} else {
 			error(WARNING, "duplicate CFS runqueue node: task %lx\n",
 				tc->task);
 			return total;
@@ -7488,6 +7679,87 @@ dump_tasks_in_cfs_rq(ulong cfs_rq)
 	return total;
 }
 
+static int
+dump_tasks_in_task_group_cfs_rq(int depth, ulong cfs_rq, int cpu)
+{
+	struct task_context *tc;
+	struct rb_root *root;
+	struct rb_node *node;
+	ulong my_q, leftmost, curr, curr_my_q, tg;
+	int total, i;
+
+	total = 0;
+
+	if (depth) {
+		readmem(cfs_rq + OFFSET(cfs_rq_tg), KVADDR,
+			&tg, sizeof(ulong), "cfs_rq tg",
+			FAULT_ON_ERROR);
+		for (i = 0; i < tgi_p; i++) {
+			if (tgi_array[i]->task_group == tg) {
+				print_group_header_fair(depth,
+					cfs_rq, tgi_array[i]);
+				tgi_array[i]->use = 0;
+				break;
+			}
+		}
+	}
+
+	if (VALID_MEMBER(sched_entity_my_q)) {
+		readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr,
+			sizeof(ulong), "curr", FAULT_ON_ERROR);
+		if (curr) {
+			readmem(curr + OFFSET(sched_entity_my_q), KVADDR,
+				&curr_my_q, sizeof(ulong), "curr->my_q",
+				FAULT_ON_ERROR);
+			if (curr_my_q) {
+				total++;
+				total += dump_tasks_in_task_group_cfs_rq(depth + 1,
+					curr_my_q, cpu);
+			}
+		}
+	}
+
+	readmem(cfs_rq + OFFSET(cfs_rq_rb_leftmost), KVADDR, &leftmost,
+		sizeof(ulong), "rb_leftmost", FAULT_ON_ERROR);
+	root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
+
+	for (node = rb_first(root); leftmost && node; node = rb_next(node)) {
+		if (VALID_MEMBER(sched_entity_my_q)) {
+			readmem((ulong)node - OFFSET(sched_entity_run_node)
+				+ OFFSET(sched_entity_my_q), KVADDR, &my_q,
+				sizeof(ulong), "my_q", FAULT_ON_ERROR);
+			if (my_q) {
+				total++;
+				total += dump_tasks_in_task_group_cfs_rq(depth + 1,
+					my_q, cpu);
+				continue;
+			}
+		}
+
+		tc = task_to_context((ulong)node - OFFSET(task_struct_se) -
+				     OFFSET(sched_entity_run_node));
+		if (!tc)
+			continue;
+		if (hq_enter((ulong)tc)) {
+			INDENT(5 + 3 * depth);
+			dump_task_runq_entry(tc);
+		} else {
+			error(WARNING, "duplicate CFS runqueue node: task %lx\n",
+				tc->task);
+			return total;
+		}
+		total++;
+	}
+
+	total += dump_tasks_in_lower_dequeued_cfs_rq(depth, cfs_rq, cpu);
+
+	if (!total) {
+		INDENT(5 + 3 * depth);
+		fprintf(fp, "[no tasks queued]\n");
+	}
+	return total;
+}
+
 static void
 dump_on_rq_tasks(void)
 {
@@ -7531,6 +7803,7 @@ dump_on_rq_tasks(void)
 			if (!on_rq || tc->processor != cpu)
 				continue;
 
+			INDENT(5);
 			dump_task_runq_entry(tc);
 			tot++;
 		}
@@ -7543,16 +7816,8 @@ dump_on_rq_tasks(void)
 }
 
 static void
-dump_CFS_runqueues(void)
+cfs_rq_offset_init(void)
 {
-	int tot, cpu;
-	ulong runq, cfs_rq;
-	char *runqbuf, *cfs_rq_buf;
-	ulong tasks_timeline ATTRIBUTE_UNUSED;
-	struct task_context *tc;
-	struct rb_root *root;
-	struct syment *rq_sp, *init_sp;
-
 	if (!VALID_STRUCT(cfs_rq)) {
 		STRUCT_SIZE_INIT(cfs_rq, "cfs_rq");
 		STRUCT_SIZE_INIT(rt_rq, "rt_rq");
@@ -7585,6 +7850,50 @@ dump_CFS_runqueues(void)
 			"run_list");
 		MEMBER_OFFSET_INIT(rt_prio_array_queue, "rt_prio_array", "queue");
 	}
+}
+
+static void
+task_group_offset_init(void)
+{
+	if (!VALID_STRUCT(task_group)) {
+		STRUCT_SIZE_INIT(task_group, "task_group");
+		MEMBER_OFFSET_INIT(rt_rq_rt_nr_running, "rt_rq", "rt_nr_running");
+		MEMBER_OFFSET_INIT(cfs_rq_tg, "cfs_rq", "tg");
+		MEMBER_OFFSET_INIT(rt_rq_rq, "rt_rq", "rq");
+		MEMBER_OFFSET_INIT(rt_rq_tg, "rt_rq", "tg");
+		MEMBER_OFFSET_INIT(rt_rq_highest_prio, "rt_rq", "highest_prio");
+		MEMBER_OFFSET_INIT(task_group_css, "task_group", "css");
+		MEMBER_OFFSET_INIT(cgroup_subsys_state_cgroup,
+			"cgroup_subsys_state", "cgroup");
+		MEMBER_OFFSET_INIT(cgroup_dentry, "cgroup", "dentry");
+
+		MEMBER_OFFSET_INIT(task_group_siblings, "task_group", "siblings");
+		MEMBER_OFFSET_INIT(task_group_children, "task_group", "children");
+
+		MEMBER_OFFSET_INIT(task_group_cfs_bandwidth,
+			"task_group", "cfs_bandwidth");
+		MEMBER_OFFSET_INIT(cfs_rq_throttled, "cfs_rq",
+			"throttled");
+
+		MEMBER_OFFSET_INIT(task_group_rt_bandwidth,
+			"task_group", "rt_bandwidth");
+		MEMBER_OFFSET_INIT(rt_rq_rt_throttled, "rt_rq",
+			"rt_throttled");
+	}
+}
+
+static void
+dump_CFS_runqueues(void)
+{
+	int cpu, tot;
+	ulong runq, cfs_rq;
+	char *runqbuf, *cfs_rq_buf;
+	ulong tasks_timeline ATTRIBUTE_UNUSED;
+	struct task_context *tc;
+	struct rb_root *root;
+	struct syment *rq_sp, *init_sp;
+
+	cfs_rq_offset_init();
 
 	if (!(rq_sp = per_cpu_symbol_search("per_cpu__runqueues")))
 		error(FATAL, "per-cpu runqueues do not exist\n");
@@ -7643,6 +7952,7 @@ dump_CFS_runqueues(void)
 		hq_open();
 		tot = dump_tasks_in_cfs_rq(cfs_rq);
 		hq_close();
+
 		if (!tot) {
 			INDENT(5);
 			fprintf(fp, "[no tasks queued]\n");
@@ -7655,6 +7965,106 @@ dump_CFS_runqueues(void)
 }
 
 static void
+print_group_header_rt(ulong rt_rq, void *t)
+{
+	int throttled;
+	struct task_group_info *tgi = (struct task_group_info *)t;
+
+	fprintf(fp, "GROUP RT PRIO_ARRAY: %lx", rt_rq + OFFSET(rt_rq_active));
+	if (tgi->name)
+		fprintf(fp, " <%s>", tgi->name);
+
+	if (VALID_MEMBER(task_group_rt_bandwidth)) {
+		readmem(rt_rq + OFFSET(rt_rq_rt_throttled), KVADDR,
+			&throttled, sizeof(int), "rt_rq rt_throttled",
+			FAULT_ON_ERROR);
+		if (throttled)
+			fprintf(fp, " (THROTTLED)");
+	}
+	fprintf(fp, "\n");
+}
+
+static void
+print_parent_task_group_rt(void *t, int cpu)
+{
+	int prio;
+	struct task_group_info *tgi;
+	ulong rt_rq_c, rt_rq_p;
+
+
+	tgi = ((struct task_group_info *)t)->parent;
+	if (tgi && tgi->use)
+		print_parent_task_group_rt(tgi, cpu);
+	else
+		return;
+
+	readmem(tgi->task_group + OFFSET(task_group_rt_rq),
+		KVADDR, &rt_rq_c, sizeof(ulong),
+		"task_group rt_rq", FAULT_ON_ERROR);
+	readmem(rt_rq_c + cpu * sizeof(ulong), KVADDR, &rt_rq_p,
+		sizeof(ulong), "task_group rt_rq", FAULT_ON_ERROR);
+
+	readmem(rt_rq_p + OFFSET(rt_rq_highest_prio), KVADDR, &prio,
+		sizeof(int), "rt_rq highest prio", FAULT_ON_ERROR);
+
+	INDENT(-1 + 6 * tgi->depth);
+	fprintf(fp, "[%3d] ", prio);
+	print_group_header_rt(rt_rq_p, tgi);
+	tgi->use = 0;
+}
+
+static int
+dump_tasks_in_lower_dequeued_rt_rq(int depth, ulong rt_rq, int cpu)
+{
+	int i, prio, tot, delta, nr_running;
+	ulong rt_rq_c, rt_rq_p, group;
+
+	tot = 0;
+	for (i = 0; i < tgi_p; i++) {
+		delta = tgi_array[i]->depth - depth;
+		if (delta > 1)
+			break;
+
+		if (tgi_array[i]->use == 0 || delta < 1)
+			continue;
+
+		readmem(rt_rq + OFFSET(rt_rq_tg), KVADDR, &group,
+			sizeof(ulong), "rt_rq tg", FAULT_ON_ERROR);
+		if (group != tgi_array[i]->parent->task_group)
+			continue;
+
+		readmem(tgi_array[i]->task_group + OFFSET(task_group_rt_rq),
+			KVADDR, &rt_rq_c, sizeof(ulong), "task_group rt_rq",
+			FAULT_ON_ERROR);
+		readmem(rt_rq_c + cpu * sizeof(ulong), KVADDR, &rt_rq_p,
+			sizeof(ulong), "task_group rt_rq", FAULT_ON_ERROR);
+		if (rt_rq == rt_rq_p)
+			continue;
+
+		readmem(rt_rq_p + OFFSET(rt_rq_rt_nr_running), KVADDR,
+			&nr_running, sizeof(int), "rt_rq rt_nr_running",
+			FAULT_ON_ERROR);
+		if (nr_running == 0) {
+			tot += dump_tasks_in_lower_dequeued_rt_rq(depth + 1,
+				rt_rq_p, cpu);
+			continue;
+		}
+
+		print_parent_task_group_rt(tgi_array[i], cpu);
+
+		readmem(rt_rq_p + OFFSET(rt_rq_highest_prio), KVADDR,
+			&prio, sizeof(int), "rt_rq highest_prio",
+			FAULT_ON_ERROR);
+		INDENT(5 + 6 * depth);
+		fprintf(fp, "[%3d] ", prio);
+		tot++;
+		dump_tasks_in_task_group_rt_rq(depth + 1, rt_rq_p, cpu);
+	}
+
+	return tot;
+}
+
+static void
 dump_RT_prio_array(int depth, ulong k_prio_array, char *u_prio_array)
 {
 	int i, c, tot, cnt, qheads;
@@ -7742,6 +8152,249 @@ dump_RT_prio_array(int depth, ulong k_prio_array, char *u_prio_array)
 	}
 }
 
+static void
+dump_tasks_in_task_group_rt_rq(int depth, ulong rt_rq, int cpu)
+{
+	int i, c, tot, cnt, qheads;
+	ulong offset, kvaddr, uvaddr;
+	ulong list_head[2];
+        struct list_data list_data, *ld;
+	struct task_context *tc;
+	ulong *tlist;
+	ulong my_q, task_addr, tg, k_prio_array;
+	char *rt_rq_buf, *u_prio_array;
+
+	k_prio_array = rt_rq +  OFFSET(rt_rq_active);
+	rt_rq_buf = GETBUF(SIZE(rt_rq));
+	readmem(rt_rq, KVADDR, rt_rq_buf, SIZE(rt_rq), "rt_rq", FAULT_ON_ERROR);
+	u_prio_array = &rt_rq_buf[OFFSET(rt_rq_active)];
+
+	if (depth) {
+		readmem(rt_rq + OFFSET(rt_rq_tg), KVADDR,
+			&tg, sizeof(ulong), "rt_rq tg",
+			FAULT_ON_ERROR);
+		for (i = 0; i < tgi_p; i++) {
+			if (tgi_array[i]->task_group == tg) {
+				print_group_header_rt(rt_rq, tgi_array[i]);
+				tgi_array[i]->use = 0;
+				break;
+			}
+		}
+	}
+
+        qheads = (i = ARRAY_LENGTH(rt_prio_array_queue)) ?
+                i : get_array_length("rt_prio_array.queue", NULL, SIZE(list_head));
+
+	ld = &list_data;
+
+	for (i = tot = 0; i < qheads; i++) {
+		offset =  OFFSET(rt_prio_array_queue) + (i * SIZE(list_head));
+		kvaddr = k_prio_array + offset;
+		uvaddr = (ulong)u_prio_array + offset;
+		BCOPY((char *)uvaddr, (char *)&list_head[0], sizeof(ulong)*2);
+
+		if (CRASHDEBUG(1))
+			fprintf(fp, "rt_prio_array[%d] @ %lx => %lx/%lx\n",
+				i, kvaddr, list_head[0], list_head[1]);
+
+		if ((list_head[0] == kvaddr) && (list_head[1] == kvaddr))
+			continue;
+
+		BZERO(ld, sizeof(struct list_data));
+		ld->start = list_head[0];
+		if (VALID_MEMBER(task_struct_rt) &&
+		    VALID_MEMBER(sched_rt_entity_run_list))
+			ld->list_head_offset = OFFSET(sched_rt_entity_run_list);
+		else
+			ld->list_head_offset = OFFSET(task_struct_run_list);
+		ld->end = kvaddr;
+		hq_open();
+		cnt = do_list(ld);
+		hq_close();
+		tlist = (ulong *)GETBUF((cnt) * sizeof(ulong));
+		cnt = retrieve_list(tlist, cnt);
+		for (c = 0; c < cnt; c++) {
+			task_addr = tlist[c];
+			if (INVALID_MEMBER(sched_rt_entity_my_q))
+				goto is_task;
+
+			readmem(tlist[c] + OFFSET(sched_rt_entity_my_q),
+				KVADDR, &my_q, sizeof(ulong), "my_q",
+				FAULT_ON_ERROR);
+			if (!my_q) {
+				task_addr -= OFFSET(task_struct_rt);
+				goto is_task;
+			}
+
+			INDENT(5 + 6 * depth);
+			fprintf(fp, "[%3d] ", i);
+			tot++;
+			dump_tasks_in_task_group_rt_rq(depth + 1, my_q, cpu);
+			continue;
+
+is_task:
+			if (!(tc = task_to_context(task_addr)))
+				continue;
+
+			INDENT(5 + 6 * depth);
+			fprintf(fp, "[%3d] ", i);
+			fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
+				tc->pid, tc->task, tc->comm);
+			tot++;
+		}
+		FREEBUF(tlist);
+	}
+
+	tot += dump_tasks_in_lower_dequeued_rt_rq(depth, rt_rq, cpu);
+
+	if (!tot) {
+		INDENT(5 + 6 * depth);
+		fprintf(fp, "[no tasks queued]\n");
+	}
+	FREEBUF(rt_rq_buf);
+}
+
+static char *
+get_task_group_name(ulong group)
+{
+	ulong cgroup, dentry, name;
+	char *dentry_buf, *tmp;
+	int len;
+
+	tmp = NULL;
+	readmem(group + OFFSET(task_group_css) + OFFSET(cgroup_subsys_state_cgroup),
+		KVADDR, &cgroup, sizeof(ulong),
+		"task_group css cgroup", FAULT_ON_ERROR);
+	if (cgroup == 0)
+		return NULL;
+
+	readmem(cgroup + OFFSET(cgroup_dentry), KVADDR, &dentry, sizeof(ulong),
+		"cgroup dentry", FAULT_ON_ERROR);
+	if (dentry == 0)
+		return NULL;
+
+	dentry_buf = GETBUF(SIZE(dentry));
+	readmem(dentry, KVADDR, dentry_buf, SIZE(dentry),
+		"dentry", FAULT_ON_ERROR);
+	len = UINT(dentry_buf + OFFSET(dentry_d_name) + OFFSET(qstr_len));
+	tmp = GETBUF(len + 1);
+	name = ULONG(dentry_buf + OFFSET(dentry_d_name) + OFFSET(qstr_name));
+	readmem(name, KVADDR, tmp, len, "qstr name", FAULT_ON_ERROR);
+
+	FREEBUF(dentry_buf);
+	return tmp;
+}
+
+static void
+fill_task_group_info_array(int depth, ulong group, char *group_buf, int i)
+{
+	int d;
+	ulong kvaddr, uvaddr, offset;
+	ulong list_head[2], next;
+
+	d = tgi_p;
+	tgi_array[tgi_p] = (struct task_group_info *)
+		GETBUF(sizeof(struct task_group_info));
+	if (depth)
+		tgi_array[tgi_p]->use = 1;
+	else
+		tgi_array[tgi_p]->use = 0;
+
+	tgi_array[tgi_p]->depth = depth;
+	tgi_array[tgi_p]->name = get_task_group_name(group);
+	tgi_array[tgi_p]->task_group = group;
+	if (i >= 0)
+		tgi_array[tgi_p]->parent = tgi_array[i];
+	else
+		tgi_array[tgi_p]->parent = NULL;
+	tgi_p++;
+
+	offset = OFFSET(task_group_children);
+	kvaddr = group + offset;
+	uvaddr = (ulong)(group_buf + offset);
+	BCOPY((char *)uvaddr, (char *)&list_head[0], sizeof(ulong)*2);
+
+	if ((list_head[0] == kvaddr) && (list_head[1] == kvaddr))
+		return;
+
+	next = list_head[0];
+	while (next != kvaddr) {
+		group = next - OFFSET(task_group_siblings);
+		readmem(group, KVADDR, group_buf, SIZE(task_group),
+			"task_group", FAULT_ON_ERROR);
+		next = ULONG(group_buf + OFFSET(task_group_siblings) +
+			OFFSET(list_head_next));
+		fill_task_group_info_array(depth + 1, group, group_buf, d);
+	}
+}
+
+static void
+dump_tasks_by_task_group(void)
+{
+	int cpu;
+	ulong root_task_group, cfs_rq, cfs_rq_p;
+	ulong rt_rq, rt_rq_p, runq;
+	char *buf;
+	struct rb_root *root;
+	struct task_context *tc;
+
+	cfs_rq_offset_init();
+	task_group_offset_init();
+
+	root_task_group = 0;
+	if (symbol_exists("init_task_group"))
+		root_task_group = symbol_value("init_task_group");
+	else if (symbol_exists("root_task_group"))
+		root_task_group = symbol_value("root_task_group");
+	else
+		error(FATAL, "cannot determine root task_group\n");
+
+	tgi_array = (struct task_group_info **)GETBUF(sizeof(void *)
+		* MAX_GROUP_NUM);
+	buf = GETBUF(SIZE(task_group));
+	readmem(root_task_group, KVADDR, buf, SIZE(task_group),
+		"task_group", FAULT_ON_ERROR);
+	rt_rq = ULONG(buf + OFFSET(task_group_rt_rq));
+	cfs_rq = ULONG(buf + OFFSET(task_group_cfs_rq));
+
+	fill_task_group_info_array(0, root_task_group, buf, -1);
+	sort_task_group_info_array();
+	if (CRASHDEBUG(1))
+		print_task_group_info_array();
+
+	get_active_set();
+
+	for (cpu = 0; cpu < kt->cpus; cpu++) {
+		readmem(rt_rq + cpu * sizeof(ulong), KVADDR, &rt_rq_p,
+			sizeof(ulong), "task_group rt_rq", FAULT_ON_ERROR);
+		readmem(rt_rq_p + OFFSET(rt_rq_rq), KVADDR, &runq,
+			sizeof(ulong), "rt_rq rq", FAULT_ON_ERROR);
+		fprintf(fp, "%sCPU %d RUNQUEUE: %lx\n", cpu ? "\n" : "",
+			cpu, runq);
+		fprintf(fp, "  CURRENT: ");
+		if ((tc = task_to_context(tt->active_set[cpu])))
+			fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
+				tc->pid, tc->task, tc->comm);
+		else
+			fprintf(fp, "%lx\n", tt->active_set[cpu]);
+
+		fprintf(fp, "  RT PRIO_ARRAY: %lx\n",
+			rt_rq_p + OFFSET(rt_rq_active));
+		reuse_task_group_info_array();
+		dump_tasks_in_task_group_rt_rq(0, rt_rq_p, cpu);
+
+		readmem(cfs_rq + cpu * sizeof(ulong), KVADDR, &cfs_rq_p,
+			sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
+		root = (struct rb_root *)(cfs_rq_p + OFFSET(cfs_rq_tasks_timeline));
+		fprintf(fp, "  CFS RB_ROOT: %lx\n", (ulong)root);
+		reuse_task_group_info_array();
+		dump_tasks_in_task_group_cfs_rq(0, cfs_rq_p, cpu);
+	}
+
+	FREEBUF(buf);
+	free_task_group_info_array();
+}
+
 #undef _NSIG
 #define _NSIG           64
 #define _NSIG_BPW       machdep->bits
-- 
1.7.1

From 802c4262110a6fb39ed7d5f2bdfe6133a35b3b75 Mon Sep 17 00:00:00 2001
From: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
Date: Tue, 6 Nov 2012 16:46:36 +0800
Subject: [PATCH 2/2] add help info for runq -g v2

Signed-off-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
---
 help.c |   42 ++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/help.c b/help.c
index 14bf533..3be3e6e 100755
--- a/help.c
+++ b/help.c
@@ -2201,7 +2201,7 @@ NULL
 char *help_runq[] = {
 "runq",
 "run queue",
-"[-t]",
+"[-t] [-g]",
 "  With no argument, this command displays the tasks on the run queues",
 "  of each cpu.",
 " ",
@@ -2209,7 +2209,9 @@ char *help_runq[] = {
 "       rq.clock, rq.most_recent_timestamp or rq.timestamp_last_tick value,",
 "       whichever applies; following each cpu timestamp is the last_run or ",
 "       timestamp value of the active task on that cpu, whichever applies, ",
-"       along with the task identification.", 
+"       along with the task identification.",
+"   -g  Display tasks with group information of each cpu hierarchically. Note",
+"       that tasks in throttled cfs_rq/rt_rq are also displayed.",
 "\nEXAMPLES",
 " Display the tasks on an O(1) scheduler run queue:\n",
 "    %s> runq",
@@ -2259,6 +2261,42 @@ char *help_runq[] = {
 "           2680986785772  PID: 28227  TASK: ffff8800787780c0  COMMAND: \"loop\"",
 "    CPU 3: 2680990954469",
 "           2680986059540  PID: 28226  TASK: ffff880078778b00  COMMAND: \"loop\"",
+" ",
+" Display tasks with group information hierarchically:\n",
+"    %s> runq -g ",
+"    CPU 0 RUNQUEUE: ffff880028216680",
+"      CURRENT: PID: 14734  TASK: ffff88010626f500  COMMAND: \"sh\"",
+"      RT PRIO_ARRAY: ffff880028216808",
+"         [  0] GROUP RT PRIO_ARRAY: ffff880139fc9800 <test1> (THROTTLED)",
+"              [  0] PID: 14750  TASK: ffff88013a4dd540  COMMAND: \"rtloop99\"",
+"              [  1] PID: 14748  TASK: ffff88013bbca040  COMMAND: \"rtloop98\"",
+"              [  1] GROUP RT PRIO_ARRAY: ffff880089029000 <test11>",
+"                    [  1] PID: 14752  TASK: ffff880088abf500  COMMAND: \"rtloop98\"",
+"              [ 54] PID: 14749  TASK: ffff880037a4e080  COMMAND: \"rtloop45\"",
+"              [ 98] PID: 14746  TASK: ffff88012678c080  COMMAND: \"rtloop1\"",
+"      CFS RB_ROOT: ffff880028216718",
+"         [120] PID: 14740  TASK: ffff88013b1e6080  COMMAND: \"sh\"",
+"         [120] PID: 14738  TASK: ffff88012678d540  COMMAND: \"sh\"",
+"         GROUP CFS RB_ROOT: ffff8800897af430 <test2> (THROTTLED)",
+"            [120] PID: 14732  TASK: ffff88013bbcb500  COMMAND: \"sh\"",
+"            [120] PID: 14728  TASK: ffff8800b3496080  COMMAND: \"sh\"",
+"            [120] PID: 14730  TASK: ffff880037833540  COMMAND: \"sh\"",
+"         GROUP CFS RB_ROOT: ffff880037943e30 <test1> (THROTTLED)",
+"            [120] PID: 14726  TASK: ffff880138d42aa0  COMMAND: \"sh\"",
+" ",
+"    CPU 1 RUNQUEUE: ffff880028296680",
+"      CURRENT: PID: 3269   TASK: ffff88013b0fa040  COMMAND: \"bash\"",
+"      RT PRIO_ARRAY: ffff880028296808",
+"         [  0] GROUP RT PRIO_ARRAY: ffff88008a1f5000 <test1> (THROTTLED)",
+"               [  0] GROUP RT PRIO_ARRAY: ffff880121774800 <test11>",
+"                     [  0] PID: 14753  TASK: ffff88013bbbaae0  COMMAND: \"rtloop99\"",
+"               [ 98] PID: 14745  TASK: ffff880126763500  COMMAND: \"rtloop1\"",
+"               [ 98] PID: 14747  TASK: ffff88013b1e6ae0  COMMAND: \"rtloop1\"",
+"      CFS RB_ROOT: ffff880028296718",
+"         GROUP CFS RB_ROOT: ffff8800896eac30 <test1>",
+"            [120] PID: 14724  TASK: ffff880139632080  COMMAND: \"sh\"",
+"         [120] PID: 14742  TASK: ffff880126762aa0  COMMAND: \"sh\"",
+"         [120] PID: 14736  TASK: ffff88010626e040  COMMAND: \"sh\"",
 NULL               
 };
 
-- 
1.7.1

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
