Re: [PATCH] runq: make tasks in throttled cfs_rqs/rt_rqs displayed

On 2012/11/07 05:30, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> 
>>>  
>>>>
>>>> So I attach the new patch v2 version for runq -g. If you still find
>>>> any bug in your tests or have any suggestion about it, that's very
>>>> helpful.
>>>>
>>>> TODO:
>>>> 1. The help info about the -g option.
>>>> 2. Change rt_rq tasks displayed non-hierarchically.
>>>
>>> Like I mentioned above, the latest patch does not change the default
>>> behavior of runq alone, and "runq -g" is not as verbose as the last
>>> patch, which I presume is your intent.
>>
>>
>> Two patches attached. One is the fixed version of v2, and the other adds
>> the help info for runq -g.
>>
>> Thanks
>> Zhang
> 
> These latest patches tested OK.
> 
> However, I must admit that I don't really understand the details w/respect
> to the difference between the "runq" and "runq -g" output other than the 
> two different options essentially find all per-cpu queued tasks coming from
> different rb_root structures.  Also, the "runq -g" implementation is remarkably
> complex in comparison to the "runq" alone, making its future maintenance a 
> potentially difficult chore because of the huge number of structure.member
> dependencies.

Hmm, runq -g includes tasks in throttled cfs_rqs/rt_rqs. Its implementation is
complex because when dumping a cfs rb_tree, if one of its nodes is itself a child
cfs rb_tree, we follow the child tree and dump all of the tasks in it. But if a
child cfs rb_tree is throttled, it has been dequeued from its parent rb_tree, so
we would lose the tasks in it. Therefore, when dumping a cfs rb_tree, I check
whether any of its child cfs rb_trees is throttled, and if so, dump it as well.
The same applies to rt_rqs.

Dumping group information, such as the group name, makes it clear which task
belongs to which group.
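The traversal idea above can be sketched with a toy model. Everything below is a
hypothetical illustration (the structures, fields, and function names are made up
for this sketch and are not crash's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a cfs_rq queues tasks and child group runqueues. A
 * throttled child has been dequeued from its parent's rb_tree, so a
 * naive walk over the parent's queued children never reaches it.
 * All names here are hypothetical. */
struct toy_cfs_rq {
	int nr_tasks;                   /* tasks queued directly here      */
	int throttled;                  /* 1: dequeued from parent's tree  */
	struct toy_cfs_rq *children[4]; /* child group runqueues           */
	int nr_children;
};

/* Plain "runq"-style walk: sees only children still enqueued. */
static int count_tasks_naive(struct toy_cfs_rq *rq)
{
	int i, total = rq->nr_tasks;

	for (i = 0; i < rq->nr_children; i++)
		if (!rq->children[i]->throttled)
			total += count_tasks_naive(rq->children[i]);
	return total;
}

/* "runq -g"-style walk: additionally descends into throttled
 * children so their tasks are not lost. */
static int count_tasks_with_throttled(struct toy_cfs_rq *rq)
{
	int i, total = rq->nr_tasks;

	for (i = 0; i < rq->nr_children; i++)
		total += count_tasks_with_throttled(rq->children[i]);
	return total;
}
```

With a root holding 2 tasks and one throttled child holding 3, the naive walk
counts only 2 while the throttled-aware walk counts all 5; that gap is what the
-g option is meant to close.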

> 
> Now, for the most part, you have been able to segregate the code, *except*
> for the overloading of the currently-existing dump_tasks_in_cfs_rq() function to
> handle both the "runq" and the "runq -g" commands, i.e., where you've merged in 
> the g_flag and depth business.  And as a result, it makes the patched "dual-mode"
> dump_tasks_in_cfs_rq() function difficult to understand.
> 
> Here is my suggestion/request:
> 
> (1) please leave dump_CFS_runqueues() and dump_tasks_in_cfs_rq() as they are now.
> (2) create a new (somewhat redundant) "dump_tasks_in_task_group_rq()" function that is only
>     used for "runq -g".
> (3) have your new dump_tasks_by_task_group() function call the new dump_tasks_in_task_group_rq()
>     function.
> 
> That would completely segregate the implementations of the two options.  The worst
> case for doing it that way is that there will be dump_tasks_in_cfs_rq() and
> dump_tasks_in_task_group_rq() functions that are somewhat similar.  But considering
> the huge amount of code that you are adding for "runq -g", having separate functions is
> hardly a cause for concern.  And it should be fairly trivial for you to update your patch
> to do it that way.
>  
> And then you could effectively do the same type of code separation for the RT queues.
> 
> Does that make sense to you?

OK. I rewrote the patch, and the new patches tested OK on my box.

Thanks
Zhang
From d9ef53a3429cb7b65be3c99d93b3685965b4eb31 Mon Sep 17 00:00:00 2001
From: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
Date: Wed, 7 Nov 2012 16:54:35 +0800
Subject: [PATCH 1/2] add -g option for runq v4

Signed-off-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
---
 defs.h    |   17 ++
 kernel.c  |    3 +
 symbols.c |   34 +++
 task.c    |  677 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 717 insertions(+), 14 deletions(-)

diff --git a/defs.h b/defs.h
index 319584f..798dd9b 100755
--- a/defs.h
+++ b/defs.h
@@ -1792,6 +1792,22 @@ struct offset_table {                    /* stash of commonly-used offsets */
 	long sched_rt_entity_my_q;
 	long neigh_table_hash_shift;
 	long neigh_table_nht_ptr;
+	long task_group_parent;
+	long task_group_css;
+	long cgroup_subsys_state_cgroup;
+	long cgroup_dentry;
+	long task_group_rt_rq;
+	long rt_rq_tg;
+	long task_group_cfs_rq;
+	long cfs_rq_tg;
+	long task_group_siblings;
+	long task_group_children;
+	long task_group_cfs_bandwidth;
+	long cfs_rq_throttled;
+	long task_group_rt_bandwidth;
+	long rt_rq_rt_throttled;
+	long rt_rq_highest_prio;
+	long rt_rq_rt_nr_running;
 };
 
 struct size_table {         /* stash of commonly-used sizes */
@@ -1927,6 +1943,7 @@ struct size_table {         /* stash of commonly-used sizes */
 	long log;
 	long log_level;
 	long rt_rq;
+	long task_group;
 };
 
 struct array_table {
diff --git a/kernel.c b/kernel.c
index 45da48e..76441e9 100755
--- a/kernel.c
+++ b/kernel.c
@@ -308,6 +308,9 @@ kernel_init()
 	STRUCT_SIZE_INIT(prio_array, "prio_array"); 
 
 	MEMBER_OFFSET_INIT(rq_cfs, "rq", "cfs");
+	MEMBER_OFFSET_INIT(task_group_cfs_rq, "task_group", "cfs_rq");
+	MEMBER_OFFSET_INIT(task_group_rt_rq, "task_group", "rt_rq");
+	MEMBER_OFFSET_INIT(task_group_parent, "task_group", "parent");
 
        /*
         *  In 2.4, smp_send_stop() sets smp_num_cpus back to 1
diff --git a/symbols.c b/symbols.c
index 1f09c9f..3179edc 100755
--- a/symbols.c
+++ b/symbols.c
@@ -8820,6 +8820,38 @@ dump_offset_table(char *spec, ulong makestruct)
 		OFFSET(log_flags_level));
 	fprintf(fp, "          sched_rt_entity_my_q: %ld\n",
 		OFFSET(sched_rt_entity_my_q));
+	fprintf(fp, "             task_group_parent: %ld\n",
+		OFFSET(task_group_parent));
+	fprintf(fp, "                task_group_css: %ld\n",
+		OFFSET(task_group_css));
+	fprintf(fp, "    cgroup_subsys_state_cgroup: %ld\n",
+		OFFSET(cgroup_subsys_state_cgroup));
+	fprintf(fp, "                 cgroup_dentry: %ld\n",
+		OFFSET(cgroup_dentry));
+	fprintf(fp, "              task_group_rt_rq: %ld\n",
+		OFFSET(task_group_rt_rq));
+	fprintf(fp, "                      rt_rq_tg: %ld\n",
+		OFFSET(rt_rq_tg));
+	fprintf(fp, "             task_group_cfs_rq: %ld\n",
+		OFFSET(task_group_cfs_rq));
+	fprintf(fp, "                     cfs_rq_tg: %ld\n",
+		OFFSET(cfs_rq_tg));
+	fprintf(fp, "           task_group_siblings: %ld\n",
+		OFFSET(task_group_siblings));
+	fprintf(fp, "           task_group_children: %ld\n",
+		OFFSET(task_group_children));
+	fprintf(fp, "      task_group_cfs_bandwidth: %ld\n",
+		OFFSET(task_group_cfs_bandwidth));
+	fprintf(fp, "              cfs_rq_throttled: %ld\n",
+		OFFSET(cfs_rq_throttled));
+	fprintf(fp, "       task_group_rt_bandwidth: %ld\n",
+		OFFSET(task_group_rt_bandwidth));
+	fprintf(fp, "            rt_rq_rt_throttled: %ld\n",
+		OFFSET(rt_rq_rt_throttled));
+	fprintf(fp, "            rt_rq_highest_prio: %ld\n",
+		OFFSET(rt_rq_highest_prio));
+	fprintf(fp, "           rt_rq_rt_nr_running: %ld\n",
+		OFFSET(rt_rq_rt_nr_running));
 
 	fprintf(fp, "\n                    size_table:\n");
 	fprintf(fp, "                          page: %ld\n", SIZE(page));
@@ -9037,6 +9069,8 @@ dump_offset_table(char *spec, ulong makestruct)
 		SIZE(log_level));
 	fprintf(fp, "                         rt_rq: %ld\n",
 		SIZE(rt_rq));
+	fprintf(fp, "                    task_group: %ld\n",
+		SIZE(task_group));
 
         fprintf(fp, "\n                   array_table:\n");
 	/*
diff --git a/task.c b/task.c
index f8c6325..f5b3882 100755
--- a/task.c
+++ b/task.c
@@ -64,10 +64,27 @@ static struct rb_node *rb_parent(struct rb_node *, struct rb_node *);
 static struct rb_node *rb_right(struct rb_node *, struct rb_node *);
 static struct rb_node *rb_left(struct rb_node *, struct rb_node *);
 static void dump_task_runq_entry(struct task_context *);
+static void print_group_header_fair(int, ulong, void *);
+static void print_parent_task_group_fair(void *, int);
+static int dump_tasks_in_lower_dequeued_cfs_rq(int, ulong, int);
 static int dump_tasks_in_cfs_rq(ulong);
+static int dump_tasks_in_task_group_cfs_rq(int, ulong, int);
 static void dump_on_rq_tasks(void);
+static void cfs_rq_offset_init(void);
+static void task_group_offset_init(void);
 static void dump_CFS_runqueues(void);
+static void print_group_header_rt(ulong, void *);
+static void print_parent_task_group_rt(void *, int);
+static int dump_tasks_in_lower_dequeued_rt_rq(int, ulong, int);
 static void dump_RT_prio_array(int, ulong, char *);
+static void dump_tasks_in_task_group_rt_rq(int, ulong, int);
+static void get_task_group_name(ulong, char **);
+static void sort_task_group_info_array(void);
+static void print_task_group_info_array(void);
+static void reuse_task_group_info_array(void);
+static void free_task_group_info_array(void);
+static void fill_task_group_info_array(int, ulong, char *, int);
+static void dump_tasks_by_task_group(void);
 static void task_struct_member(struct task_context *,unsigned int, struct reference *);
 static void signal_reference(struct task_context *, ulong, struct reference *);
 static void do_sig_thread_group(ulong);
@@ -7028,8 +7045,9 @@ cmd_runq(void)
         int c;
 	int sched_debug = 0;
 	int dump_timestamp_flag = 0;
+	int dump_task_group_flag = 0;
 
-        while ((c = getopt(argcnt, args, "dt")) != EOF) {
+        while ((c = getopt(argcnt, args, "dtg")) != EOF) {
                 switch(c)
                 {
 		case 'd':
@@ -7038,6 +7056,13 @@ cmd_runq(void)
 		case 't':
 			dump_timestamp_flag = 1;
 			break;
+		case 'g':
+			if (INVALID_MEMBER(task_group_cfs_rq) ||
+			    INVALID_MEMBER(task_group_rt_rq) ||
+			    INVALID_MEMBER(task_group_parent))
+				option_not_supported(c);
+			dump_task_group_flag = 1;
+			break;
                 default:
                         argerrs++;
                         break;
@@ -7053,12 +7078,16 @@ cmd_runq(void)
                 return;
         }
 
-
 	if (sched_debug) {
 		dump_on_rq_tasks();
 		return;
 	}
 
+	if (dump_task_group_flag) {
+		dump_tasks_by_task_group();
+		return;
+	}
+
 	dump_runq();
 }
 
@@ -7421,6 +7450,80 @@ rb_next(struct rb_node *node)
         return parent;
 }
 
+#define MAX_GROUP_NUM 200
+struct task_group_info {
+	int use;
+	int depth;
+	char *name;
+	ulong task_group;
+	struct task_group_info *parent;
+};
+
+static struct task_group_info **tgi_array;
+static int tgi_p = 0;
+
+static void
+sort_task_group_info_array(void)
+{
+	int i, j;
+	struct task_group_info *tmp;
+
+	for (i = 0; i < tgi_p - 1; i++) {
+		for (j = 0; j < tgi_p - i - 1; j++) {
+			if (tgi_array[j]->depth > tgi_array[j+1]->depth) {
+				tmp = tgi_array[j+1];
+				tgi_array[j+1] = tgi_array[j];
+				tgi_array[j] = tmp;
+			}
+		}
+	}
+}
+
+static void
+print_task_group_info_array(void)
+{
+	int i;
+
+	for (i = 0; i < tgi_p; i++) {
+		fprintf(fp, "%d : use=%d, depth=%d, group=%lx, ", i,
+			tgi_array[i]->use, tgi_array[i]->depth,
+			tgi_array[i]->task_group);
+		fprintf(fp, "name=%s, ",
+			tgi_array[i]->name ? tgi_array[i]->name : "NULL");
+		if (tgi_array[i]->parent)
+			fprintf(fp, "parent=%lx",
+				tgi_array[i]->parent->task_group);
+		fprintf(fp, "\n");
+	}
+}
+
+static void
+free_task_group_info_array(void)
+{
+	int i;
+
+	for (i = 0; i < tgi_p; i++) {
+		if (tgi_array[i]->name)
+			FREEBUF(tgi_array[i]->name);
+		FREEBUF(tgi_array[i]);
+	}
+	tgi_p = 0;
+	FREEBUF(tgi_array);
+}
+
+static void
+reuse_task_group_info_array(void)
+{
+	int i;
+
+	for (i = 0; i < tgi_p; i++) {
+		if (tgi_array[i]->depth == 0)
+			tgi_array[i]->use = 0;
+		else
+			tgi_array[i]->use = 1;
+	}
+}
+
 static void
 dump_task_runq_entry(struct task_context *tc)
 {
@@ -7428,11 +7531,98 @@ dump_task_runq_entry(struct task_context *tc)
 
 	readmem(tc->task + OFFSET(task_struct_prio), KVADDR, 
 		&prio, sizeof(int), "task prio", FAULT_ON_ERROR);
-	fprintf(fp, "     [%3d] ", prio);
+	fprintf(fp, "[%3d] ", prio);
 	fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
 		tc->pid, tc->task, tc->comm);
 }
 
+static void
+print_group_header_fair(int depth, ulong cfs_rq, void *t)
+{
+	int throttled;
+	struct rb_root *root;
+	struct task_group_info *tgi = (struct task_group_info *)t;
+
+	root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
+	INDENT(2 + 3 * depth);
+	fprintf(fp, "GROUP CFS RB_ROOT: %lx", (ulong)root);
+	if (tgi->name)
+		fprintf(fp, " <%s>", tgi->name);
+
+	if (VALID_MEMBER(task_group_cfs_bandwidth)) {
+		readmem(cfs_rq + OFFSET(cfs_rq_throttled), KVADDR,
+			&throttled, sizeof(int), "cfs_rq throttled",
+			FAULT_ON_ERROR);
+		if (throttled)
+			fprintf(fp, " (THROTTLED)");
+	}
+	fprintf(fp, "\n");
+}
+
+static void
+print_parent_task_group_fair(void *t, int cpu)
+{
+	struct task_group_info *tgi;
+	ulong cfs_rq_c, cfs_rq_p;
+
+	tgi = ((struct task_group_info *)t)->parent;
+	if (tgi && tgi->use)
+		print_parent_task_group_fair(tgi, cpu);
+	else
+		return;
+
+	readmem(tgi->task_group + OFFSET(task_group_cfs_rq),
+		KVADDR, &cfs_rq_c, sizeof(ulong),
+		"task_group cfs_rq", FAULT_ON_ERROR);
+	readmem(cfs_rq_c + cpu * sizeof(ulong), KVADDR, &cfs_rq_p,
+		sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
+
+	print_group_header_fair(tgi->depth, cfs_rq_p, tgi);
+	tgi->use = 0;
+}
+
+static int
+dump_tasks_in_lower_dequeued_cfs_rq(int depth, ulong cfs_rq, int cpu)
+{
+	int i, total, nr_running;
+	ulong group, cfs_rq_c, cfs_rq_p;
+
+	total = 0;
+	for (i = 0; i < tgi_p; i++) {
+		if (tgi_array[i]->use == 0 || tgi_array[i]->depth - depth != 1)
+			continue;
+
+		readmem(cfs_rq + OFFSET(cfs_rq_tg), KVADDR, &group,
+			sizeof(ulong), "cfs_rq tg", FAULT_ON_ERROR);
+		if (group != tgi_array[i]->parent->task_group)
+			continue;
+
+		readmem(tgi_array[i]->task_group + OFFSET(task_group_cfs_rq),
+			KVADDR, &cfs_rq_c, sizeof(ulong), "task_group cfs_rq",
+			FAULT_ON_ERROR);
+		readmem(cfs_rq_c + cpu * sizeof(ulong), KVADDR, &cfs_rq_p,
+			sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
+		if (cfs_rq == cfs_rq_p)
+			continue;
+
+		readmem(cfs_rq_p + OFFSET(cfs_rq_nr_running), KVADDR,
+			&nr_running, sizeof(int), "cfs_rq nr_running",
+			FAULT_ON_ERROR);
+		if (nr_running == 0) {
+			total += dump_tasks_in_lower_dequeued_cfs_rq(depth + 1,
+				cfs_rq_p, cpu);
+			continue;
+		}
+
+		print_parent_task_group_fair(tgi_array[i], cpu);
+
+		total++;
+		total += dump_tasks_in_task_group_cfs_rq(depth + 1, cfs_rq_p, cpu);
+	}
+
+	return total;
+}
+
 static int
 dump_tasks_in_cfs_rq(ulong cfs_rq)
 {
@@ -7475,9 +7665,10 @@ dump_tasks_in_cfs_rq(ulong cfs_rq)
 				     OFFSET(sched_entity_run_node));
 		if (!tc)
 			continue;
-		if (hq_enter((ulong)tc))
+		if (hq_enter((ulong)tc)) {
+			INDENT(5);
 			dump_task_runq_entry(tc);
-		else {
+		} else {
 			error(WARNING, "duplicate CFS runqueue node: task %lx\n",
 				tc->task);
 			return total;
@@ -7488,6 +7679,87 @@ dump_tasks_in_cfs_rq(ulong cfs_rq)
 	return total;
 }
 
+static int
+dump_tasks_in_task_group_cfs_rq(int depth, ulong cfs_rq, int cpu)
+{
+	struct task_context *tc;
+	struct rb_root *root;
+	struct rb_node *node;
+	ulong my_q, leftmost, curr, curr_my_q, tg;
+	int total, i;
+
+	total = 0;
+
+	if (depth) {
+		readmem(cfs_rq + OFFSET(cfs_rq_tg), KVADDR,
+			&tg, sizeof(ulong), "cfs_rq tg",
+			FAULT_ON_ERROR);
+		for (i = 0; i < tgi_p; i++) {
+			if (tgi_array[i]->task_group == tg) {
+				print_group_header_fair(depth,
+					cfs_rq, tgi_array[i]);
+				tgi_array[i]->use = 0;
+				break;
+			}
+		}
+	}
+
+	if (VALID_MEMBER(sched_entity_my_q)) {
+		readmem(cfs_rq + OFFSET(cfs_rq_curr), KVADDR, &curr,
+			sizeof(ulong), "curr", FAULT_ON_ERROR);
+		if (curr) {
+			readmem(curr + OFFSET(sched_entity_my_q), KVADDR,
+				&curr_my_q, sizeof(ulong), "curr->my_q",
+				FAULT_ON_ERROR);
+			if (curr_my_q) {
+				total++;
+				total += dump_tasks_in_task_group_cfs_rq(depth + 1,
+					curr_my_q, cpu);
+			}
+		}
+	}
+
+	readmem(cfs_rq + OFFSET(cfs_rq_rb_leftmost), KVADDR, &leftmost,
+		sizeof(ulong), "rb_leftmost", FAULT_ON_ERROR);
+	root = (struct rb_root *)(cfs_rq + OFFSET(cfs_rq_tasks_timeline));
+
+	for (node = rb_first(root); leftmost && node; node = rb_next(node)) {
+		if (VALID_MEMBER(sched_entity_my_q)) {
+			readmem((ulong)node - OFFSET(sched_entity_run_node)
+				+ OFFSET(sched_entity_my_q), KVADDR, &my_q,
+				sizeof(ulong), "my_q", FAULT_ON_ERROR);
+			if (my_q) {
+				total++;
+				total += dump_tasks_in_task_group_cfs_rq(depth + 1,
+					my_q, cpu);
+				continue;
+			}
+		}
+
+		tc = task_to_context((ulong)node - OFFSET(task_struct_se) -
+				     OFFSET(sched_entity_run_node));
+		if (!tc)
+			continue;
+		if (hq_enter((ulong)tc)) {
+			INDENT(5 + 3 * depth);
+			dump_task_runq_entry(tc);
+		} else {
+			error(WARNING, "duplicate CFS runqueue node: task %lx\n",
+				tc->task);
+			return total;
+		}
+		total++;
+	}
+
+	total += dump_tasks_in_lower_dequeued_cfs_rq(depth, cfs_rq, cpu);
+
+	if (!total) {
+		INDENT(5 + 3 * depth);
+		fprintf(fp, "[no tasks queued]\n");
+	}
+	return total;
+}
+
 static void
 dump_on_rq_tasks(void)
 {
@@ -7531,6 +7803,7 @@ dump_on_rq_tasks(void)
 			if (!on_rq || tc->processor != cpu)
 				continue;
 
+			INDENT(5);
 			dump_task_runq_entry(tc);
 			tot++;
 		}
@@ -7543,16 +7816,8 @@ dump_on_rq_tasks(void)
 }
 
 static void
-dump_CFS_runqueues(void)
+cfs_rq_offset_init(void)
 {
-	int tot, cpu;
-	ulong runq, cfs_rq;
-	char *runqbuf, *cfs_rq_buf;
-	ulong tasks_timeline ATTRIBUTE_UNUSED;
-	struct task_context *tc;
-	struct rb_root *root;
-	struct syment *rq_sp, *init_sp;
-
 	if (!VALID_STRUCT(cfs_rq)) {
 		STRUCT_SIZE_INIT(cfs_rq, "cfs_rq");
 		STRUCT_SIZE_INIT(rt_rq, "rt_rq");
@@ -7585,6 +7850,49 @@ dump_CFS_runqueues(void)
 			"run_list");
 		MEMBER_OFFSET_INIT(rt_prio_array_queue, "rt_prio_array", "queue");
 	}
+}
+
+static void
+task_group_offset_init(void)
+{
+	if (!VALID_STRUCT(task_group)) {
+		STRUCT_SIZE_INIT(task_group, "task_group");
+		MEMBER_OFFSET_INIT(rt_rq_rt_nr_running, "rt_rq", "rt_nr_running");
+		MEMBER_OFFSET_INIT(cfs_rq_tg, "cfs_rq", "tg");
+		MEMBER_OFFSET_INIT(rt_rq_tg, "rt_rq", "tg");
+		MEMBER_OFFSET_INIT(rt_rq_highest_prio, "rt_rq", "highest_prio");
+		MEMBER_OFFSET_INIT(task_group_css, "task_group", "css");
+		MEMBER_OFFSET_INIT(cgroup_subsys_state_cgroup,
+			"cgroup_subsys_state", "cgroup");
+		MEMBER_OFFSET_INIT(cgroup_dentry, "cgroup", "dentry");
+
+		MEMBER_OFFSET_INIT(task_group_siblings, "task_group", "siblings");
+		MEMBER_OFFSET_INIT(task_group_children, "task_group", "children");
+
+		MEMBER_OFFSET_INIT(task_group_cfs_bandwidth,
+			"task_group", "cfs_bandwidth");
+		MEMBER_OFFSET_INIT(cfs_rq_throttled, "cfs_rq",
+			"throttled");
+
+		MEMBER_OFFSET_INIT(task_group_rt_bandwidth,
+			"task_group", "rt_bandwidth");
+		MEMBER_OFFSET_INIT(rt_rq_rt_throttled, "rt_rq",
+			"rt_throttled");
+	}
+}
+
+static void
+dump_CFS_runqueues(void)
+{
+	int cpu, tot;
+	ulong runq, cfs_rq;
+	char *runqbuf, *cfs_rq_buf;
+	ulong tasks_timeline ATTRIBUTE_UNUSED;
+	struct task_context *tc;
+	struct rb_root *root;
+	struct syment *rq_sp, *init_sp;
+
+	cfs_rq_offset_init();
 
 	if (!(rq_sp = per_cpu_symbol_search("per_cpu__runqueues")))
 		error(FATAL, "per-cpu runqueues do not exist\n");
@@ -7643,6 +7951,7 @@ dump_CFS_runqueues(void)
 		hq_open();
 		tot = dump_tasks_in_cfs_rq(cfs_rq);
 		hq_close();
+
 		if (!tot) {
 			INDENT(5);
 			fprintf(fp, "[no tasks queued]\n");
@@ -7655,6 +7964,106 @@ dump_CFS_runqueues(void)
 }
 
 static void
+print_group_header_rt(ulong rt_rq, void *t)
+{
+	int throttled;
+	struct task_group_info *tgi = (struct task_group_info *)t;
+
+	fprintf(fp, "GROUP RT PRIO_ARRAY: %lx", rt_rq + OFFSET(rt_rq_active));
+	if (tgi->name)
+		fprintf(fp, " <%s>", tgi->name);
+
+	if (VALID_MEMBER(task_group_rt_bandwidth)) {
+		readmem(rt_rq + OFFSET(rt_rq_rt_throttled), KVADDR,
+			&throttled, sizeof(int), "rt_rq rt_throttled",
+			FAULT_ON_ERROR);
+		if (throttled)
+			fprintf(fp, " (THROTTLED)");
+	}
+	fprintf(fp, "\n");
+}
+
+static void
+print_parent_task_group_rt(void *t, int cpu)
+{
+	int prio;
+	struct task_group_info *tgi;
+	ulong rt_rq_c, rt_rq_p;
+
+
+	tgi = ((struct task_group_info *)t)->parent;
+	if (tgi && tgi->use)
+		print_parent_task_group_rt(tgi, cpu);
+	else
+		return;
+
+	readmem(tgi->task_group + OFFSET(task_group_rt_rq),
+		KVADDR, &rt_rq_c, sizeof(ulong),
+		"task_group rt_rq", FAULT_ON_ERROR);
+	readmem(rt_rq_c + cpu * sizeof(ulong), KVADDR, &rt_rq_p,
+		sizeof(ulong), "task_group rt_rq", FAULT_ON_ERROR);
+
+	readmem(rt_rq_p + OFFSET(rt_rq_highest_prio), KVADDR, &prio,
+		sizeof(int), "rt_rq highest prio", FAULT_ON_ERROR);
+
+	INDENT(-1 + 6 * tgi->depth);
+	fprintf(fp, "[%3d] ", prio);
+	print_group_header_rt(rt_rq_p, tgi);
+	tgi->use = 0;
+}
+
+static int
+dump_tasks_in_lower_dequeued_rt_rq(int depth, ulong rt_rq, int cpu)
+{
+	int i, prio, tot, delta, nr_running;
+	ulong rt_rq_c, rt_rq_p, group;
+
+	tot = 0;
+	for (i = 0; i < tgi_p; i++) {
+		delta = tgi_array[i]->depth - depth;
+		if (delta > 1)
+			break;
+
+		if (tgi_array[i]->use == 0 || delta < 1)
+			continue;
+
+		readmem(rt_rq + OFFSET(rt_rq_tg), KVADDR, &group,
+			sizeof(ulong), "rt_rq tg", FAULT_ON_ERROR);
+		if (group != tgi_array[i]->parent->task_group)
+			continue;
+
+		readmem(tgi_array[i]->task_group + OFFSET(task_group_rt_rq),
+			KVADDR, &rt_rq_c, sizeof(ulong), "task_group rt_rq",
+			FAULT_ON_ERROR);
+		readmem(rt_rq_c + cpu * sizeof(ulong), KVADDR, &rt_rq_p,
+			sizeof(ulong), "task_group rt_rq", FAULT_ON_ERROR);
+		if (rt_rq == rt_rq_p)
+			continue;
+
+		readmem(rt_rq_p + OFFSET(rt_rq_rt_nr_running), KVADDR,
+			&nr_running, sizeof(int), "rt_rq rt_nr_running",
+			FAULT_ON_ERROR);
+		if (nr_running == 0) {
+			tot += dump_tasks_in_lower_dequeued_rt_rq(depth + 1,
+				rt_rq_p, cpu);
+			continue;
+		}
+
+		print_parent_task_group_rt(tgi_array[i], cpu);
+
+		readmem(rt_rq_p + OFFSET(rt_rq_highest_prio), KVADDR,
+			&prio, sizeof(int), "rt_rq highest_prio",
+			FAULT_ON_ERROR);
+		INDENT(5 + 6 * depth);
+		fprintf(fp, "[%3d] ", prio);
+		tot++;
+		dump_tasks_in_task_group_rt_rq(depth + 1, rt_rq_p, cpu);
+	}
+
+	return tot;
+}
+
+static void
 dump_RT_prio_array(int depth, ulong k_prio_array, char *u_prio_array)
 {
 	int i, c, tot, cnt, qheads;
@@ -7742,6 +8151,246 @@ dump_RT_prio_array(int depth, ulong k_prio_array, char *u_prio_array)
 	}
 }
 
+static void
+dump_tasks_in_task_group_rt_rq(int depth, ulong rt_rq, int cpu)
+{
+	int i, c, tot, cnt, qheads;
+	ulong offset, kvaddr, uvaddr;
+	ulong list_head[2];
+        struct list_data list_data, *ld;
+	struct task_context *tc;
+	ulong *tlist;
+	ulong my_q, task_addr, tg, k_prio_array;
+	char *rt_rq_buf, *u_prio_array;
+
+	k_prio_array = rt_rq +  OFFSET(rt_rq_active);
+	rt_rq_buf = GETBUF(SIZE(rt_rq));
+	readmem(rt_rq, KVADDR, rt_rq_buf, SIZE(rt_rq), "rt_rq", FAULT_ON_ERROR);
+	u_prio_array = &rt_rq_buf[OFFSET(rt_rq_active)];
+
+	if (depth) {
+		readmem(rt_rq + OFFSET(rt_rq_tg), KVADDR,
+			&tg, sizeof(ulong), "rt_rq tg",
+			FAULT_ON_ERROR);
+		for (i = 0; i < tgi_p; i++) {
+			if (tgi_array[i]->task_group == tg) {
+				print_group_header_rt(rt_rq, tgi_array[i]);
+				tgi_array[i]->use = 0;
+				break;
+			}
+		}
+	}
+
+        qheads = (i = ARRAY_LENGTH(rt_prio_array_queue)) ?
+                i : get_array_length("rt_prio_array.queue", NULL, SIZE(list_head));
+
+	ld = &list_data;
+
+	for (i = tot = 0; i < qheads; i++) {
+		offset =  OFFSET(rt_prio_array_queue) + (i * SIZE(list_head));
+		kvaddr = k_prio_array + offset;
+		uvaddr = (ulong)u_prio_array + offset;
+		BCOPY((char *)uvaddr, (char *)&list_head[0], sizeof(ulong)*2);
+
+		if (CRASHDEBUG(1))
+			fprintf(fp, "rt_prio_array[%d] @ %lx => %lx/%lx\n",
+				i, kvaddr, list_head[0], list_head[1]);
+
+		if ((list_head[0] == kvaddr) && (list_head[1] == kvaddr))
+			continue;
+
+		BZERO(ld, sizeof(struct list_data));
+		ld->start = list_head[0];
+		if (VALID_MEMBER(task_struct_rt) &&
+		    VALID_MEMBER(sched_rt_entity_run_list))
+			ld->list_head_offset = OFFSET(sched_rt_entity_run_list);
+		else
+			ld->list_head_offset = OFFSET(task_struct_run_list);
+		ld->end = kvaddr;
+		hq_open();
+		cnt = do_list(ld);
+		hq_close();
+		tlist = (ulong *)GETBUF((cnt) * sizeof(ulong));
+		cnt = retrieve_list(tlist, cnt);
+		for (c = 0; c < cnt; c++) {
+			task_addr = tlist[c];
+			if (INVALID_MEMBER(sched_rt_entity_my_q))
+				goto is_task;
+
+			readmem(tlist[c] + OFFSET(sched_rt_entity_my_q),
+				KVADDR, &my_q, sizeof(ulong), "my_q",
+				FAULT_ON_ERROR);
+			if (!my_q) {
+				task_addr -= OFFSET(task_struct_rt);
+				goto is_task;
+			}
+
+			INDENT(5 + 6 * depth);
+			fprintf(fp, "[%3d] ", i);
+			tot++;
+			dump_tasks_in_task_group_rt_rq(depth + 1, my_q, cpu);
+			continue;
+
+is_task:
+			if (!(tc = task_to_context(task_addr)))
+				continue;
+
+			INDENT(5 + 6 * depth);
+			fprintf(fp, "[%3d] ", i);
+			fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
+				tc->pid, tc->task, tc->comm);
+			tot++;
+		}
+		FREEBUF(tlist);
+	}
+
+	tot += dump_tasks_in_lower_dequeued_rt_rq(depth, rt_rq, cpu);
+
+	if (!tot) {
+		INDENT(5 + 6 * depth);
+		fprintf(fp, "[no tasks queued]\n");
+	}
+	FREEBUF(rt_rq_buf);
+}
+
+static void
+get_task_group_name(ulong group, char **group_name)
+{
+	ulong cgroup, dentry, name;
+	char *dentry_buf, *tmp;
+	int len;
+
+	readmem(group + OFFSET(task_group_css) + OFFSET(cgroup_subsys_state_cgroup),
+		KVADDR, &cgroup, sizeof(ulong),
+		"task_group css cgroup", FAULT_ON_ERROR);
+	if (cgroup == 0)
+		return;
+
+	readmem(cgroup + OFFSET(cgroup_dentry), KVADDR, &dentry, sizeof(ulong),
+		"cgroup dentry", FAULT_ON_ERROR);
+	if (dentry == 0)
+		return;
+
+	dentry_buf = GETBUF(SIZE(dentry));
+	readmem(dentry, KVADDR, dentry_buf, SIZE(dentry),
+		"dentry", FAULT_ON_ERROR);
+	len = UINT(dentry_buf + OFFSET(dentry_d_name) + OFFSET(qstr_len));
+	tmp = GETBUF(len + 1);
+	name = ULONG(dentry_buf + OFFSET(dentry_d_name) + OFFSET(qstr_name));
+	BZERO(tmp, len + 1);
+	readmem(name, KVADDR, tmp, len, "qstr name", FAULT_ON_ERROR);
+	*group_name = tmp;
+	FREEBUF(dentry_buf);
+}
+
+static void
+fill_task_group_info_array(int depth, ulong group, char *group_buf, int i)
+{
+	int d;
+	ulong kvaddr, uvaddr, offset;
+	ulong list_head[2], next;
+
+	d = tgi_p;
+	tgi_array[tgi_p] = (struct task_group_info *)
+		GETBUF(sizeof(struct task_group_info));
+	if (depth)
+		tgi_array[tgi_p]->use = 1;
+	else
+		tgi_array[tgi_p]->use = 0;
+
+	tgi_array[tgi_p]->depth = depth;
+	get_task_group_name(group, &tgi_array[tgi_p]->name);
+	tgi_array[tgi_p]->task_group = group;
+	if (i >= 0)
+		tgi_array[tgi_p]->parent = tgi_array[i];
+	else
+		tgi_array[tgi_p]->parent = NULL;
+	tgi_p++;
+
+	offset = OFFSET(task_group_children);
+	kvaddr = group + offset;
+	uvaddr = (ulong)(group_buf + offset);
+	BCOPY((char *)uvaddr, (char *)&list_head[0], sizeof(ulong)*2);
+
+	if ((list_head[0] == kvaddr) && (list_head[1] == kvaddr))
+		return;
+
+	next = list_head[0];
+	while (next != kvaddr) {
+		group = next - OFFSET(task_group_siblings);
+		readmem(group, KVADDR, group_buf, SIZE(task_group),
+			"task_group", FAULT_ON_ERROR);
+		next = ULONG(group_buf + OFFSET(task_group_siblings) +
+			OFFSET(list_head_next));
+		fill_task_group_info_array(depth + 1, group, group_buf, d);
+	}
+}
+
+static void
+dump_tasks_by_task_group(void)
+{
+	int cpu;
+	ulong root_task_group, cfs_rq, cfs_rq_p;
+	ulong rt_rq, rt_rq_p;
+	char *buf;
+	struct rb_root *root;
+	struct task_context *tc;
+
+	cfs_rq_offset_init();
+	task_group_offset_init();
+
+	root_task_group = 0;
+	if (symbol_exists("init_task_group"))
+		root_task_group = symbol_value("init_task_group");
+	else if (symbol_exists("root_task_group"))
+		root_task_group = symbol_value("root_task_group");
+	else
+		error(FATAL, "cannot determine root task_group\n");
+
+	tgi_array = (struct task_group_info **)GETBUF(sizeof(void *)
+		* MAX_GROUP_NUM);
+	buf = GETBUF(SIZE(task_group));
+	readmem(root_task_group, KVADDR, buf, SIZE(task_group),
+		"task_group", FAULT_ON_ERROR);
+	fill_task_group_info_array(0, root_task_group, buf, -1);
+	sort_task_group_info_array();
+	if (CRASHDEBUG(1))
+		print_task_group_info_array();
+
+	get_active_set();
+
+	for (cpu = 0; cpu < kt->cpus; cpu++) {
+		fprintf(fp, "%sCPU %d\n", cpu ? "\n" : "", cpu);
+		fprintf(fp, "  CURRENT: ");
+		if ((tc = task_to_context(tt->active_set[cpu])))
+			fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
+				tc->pid, tc->task, tc->comm);
+		else
+			fprintf(fp, "%lx\n", tt->active_set[cpu]);
+
+		readmem(root_task_group, KVADDR, buf, SIZE(task_group),
+			"task_group", FAULT_ON_ERROR);
+		rt_rq = ULONG(buf + OFFSET(task_group_rt_rq));
+		readmem(rt_rq + cpu * sizeof(ulong), KVADDR, &rt_rq_p,
+			sizeof(ulong), "task_group rt_rq", FAULT_ON_ERROR);
+		fprintf(fp, "  RT PRIO_ARRAY: %lx\n",
+			rt_rq_p + OFFSET(rt_rq_active));
+		dump_tasks_in_task_group_rt_rq(0, rt_rq_p, cpu);
+		reuse_task_group_info_array();
+
+		cfs_rq = ULONG(buf + OFFSET(task_group_cfs_rq));
+		readmem(cfs_rq + cpu * sizeof(ulong), KVADDR, &cfs_rq_p,
+			sizeof(ulong), "task_group cfs_rq", FAULT_ON_ERROR);
+		root = (struct rb_root *)(cfs_rq_p + OFFSET(cfs_rq_tasks_timeline));
+		fprintf(fp, "  CFS RB_ROOT: %lx\n", (ulong)root);
+		dump_tasks_in_task_group_cfs_rq(0, cfs_rq_p, cpu);
+		reuse_task_group_info_array();
+	}
+
+	FREEBUF(buf);
+	free_task_group_info_array();
+}
+
 #undef _NSIG
 #define _NSIG           64
 #define _NSIG_BPW       machdep->bits
-- 
1.7.1

From bc754c846d850bbb02c92240ccd1365dcecc966b Mon Sep 17 00:00:00 2001
From: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
Date: Tue, 6 Nov 2012 16:46:36 +0800
Subject: [PATCH 2/2] add help info for runq -g

Signed-off-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
---
 help.c |   42 ++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/help.c b/help.c
index 14bf533..9a63d04 100755
--- a/help.c
+++ b/help.c
@@ -2201,7 +2201,7 @@ NULL
 char *help_runq[] = {
 "runq",
 "run queue",
-"[-t]",
+"[-t] [-g]",
 "  With no argument, this command displays the tasks on the run queues",
 "  of each cpu.",
 " ",
@@ -2209,7 +2209,9 @@ char *help_runq[] = {
 "       rq.clock, rq.most_recent_timestamp or rq.timestamp_last_tick value,",
 "       whichever applies; following each cpu timestamp is the last_run or ",
 "       timestamp value of the active task on that cpu, whichever applies, ",
-"       along with the task identification.", 
+"       along with the task identification.",
+"   -g  Display the tasks of each cpu hierarchically with their task group",
+"       information; tasks in throttled cfs_rqs/rt_rqs are also displayed.",
 "\nEXAMPLES",
 " Display the tasks on an O(1) scheduler run queue:\n",
 "    %s> runq",
@@ -2259,6 +2261,42 @@ char *help_runq[] = {
 "           2680986785772  PID: 28227  TASK: ffff8800787780c0  COMMAND: \"loop\"",
 "    CPU 3: 2680990954469",
 "           2680986059540  PID: 28226  TASK: ffff880078778b00  COMMAND: \"loop\"",
+" ",
+" Display tasks with group information hierarchically:\n",
+"    %s> runq -g ",
+"    CPU 0",
+"      CURRENT: PID: 14734  TASK: ffff88010626f500  COMMAND: \"sh\"",
+"      RT PRIO_ARRAY: ffff880028216808",
+"         [  0] GROUP RT PRIO_ARRAY: ffff880139fc9800 <test1> (THROTTLED)",
+"              [  0] PID: 14750  TASK: ffff88013a4dd540  COMMAND: \"rtloop99\"",
+"              [  1] PID: 14748  TASK: ffff88013bbca040  COMMAND: \"rtloop98\"",
+"              [  1] GROUP RT PRIO_ARRAY: ffff880089029000 <test11>",
+"                    [  1] PID: 14752  TASK: ffff880088abf500  COMMAND: \"rtloop98\"",
+"              [ 54] PID: 14749  TASK: ffff880037a4e080  COMMAND: \"rtloop45\"",
+"              [ 98] PID: 14746  TASK: ffff88012678c080  COMMAND: \"rtloop1\"",
+"      CFS RB_ROOT: ffff88013fc23050",
+"         [120] PID: 14740  TASK: ffff88013b1e6080  COMMAND: \"sh\"",
+"         [120] PID: 14738  TASK: ffff88012678d540  COMMAND: \"sh\"",
+"         GROUP CFS RB_ROOT: ffff8800897af430 <test2> (THROTTLED)",
+"            [120] PID: 14732  TASK: ffff88013bbcb500  COMMAND: \"sh\"",
+"            [120] PID: 14728  TASK: ffff8800b3496080  COMMAND: \"sh\"",
+"            [120] PID: 14730  TASK: ffff880037833540  COMMAND: \"sh\"",
+"         GROUP CFS RB_ROOT: ffff880037943e30 <test1> (THROTTLED)",
+"            [120] PID: 14726  TASK: ffff880138d42aa0  COMMAND: \"sh\"",
+" ",
+"    CPU 1",
+"      CURRENT: PID: 3269   TASK: ffff88013b0fa040  COMMAND: \"bash\"",
+"      RT PRIO_ARRAY: ffff880028296808",
+"         [  0] GROUP RT PRIO_ARRAY: ffff88008a1f5000 <test1> (THROTTLED)",
+"               [  0] GROUP RT PRIO_ARRAY: ffff880121774800 <test11>",
+"                     [  0] PID: 14753  TASK: ffff88013bbbaae0  COMMAND: \"rtloop99\"",
+"               [ 98] PID: 14745  TASK: ffff880126763500  COMMAND: \"rtloop1\"",
+"               [ 98] PID: 14747  TASK: ffff88013b1e6ae0  COMMAND: \"rtloop1\"",
+"      CFS RB_ROOT: ffff88013fc23050",
+"         GROUP CFS RB_ROOT: ffff8800896eac30 <test1>",
+"            [120] PID: 14724  TASK: ffff880139632080  COMMAND: \"sh\"",
+"         [120] PID: 14742  TASK: ffff880126762aa0  COMMAND: \"sh\"",
+"         [120] PID: 14736  TASK: ffff88010626e040  COMMAND: \"sh\"",
 NULL               
 };
 
-- 
1.7.1

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
