Re: [PATCH] Fix bugs in runq

On 2012/08/25 02:17, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>>
>>
>> ----- Original Message -----
>>> Hello Dave,
>>>
>>> In the runq command, when dumping the cfs and rt runqueues,
>>> it seems that we get wrong nr_running values for rq
>>> and cfs_rq.
>>>
>>> Please refer to the attached patch.
>>>
>>> Thanks
>>> Zhang Yanfei
>>
>> Hello Zhang,
>>
>> I understand what you are trying to accomplish with this patch, but
>> none of my test dumpfiles can actually verify it because there is no
>> difference with or without your patch.  What failure mode did you see
>> in your testing?  I presume that it just showed "[no tasks queued]"
>> for the RT runqueue when there were actually tasks queued there?
>>
>> The reason I ask is that I'm thinking that a better solution would
>> be to simplify dump_CFS_runqueues() by *not* accessing and using
>> rq_nr_running, cfs_rq_nr_running or cfs_rq_h_nr_running.
>>
>> Those counters are only read to determine the "active" argument to
>> pass to dump_RT_prio_array(), which returns immediately if it is
>> FALSE.  However, if we get rid of the "active" argument and simply
>> allow dump_RT_prio_array() to always check its queues every time,
>> it still works just fine.
>>
>> For example, I tested my set of sample dumpfiles with this patch:
>>  
>>  diff -u -r1.205 task.c
>>  --- task.c      12 Jul 2012 20:04:00 -0000      1.205
>>  +++ task.c      22 Aug 2012 15:33:32 -0000
>>  @@ -7636,7 +7636,7 @@
>>                                  OFFSET(cfs_rq_tasks_timeline));
>>                  }
>>   
>>  -               dump_RT_prio_array(nr_running != cfs_rq_nr_running,
>>  +               dump_RT_prio_array(TRUE,
>>                          runq + OFFSET(rq_rt) + OFFSET(rt_rq_active),
>>                          &runqbuf[OFFSET(rq_rt) +
>>                          OFFSET(rt_rq_active)]);
>>   
>> and the output is identical to testing with, and without, your patch.
>>
>> So the question is whether dump_CFS_runqueues() should be needlessly
>> complicated with all of the "nr_running" references?
>>
>> In fact, it also seems possible that a crash could happen at a point in
>> the scheduler code where those counters are not
>> valid/current/trustworthy.
>>
>> So unless you can convince me otherwise, I'd prefer to just remove
>> the "nr_running" business completely.
> 
> Hello Zhang,
> 
> Here's the patch I've got queued, which resolves the bug you encountered
> by simplifying things:
> 

OK. I see.

Based on this patch, I made a new patch to fix a problem when dumping
the rt runqueues: dump_RT_prio_array() currently does not support the
RT group scheduler, so when a queued entity is a group rather than a
task, the tasks queued on the group's own rt_rq are never shown.
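
For reference, the relevant kernel structures look roughly like this
(only the fields the patch reads are shown; the real definitions have
more members):

    struct rt_prio_array {
            DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1);
            struct list_head queue[MAX_RT_PRIO]; /* one list per priority */
    };

    struct rt_rq {
            struct rt_prio_array active;
            /* ... */
    };

    struct sched_rt_entity {
            struct list_head run_list;  /* linked into a queue[] above */
            /* ... */
    #ifdef CONFIG_RT_GROUP_SCHED
            /* non-NULL only for a group entity: the group's own rt_rq */
            struct rt_rq *my_q;
            /* ... */
    #endif
    };

So whenever an entity on a queue has a non-NULL my_q, it is a group,
and its child rt_rq's active array has to be walked recursively to
reach the tasks queued inside the group.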

In my test, I put some RT tasks into one group, as shown below:

mkdir /cgroup/cpu/test1
echo 850000 > /cgroup/cpu/test1/cpu.rt_runtime_us

./rtloop1 &
echo $! > /cgroup/cpu/test1/tasks
./rtloop1 &
echo $! > /cgroup/cpu/test1/tasks
./rtloop1 &
echo $! > /cgroup/cpu/test1/tasks
./rtloop98 &
echo $! > /cgroup/cpu/test1/tasks
./rtloop45 &
echo $! > /cgroup/cpu/test1/tasks
./rtloop99 &
echo $! > /cgroup/cpu/test1/tasks
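
(Each rtloopNN above is essentially a busy loop running at RT priority
NN. The exact sources don't matter here; a hypothetical sketch,
assuming SCHED_FIFO and the priority passed in argv[1], would be:

    /* rtloop.c - spin forever at the given RT priority so the task
     * stays queued on the RT runqueue. */
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
            struct sched_param sp;

            sp.sched_priority = argc > 1 ? atoi(argv[1]) : 1;
            if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
                    perror("sched_setscheduler");
                    return 1;
            }
            for (;;)
                    ;   /* busy loop */
            return 0;
    }

Note that the runq output below indexes the queues by kernel priority,
which is inverted with respect to the user-space RT priority: rtloop99
shows up at [  0], rtloop45 at [ 54], and rtloop1 at [ 98].)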

Using crash to analyse the vmcore:

crash> runq
CPU 0 RUNQUEUE: ffff880028216680
  CURRENT: PID: 5125   TASK: ffff88010799d540  COMMAND: "sh"
  RT PRIO_ARRAY: ffff880028216808
     [  0] PID: 5136   TASK: ffff8801153cc040  COMMAND: "rtloop99"
           PID: 6      TASK: ffff88013d7c6080  COMMAND: "watchdog/0"
           PID: 3      TASK: ffff88013d7ba040  COMMAND: "migration/0"
     [  1] PID: 5134   TASK: ffff8801153cd500  COMMAND: "rtloop98"
           PID: 5135   TASK: ffff8801153ccaa0  COMMAND: "rtloop98"
  CFS RB_ROOT: ffff880028216718
     [120] PID: 5109   TASK: ffff880037923500  COMMAND: "sh"
     [120] PID: 5107   TASK: ffff88006eeccaa0  COMMAND: "sh"
     [120] PID: 5123   TASK: ffff880107a4caa0  COMMAND: "sh"

CPU 1 RUNQUEUE: ffff880028296680
  CURRENT: PID: 5086   TASK: ffff88006eecc040  COMMAND: "bash"
  RT PRIO_ARRAY: ffff880028296808
     [  0] PID: 5137   TASK: ffff880107b35540  COMMAND: "rtloop99"
           PID: 10     TASK: ffff88013cc2cae0  COMMAND: "watchdog/1"
           PID: 2852   TASK: ffff88013bd5aae0  COMMAND: "rtkit-daemon"
     [ 54]   CFS RB_ROOT: ffff880028296718
     [120] PID: 5115   TASK: ffff8801152b1500  COMMAND: "sh"
     [120] PID: 5113   TASK: ffff880139530080  COMMAND: "sh"
     [120] PID: 5111   TASK: ffff88011bd86080  COMMAND: "sh"
     [120] PID: 5121   TASK: ffff880115a9e080  COMMAND: "sh"
     [120] PID: 5117   TASK: ffff8801152b0040  COMMAND: "sh"
     [120] PID: 5119   TASK: ffff880115a9eae0  COMMAND: "sh"

We can see that the output is incorrect: the six tasks that were put
into the test1 group are missing entirely, and on CPU 1 the group's
sched_rt_entity queued at priority index 54 leaves a stray "[ 54]"
prefix with the "CFS RB_ROOT" header run onto the same line, because
task_to_context() fails on the group entity and nothing more is
printed after the index.

After applying the attached patch, crash descends into the group's
child rt_rq and the output looks right:

crash> runq
CPU 0 RUNQUEUE: ffff880028216680
  CURRENT: PID: 5125   TASK: ffff88010799d540  COMMAND: "sh"
  RT PRIO_ARRAY: ffff880028216808
     [  0] PID: 5136   TASK: ffff8801153cc040  COMMAND: "rtloop99"
           CHILD RT PRIO_ARRAY: ffff88013b050000
              [  0] PID: 5133   TASK: ffff88010799c080  COMMAND: "rtloop99"
              [  1] PID: 5131   TASK: ffff880037922aa0  COMMAND: "rtloop98"
              [ 98] PID: 5128   TASK: ffff88011bd87540  COMMAND: "rtloop1"
                    PID: 5130   TASK: ffff8801396e7500  COMMAND: "rtloop1"
                    PID: 5129   TASK: ffff88011bf5a080  COMMAND: "rtloop1"
           PID: 6      TASK: ffff88013d7c6080  COMMAND: "watchdog/0"
           PID: 3      TASK: ffff88013d7ba040  COMMAND: "migration/0"
     [  1] PID: 5134   TASK: ffff8801153cd500  COMMAND: "rtloop98"
           PID: 5135   TASK: ffff8801153ccaa0  COMMAND: "rtloop98"
  CFS RB_ROOT: ffff880028216718
     [120] PID: 5109   TASK: ffff880037923500  COMMAND: "sh"
     [120] PID: 5107   TASK: ffff88006eeccaa0  COMMAND: "sh"
     [120] PID: 5123   TASK: ffff880107a4caa0  COMMAND: "sh"

CPU 1 RUNQUEUE: ffff880028296680
  CURRENT: PID: 5086   TASK: ffff88006eecc040  COMMAND: "bash"
  RT PRIO_ARRAY: ffff880028296808
     [  0] PID: 5137   TASK: ffff880107b35540  COMMAND: "rtloop99"
           PID: 10     TASK: ffff88013cc2cae0  COMMAND: "watchdog/1"
           PID: 2852   TASK: ffff88013bd5aae0  COMMAND: "rtkit-daemon"
     [ 54] CHILD RT PRIO_ARRAY: ffff880138978000
              [ 54] PID: 5132   TASK: ffff88006eecd500  COMMAND: "rtloop45"
  CFS RB_ROOT: ffff880028296718
     [120] PID: 5115   TASK: ffff8801152b1500  COMMAND: "sh"
     [120] PID: 5113   TASK: ffff880139530080  COMMAND: "sh"
     [120] PID: 5111   TASK: ffff88011bd86080  COMMAND: "sh"
     [120] PID: 5121   TASK: ffff880115a9e080  COMMAND: "sh"
     [120] PID: 5117   TASK: ffff8801152b0040  COMMAND: "sh"
     [120] PID: 5119   TASK: ffff880115a9eae0  COMMAND: "sh"

Is this kind of output for the rt runqueues OK? Or do you have any suggestions?

Thanks
Zhang Yanfei
From 550d428cbb6d9d22837e3ef138e1de59e7ccc1b3 Mon Sep 17 00:00:00 2001
From: zhangyanfei <zhangyanfei@xxxxxxxxxxxxxx>
Date: Sat, 25 Aug 2012 11:17:37 +0800
Subject: [PATCH] Fix missing RT group scheduling support in runq

Signed-off-by: zhangyanfei <zhangyanfei@xxxxxxxxxxxxxx>
---
 defs.h    |    2 ++
 symbols.c |    4 ++++
 task.c    |   48 +++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/defs.h b/defs.h
index 4a8e2e3..4af670d 100755
--- a/defs.h
+++ b/defs.h
@@ -1785,6 +1785,7 @@ struct offset_table {                    /* stash of commonly-used offsets */
 	long log_level;
 	long log_flags_level;
 	long timekeeper_xtime_sec;
+	long sched_rt_entity_my_q;
 };
 
 struct size_table {         /* stash of commonly-used sizes */
@@ -1919,6 +1920,7 @@ struct size_table {         /* stash of commonly-used sizes */
 	long msg_queue;
 	long log;
 	long log_level;
+	long rt_rq;
 };
 
 struct array_table {
diff --git a/symbols.c b/symbols.c
index 2646ff8..bbadd5e 100755
--- a/symbols.c
+++ b/symbols.c
@@ -8812,6 +8812,8 @@ dump_offset_table(char *spec, ulong makestruct)
 		OFFSET(log_level));
 	fprintf(fp, "               log_flags_level: %ld\n",
 		OFFSET(log_flags_level));
+	fprintf(fp, "          sched_rt_entity_my_q: %ld\n",
+		OFFSET(sched_rt_entity_my_q));
 
 	fprintf(fp, "\n                    size_table:\n");
 	fprintf(fp, "                          page: %ld\n", SIZE(page));
@@ -9027,6 +9029,8 @@ dump_offset_table(char *spec, ulong makestruct)
 		SIZE(log));
 	fprintf(fp, "                     log_level: %ld\n",
 		SIZE(log_level));
+	fprintf(fp, "                         rt_rq: %ld\n",
+		SIZE(rt_rq));
 
         fprintf(fp, "\n                   array_table:\n");
 	/*
diff --git a/task.c b/task.c
index 6e4cfec..eeaad60 100755
--- a/task.c
+++ b/task.c
@@ -7552,6 +7552,7 @@ dump_CFS_runqueues(void)
 
 	if (!VALID_STRUCT(cfs_rq)) {
 		STRUCT_SIZE_INIT(cfs_rq, "cfs_rq");
+		STRUCT_SIZE_INIT(rt_rq, "rt_rq");
 		MEMBER_OFFSET_INIT(rq_rt, "rq", "rt");
 		MEMBER_OFFSET_INIT(rq_nr_running, "rq", "nr_running");
 		MEMBER_OFFSET_INIT(task_struct_se, "task_struct", "se");
@@ -7562,6 +7563,8 @@ dump_CFS_runqueues(void)
 			"cfs_rq");
 		MEMBER_OFFSET_INIT(sched_entity_my_q, "sched_entity", 
 			"my_q");
+		MEMBER_OFFSET_INIT(sched_rt_entity_my_q, "sched_rt_entity",
+			"my_q");
 		MEMBER_OFFSET_INIT(sched_entity_on_rq, "sched_entity", "on_rq");
 		MEMBER_OFFSET_INIT(cfs_rq_rb_leftmost, "cfs_rq", "rb_leftmost");
 		MEMBER_OFFSET_INIT(cfs_rq_nr_running, "cfs_rq", "nr_running");
@@ -7648,6 +7651,8 @@ dump_CFS_runqueues(void)
 		FREEBUF(cfs_rq_buf);
 }
 
+static int depth = 0;
+
 static void
 dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
 {
@@ -7657,8 +7662,11 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
         struct list_data list_data, *ld;
 	struct task_context *tc;
 	ulong *tlist;
+	ulong my_q, task_addr;
+	char *rt_rq_buf;
 
-	fprintf(fp, "  RT PRIO_ARRAY: %lx\n",  k_prio_array);
+	if (!depth)
+		fprintf(fp, "  RT PRIO_ARRAY: %lx\n",  k_prio_array);
 
         qheads = (i = ARRAY_LENGTH(rt_prio_array_queue)) ?
                 i : get_array_length("rt_prio_array.queue", NULL, SIZE(list_head));
@@ -7678,14 +7686,14 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
 		if ((list_head[0] == kvaddr) && (list_head[1] == kvaddr))
 			continue;
 
-		fprintf(fp, "     [%3d] ", i);
+		INDENT(5 + 9 * depth);
+		fprintf(fp, "[%3d] ", i);
 
 		BZERO(ld, sizeof(struct list_data));
 		ld->start = list_head[0];
 		if (VALID_MEMBER(task_struct_rt) &&
 		    VALID_MEMBER(sched_rt_entity_run_list))
-			ld->list_head_offset = OFFSET(task_struct_rt) + 
-				OFFSET(sched_rt_entity_run_list);
+			ld->list_head_offset = OFFSET(sched_rt_entity_run_list);
 		else
 			ld->list_head_offset = OFFSET(task_struct_run_list);
 		ld->end = kvaddr;
@@ -7695,10 +7703,36 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
 		tlist = (ulong *)GETBUF((cnt) * sizeof(ulong));
 		cnt = retrieve_list(tlist, cnt);
 		for (c = 0; c < cnt; c++) {
-			if (!(tc = task_to_context(tlist[c])))
+			task_addr = tlist[c];
+			if (VALID_MEMBER(sched_rt_entity_my_q)) {
+				readmem(tlist[c] + OFFSET(sched_rt_entity_my_q),
+					KVADDR, &my_q, sizeof(ulong), "my_q",
+					FAULT_ON_ERROR);
+				if (my_q) {
+					rt_rq_buf = GETBUF(SIZE(rt_rq));
+					readmem(my_q, KVADDR, rt_rq_buf,
+						SIZE(rt_rq), "rt_rq",
+						FAULT_ON_ERROR);
+					if (c)
+						INDENT(11 + 9 * depth);
+					fprintf(fp, "CHILD RT PRIO_ARRAY: %lx\n",
+						my_q + OFFSET(rt_rq_active));
+					tot++;
+					depth++;
+					dump_RT_prio_array(
+						my_q + OFFSET(rt_rq_active),
+						&rt_rq_buf[OFFSET(rt_rq_active)]);
+					depth--;
+					FREEBUF(rt_rq_buf);
+					continue;
+				} else {
+					task_addr -= OFFSET(task_struct_rt);
+				}
+			}
+			if (!(tc = task_to_context(task_addr)))
 				continue;
 			if (c)
-				INDENT(11);
+				INDENT(11 + 9 * depth);
 			fprintf(fp, "PID: %-5ld  TASK: %lx  COMMAND: \"%s\"\n",
 				tc->pid, tc->task, tc->comm);
 			tot++;
@@ -7707,7 +7741,7 @@ dump_RT_prio_array(ulong k_prio_array, char *u_prio_array)
 	}
 
 	if (!tot) {
-		INDENT(5);
+		INDENT(5 + 9 * depth);
 		fprintf(fp, "[no tasks queued]\n");	
 	}
 }
-- 
1.7.1

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
