Dave Anderson <anderson@xxxxxxxxxx> writes:

>> >> crash> bt
>> >> PID: 0      TASK: c1da8b00  CPU: 0   COMMAND: "swapper/0"
>> >>  #0 [c1da1f60] __schedule at c19fe305
>> >>  #1 [c1da1fa0] schedule at c19febb3
>> >>  #2 [c1da1fac] schedule_preempt_disabled at c19ff0a2
>> >>  #3 [c1da1fb4] cpu_startup_entry at c10a9580
>> >>
>> >> crash> bt 45
>> >> PID: 45     TASK: f57d3a00  CPU: 3   COMMAND: "kworker/3:1"
>> >> bt: cannot resolve stack trace:
>> >> bt: Task in user space -- no backtrace
>> >>
>> >> In the above case, it looks like the panic cpu was not detected, and
>> >> "bt 45" is also not working.
>>
>> crash> bt 45
>> PID: 45     TASK: f57d3a00  CPU: 3   COMMAND: "kworker/3:1"
>> bt: cannot resolve stack trace:
>> bt: Task in user space -- no backtrace

I debugged this case.  The root cause is the nested stack of
softirq => hardirq.  The current code doesn't handle it correctly; the
fix is in the attached patch.

BTW, with this patch, "bt -t" seems to be working, at least.  Plain
"bt" still fails sometimes because stack-frame detection gets confused;
that one is harder to fix.

[BTW, the current x86_get_pc() uses inactive_task_frame_ret_addr to get
the pc.  However, inactive_task_frame is only valid if the task is in
the sleeping state (a running task may already have overwritten
inactive_task_frame).  I'm not sure whether we should check
is_task_active() or not; even if we check is_task_active(), we still
can't get the pc correctly anyway.]

Thanks.
-- 
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
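Regarding the bracketed x86_get_pc() note, the following is a minimal
sketch (not actual crash code) of the kind of is_task_active() guard in
question, assuming it sits inside crash's source tree so defs.h is
available.  The helper name get_inactive_pc() and its ksp argument are
hypothetical; is_task_active(), readmem(), OFFSET() and
inactive_task_frame_ret_addr are the existing crash facilities the note
refers to.

#include "defs.h"	/* crash's internal header; sketch assumes crash's tree */

/*
 * Sketch only: the saved return address in inactive_task_frame is
 * trustworthy only for a task that has been switched out; a task that
 * was running at crash time may already have clobbered that frame.
 */
static ulong
get_inactive_pc(struct task_context *tc, ulong ksp)
{
	ulong pc = 0;

	if (is_task_active(tc->task))
		return 0;	/* running task: saved frame may be stale */

	readmem(ksp + OFFSET(inactive_task_frame_ret_addr), KVADDR, &pc,
		sizeof(ulong), "inactive_task_frame ret_addr",
		RETURN_ON_ERROR);

	return pc;
}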
Subject: [PATCH] Fix nested stack like task => softirq => hardirq
To: hirofumi@xxxxxxxxxxxxxxxxxx
From: OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
Date: Fri, 17 Feb 2017 22:49:45 +0900
Message-ID: <69ef59f73e858a6ff79115156.ps@xxxxxxxxxxxxxxxxxx>

Recent i386 kernels have separate stacks for task, softirq, and hardirq
contexts.  crash already tries to distinguish the task stack from the
irq stacks, but the current code does not handle nesting like
task => softirq => hardirq.  This patch handles the nested case.

---

 task.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff -puN task.c~fix-x86-nest-stack task.c
--- crash-32/task.c~fix-x86-nest-stack	2017-02-17 22:40:40.430200158 +0900
+++ crash-32-hirofumi/task.c	2017-02-17 22:47:22.304439206 +0900
@@ -555,7 +555,7 @@ irqstacks_init(void)
 	int i;
 	char *thread_info_buf;
 	struct syment *hard_sp, *soft_sp;
-	ulong ptr;
+	ulong ptr, hardirq_next_sp = 0;
 
 	if (!(tt->hardirq_ctx = (ulong *)calloc(NR_CPUS, sizeof(ulong))))
 		error(FATAL, "cannot malloc hardirq_ctx space.");
@@ -610,8 +610,10 @@ irqstacks_init(void)
 		if (MEMBER_EXISTS("irq_ctx", "tinfo"))
 			tt->hardirq_tasks[i] =
 				ULONG(thread_info_buf+OFFSET(thread_info_task));
-		else
-			tt->hardirq_tasks[i] = stkptr_to_task(ULONG(thread_info_buf));
+		else {
+			hardirq_next_sp = ULONG(thread_info_buf);
+			tt->hardirq_tasks[i] = stkptr_to_task(hardirq_next_sp);
+		}
 	}
 
 	if ((soft_sp = per_cpu_symbol_search("per_cpu__softirq_ctx")) ||
@@ -656,8 +658,15 @@ irqstacks_init(void)
 		if (MEMBER_EXISTS("irq_ctx", "tinfo"))
 			tt->softirq_tasks[i] =
 				ULONG(thread_info_buf+OFFSET(thread_info_task));
-		else
+		else {
 			tt->softirq_tasks[i] = stkptr_to_task(ULONG(thread_info_buf));
+			/* Checking if softirq => hardirq nested stack */
+			if (tt->softirq_tasks[i] != NO_TASK && hardirq_next_sp) {
+				if (tt->softirq_ctx[i] <= hardirq_next_sp &&
+				    hardirq_next_sp < tt->softirq_ctx[i] + STACKSIZE())
+					tt->hardirq_tasks[i] = tt->softirq_tasks[i];
+			}
+		}
 	}
 
 	tt->flags |= IRQSTACKS;
_
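To illustrate the check the patch adds, here is a small standalone
sketch of the containment test: the hardirq stack is treated as nested
on the softirq stack when the next-sp value saved at the base of the
hardirq stack falls inside that CPU's softirq stack range.  The helper
name, the 8KB stack size and the sample addresses are invented for
illustration; only the comparison mirrors the patch.

#include <stdio.h>

typedef unsigned long ulong;

#define STACK_SIZE 8192UL	/* assumed THREAD_SIZE, just for the example */

/* 1 if sp points into the stack whose lowest address is stack_base */
static int
sp_in_stack(ulong stack_base, ulong sp)
{
	return stack_base <= sp && sp < stack_base + STACK_SIZE;
}

int
main(void)
{
	ulong softirq_ctx = 0xf50a8000UL;	/* hypothetical softirq stack base */
	ulong hardirq_next_sp = 0xf50a9e40UL;	/* sp saved at hardirq stack base */

	/*
	 * task => softirq => hardirq: the sp saved at the base of the
	 * hardirq stack points back into the softirq stack, so the
	 * hardirq context belongs to the same task as the softirq one.
	 */
	if (sp_in_stack(softirq_ctx, hardirq_next_sp))
		printf("hardirq stack is nested on the softirq stack\n");
	else
		printf("hardirq stack sits directly on the task stack\n");

	return 0;
}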