Re: arm64: Support overflow stack panic

Hi Lianbo

Please review the new patch, v7. Its only change is the removal of the redundant code.

As for how to send the patch: what is a better way to include it in an email? Copy and paste is not practical for a large patch file.

I have no vmcore file, but there is a kernel module that can trigger an overflow stack panic for testing. Please download the module from link [1], compile it, and load it on your test box. See the README.txt and the source code for more details.


Best regards
Hong

From: Hong YANG3 杨红 <hong.yang3@xxxxxxx>
Sent: Monday, November 29, 2021 11:40
To: lijiang <lijiang@xxxxxxxxxx>; Discussion list for crash utility usage, maintenance and development <crash-utility@xxxxxxxxxx>
Subject: Re: arm64: Support overflow stack panic
 
Hi Lianbo

I'm using Outlook to send mail to this list. I'll try to find a better way to send patches and mail that is friendlier for all readers.

I'll send a demo kernel module that can trigger an overflow panic for testing, and I'll also update the patch according to your comments in the previous mail.

Thanks for your quick reply.

Best regards
Hong

From: lijiang <lijiang@xxxxxxxxxx>
Sent: Monday, November 29, 2021 10:58
To: Hong YANG3 杨红 <hong.yang3@xxxxxxx>; Discussion list for crash utility usage, maintenance and development <crash-utility@xxxxxxxxxx>
Subject: Re: arm64: Support overflow stack panic
 
Hi, Hong

Thank you for the patch. I have added comments below; the other changes look good to me.

@@ -1978,7 +2028,10 @@ arm64_in_exception_text(ulong ptr)
                if ((ptr >= ms->__exception_text_start) &&
                    (ptr < ms->__exception_text_end))
                        return TRUE;
-       } else if ((name = closest_symbol(ptr))) {  /* Linux 5.5 and later */
+       }
+
+       name = closest_symbol(ptr);
+       if (name != NULL) { /* Linux 5.5 and later */

The above changes are unrelated to your patch itself, but they do look more readable to me.

                for (func = &arm64_exception_functions[0]; *func; func++) {
                        if (STREQ(name, *func))
                                return TRUE;
@@ -2255,12 +2308,14 @@ arm64_unwind_frame(struct bt_info *bt, struct arm64_stackframe *frame)
        if (!(machdep->flags & IRQ_STACKS))
                return TRUE;

-       if (!(machdep->flags & IRQ_STACKS))
+       if (!(machdep->flags & OVERFLOW_STACKS))
                return TRUE;

Originally there were two identical statements, one of which must be redundant. This time, can they be merged into a single statement, as below?

if (!(machdep->flags & (IRQ_STACKS | OVERFLOW_STACKS)))
        return TRUE;

BTW: this patch was sent as an attachment, which makes it inconvenient for other reviewers to add comments.

In addition, I have a request: can you share the vmcore with me if it doesn't contain confidential data? I'm collecting this specific kind of vmcore
for testing; so far I haven't been able to reproduce the panic myself.

Thanks.
Lianbo
From 5c04bffd220240d5e2fa09d522062f8798eb42a9 Mon Sep 17 00:00:00 2001
From: Hong YANG <hong.yang3@xxxxxxx>
Date: Mon, 15 Nov 2021 15:41:01 +0800
Subject: [PATCH] arm64: Support overflow stack panic

Overflow stack exception handling has been supported since kernel 4.14
(commit 872d8327ce8). This patch loads the overflow_stack information
at startup and dumps the backtrace from the overflow stack.

Before:

      KERNEL: vmlinux
    DUMPFILE: core.file
        CPUS: 8
        DATE: Mon Nov 29 15:49:26 CST 2021
      UPTIME: 00:02:51
LOAD AVERAGE: 1.02, 0.88, 0.37
       TASKS: 1857
    NODENAME: localhost
     RELEASE: 4.14.156+
     VERSION: #1 SMP PREEMPT Thu Nov 25 13:07:21 UTC 2021
     MACHINE: aarch64  (unknown Mhz)
      MEMORY: 8 GB
       PANIC: "Kernel panic - not syncing: kernel stack overflow"
         PID: 3607
     COMMAND: "sh"
        TASK: ffffffcbf9a4da00  [THREAD_INFO: ffffffcbf9a4da00]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash-7.3.0.orig> bt
PID: 3607   TASK: ffffffcbf9a4da00  CPU: 2   COMMAND: "sh"
Segmentation fault (core dumped)

After:

crash> bt
PID: 3607   TASK: ffffffcbf9a4da00  CPU: 2   COMMAND: "sh"
 #0 [ffffffccbfd85f50] __delay at ffffff8008ceded8
...
 #5 [ffffffccbfd85fd0] emergency_restart at ffffff80080d49fc
 #6 [ffffffccbfd86140] panic at ffffff80080af4c0
 #7 [ffffffccbfd86150] nmi_panic at ffffff80080af150
 #8 [ffffffccbfd86190] handle_bad_stack at ffffff800808b0b8
 #9 [ffffffccbfd862d0] __bad_stack at ffffff800808285c
     PC: ffffff8008082e80  [el1_sync]
     LR: ffffff8000d6c214  [stack_overflow_demo+84]
     SP: ffffff1a79930070  PSTATE: 204003c5
    X29: ffffff8011b03d00  X28: ffffffcbf9a4da00  X27: ffffff8008e02000
    X26: 0000000000000040  X25: 0000000000000124  X24: ffffffcbf9a4da00
    X23: 0000007daec2e288  X22: ffffffcbfe03b800  X21: 0000007daec2e288
    X20: 0000000000000002  X19: 0000000000000002  X18: 0000000000000002
    X17: 00000000000003e7  X16: 0000000000000000  X15: 0000000000000000
    X14: ffffffcc17facb00  X13: ffffffccb4c25c00  X12: 0000000000000000
    X11: ffffffcc17fad660  X10: 0000000000000af0   X9: 0000000000000000
     X8: ffffff1a799334f0   X7: 0000000000000000   X6: 000000000000003f
     X5: 0000000000000040   X4: 0000000000000010   X3: 00000065981d07f0
     X2: 00000065981d07f0   X1: 0000000000000000   X0: ffffff1a799334f0
--- <Overflow stack> ---
 #10 [ffffff8011b03d00] el1_error_invalid at ffffff8008082e7c
 #11 [ffffff8011b03d60] write_enable at ffffff8000d6c134 [pso]
 #12 [ffffff8011b03da0] full_proxy_write at ffffff800839b2fc
 #13 [ffffff8011b03e30] __vfs_write at ffffff800823f4d8
 #14 [ffffff8011b03e70] vfs_write at ffffff800823f874
 #15 [ffffff8011b03eb0] sys_write at ffffff800823fa68
 #16 [ffffff8011b03ff0] el0_svc_naked at ffffff800808387c
     PC: 0000007daf070244   LR: 000000648253a090   SP: 0000007ff2a00fa0
    X29: 0000007ff2a01050  X28: 000000648257e000  X27: 0000007ff2a00fb0
    X26: 000000648257fdb9  X25: 0000000000000000  X24: 0000007ff2a00fc8
    X23: 0000007ff2a00fd0  X22: 000000648253d270  X21: 000000648257e080
    X20: 0000007daec2e288  X19: 0000000000000002  X18: 0000000000000008
    X17: 0000007daf07023c  X16: 000000648257de48  X15: aaaaaaaaaaaaaaab
    X14: 0000000000000800  X13: 0000007ff2a01040  X12: 0000007daec0d848
    X11: 0000000000000003  X10: 0000007daec2e289   X9: 0000007daec2e288
     X8: 0000000000000040   X7: 0000000000000000   X6: 0000000000000031
     X5: 0000007daec2c32a   X4: 0000007daec34768   X3: 0000007daec2e1e8
     X2: 0000000000000002   X1: 0000007daec2e288   X0: 0000000000000001
    ORIG_X0: 0000000000000001  SYSCALLNO: 40  PSTATE: 80001000

Signed-off-by: Hong YANG <hong.yang3@xxxxxxx>
---
 arm64.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 defs.h  |   6 ++
 2 files changed, 159 insertions(+), 16 deletions(-)

diff --git a/arm64.c b/arm64.c
index 94681d1..23c3d75 100644
--- a/arm64.c
+++ b/arm64.c
@@ -45,6 +45,7 @@ static int arm64_vtop_3level_4k(ulong, ulong, physaddr_t *, int);
 static int arm64_vtop_4level_4k(ulong, ulong, physaddr_t *, int);
 static ulong arm64_get_task_pgd(ulong);
 static void arm64_irq_stack_init(void);
+static void arm64_overflow_stack_init(void);
 static void arm64_stackframe_init(void);
 static int arm64_eframe_search(struct bt_info *);
 static int arm64_is_kernel_exception_frame(struct bt_info *, ulong);
@@ -63,6 +64,7 @@ static int arm64_get_dumpfile_stackframe(struct bt_info *, struct arm64_stackfra
 static int arm64_in_kdump_text(struct bt_info *, struct arm64_stackframe *);
 static int arm64_in_kdump_text_on_irq_stack(struct bt_info *);
 static int arm64_switch_stack(struct bt_info *, struct arm64_stackframe *, FILE *);
+static int arm64_switch_stack_from_overflow(struct bt_info *, struct arm64_stackframe *, FILE *);
 static int arm64_get_stackframe(struct bt_info *, struct arm64_stackframe *);
 static void arm64_get_stack_frame(struct bt_info *, ulong *, ulong *);
 static void arm64_gen_hidden_frame(struct bt_info *bt, ulong, struct arm64_stackframe *);
@@ -78,8 +80,11 @@ static int arm64_get_smp_cpus(void);
 static void arm64_clear_machdep_cache(void);
 static int arm64_on_process_stack(struct bt_info *, ulong);
 static int arm64_in_alternate_stack(int, ulong);
+static int arm64_in_alternate_stackv(int cpu, ulong stkptr, ulong *stacks, ulong stack_size);
 static int arm64_on_irq_stack(int, ulong);
+static int arm64_on_overflow_stack(int, ulong);
 static void arm64_set_irq_stack(struct bt_info *);
+static void arm64_set_overflow_stack(struct bt_info *);
 static void arm64_set_process_stack(struct bt_info *);
 static int arm64_get_kvaddr_ranges(struct vaddr_range *);
 static void arm64_get_crash_notes(void);
@@ -463,6 +468,7 @@ arm64_init(int when)
 			machdep->hz = 100;
 
 		arm64_irq_stack_init();
+		arm64_overflow_stack_init();
 		arm64_stackframe_init();
 		break;
 
@@ -1715,6 +1721,49 @@ arm64_irq_stack_init(void)
 	} 
 }
 
+/*
+ *  Gather Overflow stack values.
+ *
+ *  Overflow stacks are supported since kernel 4.14 (commit 872d8327c).
+ */
+static void
+arm64_overflow_stack_init(void)
+{
+	int i;
+	struct syment *sp;
+	struct gnu_request request, *req;
+	struct machine_specific *ms = machdep->machspec;
+	req = &request;
+
+	if (symbol_exists("overflow_stack") &&
+	    (sp = per_cpu_symbol_search("overflow_stack")) &&
+	    get_symbol_type("overflow_stack", NULL, req)) {
+		if (CRASHDEBUG(1)) {
+			fprintf(fp, "overflow_stack: \n");
+			fprintf(fp, "  type: %x, %s\n",
+				(int)req->typecode,
+				(req->typecode == TYPE_CODE_ARRAY) ?
+						"TYPE_CODE_ARRAY" : "other");
+			fprintf(fp, "  target_typecode: %x, %s\n",
+				(int)req->target_typecode,
+				req->target_typecode == TYPE_CODE_INT ?
+						"TYPE_CODE_INT" : "other");
+			fprintf(fp, "  target_length: %ld\n",
+						req->target_length);
+			fprintf(fp, "  length: %ld\n", req->length);
+		}
+
+		if (!(ms->overflow_stacks = (ulong *)malloc((size_t)(kt->cpus * sizeof(ulong)))))
+			error(FATAL, "cannot malloc overflow_stack addresses\n");
+
+		ms->overflow_stack_size = ARM64_OVERFLOW_STACK_SIZE;
+		machdep->flags |= OVERFLOW_STACKS;
+
+		for (i = 0; i < kt->cpus; i++)
+			ms->overflow_stacks[i] = kt->__per_cpu_offset[i] + sp->value;
+	}
+}
+
 /*
  *  Gather and verify all of the backtrace requirements.
  */
@@ -1960,6 +2009,7 @@ static char *arm64_exception_functions[] = {
         "do_mem_abort",
         "do_el0_irq_bp_hardening",
         "do_sp_pc_abort",
+        "handle_bad_stack",
         NULL
 };
 
@@ -1978,7 +2028,10 @@ arm64_in_exception_text(ulong ptr)
 		if ((ptr >= ms->__exception_text_start) &&
 		    (ptr < ms->__exception_text_end))
 			return TRUE;
-	} else if ((name = closest_symbol(ptr))) {  /* Linux 5.5 and later */
+	}
+
+	name = closest_symbol(ptr);
+	if (name != NULL) { /* Linux 5.5 and later */
 		for (func = &arm64_exception_functions[0]; *func; func++) {
 			if (STREQ(name, *func))
 				return TRUE;
@@ -2252,15 +2305,14 @@ arm64_unwind_frame(struct bt_info *bt, struct arm64_stackframe *frame)
 	if ((frame->fp == 0) && (frame->pc == 0))
 		return FALSE;
 
-	if (!(machdep->flags & IRQ_STACKS))
-		return TRUE;
-
-	if (!(machdep->flags & IRQ_STACKS))
+	if (!(machdep->flags & (IRQ_STACKS | OVERFLOW_STACKS)))
 		return TRUE;
 
 	if (machdep->flags & UNW_4_14) {
-		if ((bt->flags & BT_IRQSTACK) &&
-		    !arm64_on_irq_stack(bt->tc->processor, frame->fp)) {
+		if (((bt->flags & BT_IRQSTACK) &&
+		     !arm64_on_irq_stack(bt->tc->processor, frame->fp)) ||
+		    ((bt->flags & BT_OVERFLOW_STACK) &&
+		     !arm64_on_overflow_stack(bt->tc->processor, frame->fp))) {
 			if (arm64_on_process_stack(bt, frame->fp)) {
 				arm64_set_process_stack(bt);
 
@@ -2677,6 +2729,9 @@ arm64_back_trace_cmd(struct bt_info *bt)
 		if (arm64_on_irq_stack(bt->tc->processor, bt->frameptr)) {
 			arm64_set_irq_stack(bt);
 			bt->flags |= BT_IRQSTACK;
+		} else if (arm64_on_overflow_stack(bt->tc->processor, bt->frameptr)) {
+			arm64_set_overflow_stack(bt);
+			bt->flags |= BT_OVERFLOW_STACK;
 		}
 		stackframe.sp = bt->stkptr;
 		stackframe.pc = bt->instptr;
@@ -2731,7 +2786,9 @@ arm64_back_trace_cmd(struct bt_info *bt)
 			break;
 
 		if (arm64_in_exception_text(bt->instptr) && INSTACK(stackframe.fp, bt)) {
-			if (!(bt->flags & BT_IRQSTACK) ||
+			if (bt->flags & BT_OVERFLOW_STACK) {
+				exception_frame = stackframe.fp - KERN_EFRAME_OFFSET;
+			} else if (!(bt->flags & BT_IRQSTACK) ||
 			    ((stackframe.sp + SIZE(pt_regs)) < bt->stacktop)) {
 				if (arm64_is_kernel_exception_frame(bt, stackframe.fp - KERN_EFRAME_OFFSET))
 					exception_frame = stackframe.fp - KERN_EFRAME_OFFSET;
@@ -2745,6 +2802,12 @@ arm64_back_trace_cmd(struct bt_info *bt)
 				break;
 		}
 
+		if ((bt->flags & BT_OVERFLOW_STACK) &&
+		    !arm64_on_overflow_stack(bt->tc->processor, stackframe.fp)) {
+			bt->flags &= ~BT_OVERFLOW_STACK;
+			if (arm64_switch_stack_from_overflow(bt, &stackframe, ofp) == USER_MODE)
+				break;
+		}
 
 		level++;
 	}
@@ -3131,6 +3194,43 @@ arm64_switch_stack(struct bt_info *bt, struct arm64_stackframe *frame, FILE *ofp
 	return KERNEL_MODE;
 }
 
+static int
+arm64_switch_stack_from_overflow(struct bt_info *bt, struct arm64_stackframe *frame, FILE *ofp)
+{
+	int i;
+	ulong stacktop, words, addr;
+	ulong *stackbuf;
+	char buf[BUFSIZE];
+	struct machine_specific *ms = machdep->machspec;
+
+	if (bt->flags & BT_FULL) {
+		stacktop = ms->overflow_stacks[bt->tc->processor] + ms->overflow_stack_size;
+		words = (stacktop - bt->bptr) / sizeof(ulong);
+		stackbuf = (ulong *)GETBUF(words * sizeof(ulong));
+		readmem(bt->bptr, KVADDR, stackbuf, words * sizeof(long),
+			"top of overflow stack", FAULT_ON_ERROR);
+
+		addr = bt->bptr;
+		for (i = 0; i < words; i++) {
+			if (!(i & 1))
+				fprintf(ofp, "%s    %lx: ", i ? "\n" : "", addr);
+			fprintf(ofp, "%s ", format_stack_entry(bt, buf, stackbuf[i], 0));
+			addr += sizeof(ulong);
+		}
+		fprintf(ofp, "\n");
+		FREEBUF(stackbuf);
+	}
+	fprintf(ofp, "--- <Overflow stack> ---\n");
+
+	if (frame->fp == 0)
+		return USER_MODE;
+
+	if (!(machdep->flags & UNW_4_14))
+		arm64_print_exception_frame(bt, frame->sp, KERNEL_MODE, ofp);
+
+	return KERNEL_MODE;
+}
+
 static int
 arm64_get_dumpfile_stackframe(struct bt_info *bt, struct arm64_stackframe *frame)
 {
@@ -3682,6 +3782,16 @@ arm64_display_machine_stats(void)
 				machdep->machspec->irq_stacks[i]);
 		}
 	}
+	if (machdep->machspec->overflow_stack_size) {
+		fprintf(fp, "OVERFLOW STACK SIZE: %ld\n",
+			machdep->machspec->overflow_stack_size);
+		fprintf(fp, "    OVERFLOW STACKS:\n");
+		for (i = 0; i < kt->cpus; i++) {
+			pad = (i < 10) ? 3 : (i < 100) ? 2 : (i < 1000) ? 1 : 0;
+			fprintf(fp, "%s           CPU %d: %lx\n", space(pad), i,
+				machdep->machspec->overflow_stacks[i]);
+		}
+	}
 }
 
 static int
@@ -3875,24 +3985,41 @@ arm64_on_process_stack(struct bt_info *bt, ulong stkptr)
 }
 
 static int
-arm64_on_irq_stack(int cpu, ulong stkptr)
+arm64_in_alternate_stackv(int cpu, ulong stkptr, ulong *stacks, ulong stack_size)
 {
-	return arm64_in_alternate_stack(cpu, stkptr);
+	if ((cpu >= kt->cpus) || (stacks == NULL) || !stack_size)
+		return FALSE;
+
+	if ((stkptr >= stacks[cpu]) &&
+	    (stkptr < (stacks[cpu] + stack_size)))
+		return TRUE;
+
+	return FALSE;
 }
 
 static int
 arm64_in_alternate_stack(int cpu, ulong stkptr)
+{
+	return (arm64_on_irq_stack(cpu, stkptr) ||
+		arm64_on_overflow_stack(cpu, stkptr));
+}
+
+static int
+arm64_on_irq_stack(int cpu, ulong stkptr)
 {
 	struct machine_specific *ms = machdep->machspec;
 
-	if (!ms->irq_stack_size || (cpu >= kt->cpus))
-		return FALSE;
+	return arm64_in_alternate_stackv(cpu, stkptr,
+			ms->irq_stacks, ms->irq_stack_size);
+}
 
-	if ((stkptr >= ms->irq_stacks[cpu]) &&
-	    (stkptr < (ms->irq_stacks[cpu] + ms->irq_stack_size)))
-		return TRUE;
+static int
+arm64_on_overflow_stack(int cpu, ulong stkptr)
+{
+	struct machine_specific *ms = machdep->machspec;
 
-	return FALSE;
+	return arm64_in_alternate_stackv(cpu, stkptr,
+			ms->overflow_stacks, ms->overflow_stack_size);
 }
 
 static void
@@ -3905,6 +4032,16 @@ arm64_set_irq_stack(struct bt_info *bt)
 	alter_stackbuf(bt);
 }
 
+static void
+arm64_set_overflow_stack(struct bt_info *bt)
+{
+	struct machine_specific *ms = machdep->machspec;
+
+	bt->stackbase = ms->overflow_stacks[bt->tc->processor];
+	bt->stacktop = bt->stackbase + ms->overflow_stack_size;
+	alter_stackbuf(bt);
+}
+
 static void
 arm64_set_process_stack(struct bt_info *bt)
 {
diff --git a/defs.h b/defs.h
index a2f3085..7e2a16e 100644
--- a/defs.h
+++ b/defs.h
@@ -3218,6 +3218,7 @@ typedef signed int s32;
 #define UNW_4_14      (0x200)
 #define FLIPPED_VM    (0x400)
 #define HAS_PHYSVIRT_OFFSET (0x800)
+#define OVERFLOW_STACKS     (0x1000)
 
 /*
  * Get kimage_voffset from /dev/crash
@@ -3260,6 +3261,7 @@ typedef signed int s32;
 
 #define ARM64_STACK_SIZE   (16384)
 #define ARM64_IRQ_STACK_SIZE   ARM64_STACK_SIZE
+#define ARM64_OVERFLOW_STACK_SIZE   (4096)
 
 #define _SECTION_SIZE_BITS           30
 #define _SECTION_SIZE_BITS_5_12      27
@@ -3332,6 +3334,9 @@ struct machine_specific {
 	char  *irq_stackbuf;
 	ulong __irqentry_text_start;
 	ulong __irqentry_text_end;
+	ulong overflow_stack_size;
+	ulong *overflow_stacks;
+	char  *overflow_stackbuf;
 	/* for exception vector code */
 	ulong exp_entry1_start;
 	ulong exp_entry1_end;
@@ -5770,6 +5775,7 @@ ulong cpu_map_addr(const char *type);
 #define BT_CPUMASK        (0x1000000000000ULL)
 #define BT_SHOW_ALL_REGS  (0x2000000000000ULL)
 #define BT_REGS_NOT_FOUND (0x4000000000000ULL)
+#define BT_OVERFLOW_STACK (0x8000000000000ULL)
 #define BT_SYMBOL_OFFSET   (BT_SYMBOLIC_ARGS)
 
 #define BT_REF_HEXVAL         (0x1)
-- 
2.25.1
