On 10/01/2018 09:50 AM, Eugeniy Paltsev wrote: > Hi Vineet, > > any chance you could look at this? > Otherwise we will both forget what it is all about :) I was busy with non Linux stuff for quite some time - hopefully that should be wrapped up and I can get back to tending you ;-) -Vineet > > On Tue, 2018-08-07 at 16:59 +0300, Eugeniy Paltsev wrote: >> We process signals in the end of syscall/exception handler. >> It the signal is fatal we print register's content using >> show_regs function. show_regs() also prints information about >> last exception happened. >> >> In case of multicore system we can catch the situation when we >> will print wrong information about exception. See the example: >> ______________________________ >> CPU-0: started to handle page fault >> CPU-1: sent signal to process, which is executed on CPU-0 >> CPU-0: ended page fault handle. Started to process signal before >> returnig to userspace. Process signal, which is send from >> CPU-0. As th signal is fatal we call show_regs(). >> show_regs() will show information about last exception >> which is *page fault* (instead of "trap" which is used for >> signals and happened on CPU-0) >> >> So we will get message like this: >> # ./waitpid02 >> potentially unexpected fatal signal 8. >> Path: /home/waitpid02 >> CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2 >> task: 9f11c200 task.stack: 9f3ae000 >> >> [ECR ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec >> [EFA ]: 0x00000000 >> [BLINK ]: 0x123ea >> [ERET ]: 0x123ec >> @off 0x123ec in [/home/waitpid02] >> VMA: 0x00010000 to 0x00016000 >> [STAT32]: 0x80080882 : IE U >> BTA: 0x000123ea SP: 0x5ffd3db0 FP: 0x00000000 >> LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006 >> [-----other-info-----] >> >> This message is confusing because it show information about page fault >> ( [ECR ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant >> to signal. >> >> This situation was reproduced with waitpid02 LTP test. >> _____________________________ >> >> So remove printing information about exceptions from show_regs() >> to avoid confusing messages. Print information about exceptions >> only in required places instead of show_regs() >> >> Now we don't print information about exceptions if signal is simply >> send by another userspace app. So in case of waitpid02 we will print >> next message: >> _____________________________ >> # ./waitpid02 >> potentially unexpected fatal signal 8. >> Path: /root/waitpid02 >> CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17 >> [ECR ]: 0x00050100 >> [EFA ]: 0x00000000 >> [BLINK ]: 0x20001486 >> [-----other-info-----] >> _____________________________ >> >> This patch fix >> STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec >> >> NOTE: >> To be more clear I give examples of different faults (signal-based, >> userspace/kernelspace exception-based) with different values of >> "/proc/sys/kernel/print-fatal-signals" option. >> >> 0) NULL pointer access from user space, print-fatal-signals == 1: >> ------------>8--------------- >> # ./arc_hell >> Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] >> ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c >> potentially unexpected fatal signal 11. >> Path: /root/arc_hell >> CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17 >> [ECR ]: 0x00050100 >> [EFA ]: 0x00000000 >> [BLINK ]: 0x20039ef8 >> [ERET ]: 0x2003a35c >> [STAT32]: 0x80080882 : IE U >> BTA: 0x2003a358 SP: 0x5fa27dc4 FP: 0x5fa27de8 >> LPS: 0x2003a628 LPE: 0x2003a62c LPC: 0x00000000 >> r00: 0x00000000 r01: 0x200740b0 r02: 0x00000001 >> r03: 0x00000007 r04: 0x80808080 r05: 0x2f2f2f2f >> r06: 0x7c7a2f43 r07: 0x00000000 r08: 0x1a131100 >> r09: 0x2008b1e0 r10: 0x20003a5c r11: 0x20004038 >> r12: 0x20039ef8 r13: 0x200740b0 r14: 0x00000000 >> r15: 0x200740b0 r16: 0x00000000 r17: 0x0007d468 >> r18: 0x0009313a r19: 0x00000000 r20: 0x0009c22c >> r21: 0x0009c23c r22: 0x0009ab64 r23: 0x00000000 >> r24: 0x0009dfc5 r25: 0x20004b70 >> >> Segmentation fault >> ------------>8--------------- >> >> 1) NULL pointer access from user space, print-fatal-signals == 0: >> ------------>8--------------- >> # ./arc_hell >> Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] >> ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c >> Segmentation fault >> ------------>8--------------- >> >> 2) Process killed by signal (waitpid02 test), print-fatal-signals == 1: >> ------------>8--------------- >> # ./waitpid02 >> potentially unexpected fatal signal 8. >> Path: /root/waitpid02 >> CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17 >> [ECR ]: 0x00050100 >> [EFA ]: 0x00000000 >> [BLINK ]: 0x20001486 >> [ERET ]: 0x2000146c >> [STAT32]: 0x80080082 : IE U >> BTA: 0x20000fc4 SP: 0x5fa21d64 FP: 0x00000000 >> LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x00000006 >> r00: 0x2000c0dc r01: 0x00000018 r02: 0x0001159a >> r03: 0x00000001 r04: 0x00000000 r05: 0x00000045 >> r06: 0x0000004e r07: 0x01010101 r08: 0x000000dc >> r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038 >> r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0 >> r15: 0x20004078 r16: 0x00000000 r17: 0x20004038 >> r18: 0x00000001 r19: 0x00000000 r20: 0x0001159a >> r21: 0x00000001 r22: 0x00000000 r23: 0x00000004 >> r24: 0x2000d1fc r25: 0x20004cd0 >> ------------>8--------------- >> >> 3) Process killed by signal (waitpid02 test), print-fatal-signals == 0: >> ------------>8--------------- >> # ./waitpid02 >> ------------>8--------------- >> >> 4) NULL pointer access from kernel space, regardless print-fatal-signals >> value: >> ------------>8--------------- >> Oops >> Exception: at stmmac_dvr_probe+0x40/0x7e0: >> ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x90575254 >> Path: (null) >> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #11 >> [EFA ]: 0x00000000 >> [BLINK ]: stmmac_dvr_probe+0x2a/0x7e0 >> [ERET ]: stmmac_dvr_probe+0x40/0x7e0 >> [STAT32]: 0x80080402 : IE K >> BTA: 0x9057523e SP: 0xbf02be40 FP: 0x00000000 >> LPS: 0x9073dbcc LPE: 0x9073dbe0 LPC: 0x00000000 >> r00: 0xbf1f0000 r01: 0x00000000 r02: 0x00000008 >> r03: 0x00000000 r04: 0x00000064 r05: 0x00000000 >> r06: 0x00000003 r07: 0x00000004 r08: 0x01010101 >> r09: 0x00000031 r10: 0xbf1f0006 r11: 0x00000000 >> r12: 0x9057523e >> >> Stack Trace: >> stmmac_dvr_probe+0x40/0x7e0 >> dwmac_generic_probe+0x50/0xbc >> platform_drv_probe+0x26/0x5c >> really_probe+0x3b0/0x464 >> __driver_attach+0xd4/0xd8 >> bus_for_each_dev+0x3a/0x70 >> bus_add_driver+0x12a/0x18c >> driver_register+0x56/0xe8 >> do_one_initcall+0x2e/0x118 >> kernel_init_freeable+0x102/0x194 >> ------------>8--------------- >> >> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev at synopsys.com> >> --- >> >> NOTE: >> This patch has prerequisites: >> https://www.mail-archive.com/linux-snps-arc at lists.infradead.org/msg03938.html >> https://www.mail-archive.com/linux-snps-arc at lists.infradead.org/msg03937.html >> >> Changes v1->v2: >> * Change format of message about exception. >> >> arch/arc/include/asm/bug.h | 1 + >> arch/arc/kernel/traps.c | 1 + >> arch/arc/kernel/troubleshoot.c | 51 +++++++++++++++++++++++++++++++++--------- >> arch/arc/mm/fault.c | 2 ++ >> 4 files changed, 44 insertions(+), 11 deletions(-) >> >> diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h >> index 21ec82466d62..b68f7f82f2d8 100644 >> --- a/arch/arc/include/asm/bug.h >> +++ b/arch/arc/include/asm/bug.h >> @@ -16,6 +16,7 @@ >> struct task_struct; >> >> void show_regs(struct pt_regs *regs); >> +void show_exception_mesg(struct pt_regs *regs); >> void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs); >> void show_kernel_fault_diag(const char *str, struct pt_regs *regs, >> unsigned long address); >> diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c >> index b123558bf0bb..5d83b2fe5f12 100644 >> --- a/arch/arc/kernel/traps.c >> +++ b/arch/arc/kernel/traps.c >> @@ -50,6 +50,7 @@ unhandled_exception(const char *str, struct pt_regs *regs, siginfo_t *info) >> tsk->thread.fault_address = (__force unsigned int)info->si_addr; >> >> force_sig_info(info->si_signo, info, tsk); >> + show_exception_mesg(regs); >> >> } else { >> /* If not due to copy_(to|from)_user, we are doomed */ >> diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c >> index e8d9fb452346..fdfba1942a06 100644 >> --- a/arch/arc/kernel/troubleshoot.c >> +++ b/arch/arc/kernel/troubleshoot.c >> @@ -101,13 +101,13 @@ static void show_faulting_vma(unsigned long address, char *buf) >> if (IS_ERR(nm)) >> nm = "?"; >> } >> - pr_info(" @off 0x%lx in [%s]\n" >> - " VMA: 0x%08lx to 0x%08lx\n", >> + >> + pr_cont("[off 0x%lx in %s, VMA: %08lx:%08lx] ", >> vma->vm_start < TASK_UNMAPPED_BASE ? >> address : address - vma->vm_start, >> nm, vma->vm_start, vma->vm_end); >> } else >> - pr_info(" @No matching VMA found\n"); >> + pr_cont("[No matching VMA found] "); >> >> up_read(&active_mm->mmap_sem); >> } >> @@ -117,7 +117,7 @@ static void show_ecr_verbose(struct pt_regs *regs) >> unsigned int vec, cause_code; >> unsigned long address; >> >> - pr_info("\n[ECR ]: 0x%08lx => ", regs->event); >> + pr_cont("\n ECR: 0x%08lx => ", regs->event); >> >> /* For Data fault, this is data address not instruction addr */ >> address = current->thread.fault_address; >> @@ -165,10 +165,42 @@ static void show_ecr_verbose(struct pt_regs *regs) >> } >> } >> >> +static inline void show_exception_mesg_u(struct pt_regs *regs) >> +{ >> + struct task_struct *tsk = current; >> + char *buf; >> + >> + pr_info("Exception: %s[%d]: at %pS ", >> + tsk->comm, task_pid_nr(tsk), (void *)regs->ret); >> + >> + buf = (char *)__get_free_page(GFP_NOWAIT); >> + if (buf) { >> + show_faulting_vma(regs->ret, buf); >> + free_page((unsigned long)buf); >> + } >> + >> + show_ecr_verbose(regs); >> +} >> + >> +static inline void show_exception_mesg_k(struct pt_regs *regs) >> +{ >> + pr_info("Exception: at %pS:", (void *)regs->ret); >> + >> + show_ecr_verbose(regs); >> +} >> + >> /************************************************************************ >> * API called by rest of kernel >> ***********************************************************************/ >> >> +void show_exception_mesg(struct pt_regs *regs) >> +{ >> + if (user_mode(regs)) >> + show_exception_mesg_u(regs); >> + else >> + show_exception_mesg_k(regs); >> +} >> + >> void show_regs(struct pt_regs *regs) >> { >> struct task_struct *tsk = current; >> @@ -182,15 +214,10 @@ void show_regs(struct pt_regs *regs) >> print_task_path_n_nm(tsk, buf); >> show_regs_print_info(KERN_INFO); >> >> - show_ecr_verbose(regs); >> - >> - pr_info("[EFA ]: 0x%08lx\n[BLINK ]: %pS\n[ERET ]: %pS\n", >> - current->thread.fault_address, >> + pr_info("[ECR ]: 0x%08lx\n[EFA ]: 0x%08lx\n[BLINK ]: %pS\n[ERET ]: %pS\n", >> + regs->event, current->thread.fault_address, >> (void *)regs->blink, (void *)regs->ret); >> >> - if (user_mode(regs)) >> - show_faulting_vma(regs->ret, buf); /* faulting code, not data */ >> - >> pr_info("[STAT32]: 0x%08lx", regs->status32); >> >> #define STS_BIT(r, bit) r->status32 & STATUS_##bit##_MASK ? #bit" " : "" >> @@ -233,6 +260,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs, >> /* Show fault description */ >> pr_info("\n%s\n", str); >> >> + show_exception_mesg(regs); >> + >> /* Caller and Callee regs */ >> show_regs(regs); >> >> diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c >> index b884bbd6f354..d9c6958a1702 100644 >> --- a/arch/arc/mm/fault.c >> +++ b/arch/arc/mm/fault.c >> @@ -202,6 +202,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) >> /* info.si_code has been set above */ >> info.si_addr = (void __user *)address; >> force_sig_info(SIGSEGV, &info, tsk); >> + show_exception_mesg(regs); >> return; >> } >> >> @@ -241,4 +242,5 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) >> info.si_code = BUS_ADRERR; >> info.si_addr = (void __user *)address; >> force_sig_info(SIGBUS, &info, tsk); >> + show_exception_mesg(regs); >> }