On 2024-02-29 12:15:28 [+0100], Petr Mladek wrote:
> > [ T2481] Call Trace:
> > [ T2477] Kernel panic - not syncing: sysrq triggered crash
> > [ C0] NMI backtrace for cpu 0
>
> This message seems to be printed by nmi_cpu_backtrace().
>
> I am surprised. I would expect to see the backtrace printed from panic().
> It calls dump_stack() directly on the panic-CPU. And this panic() should
> be called from sysrq_handle_crash(). IMHO, it should be (normal)
> interrupt context.
>
> Is it different on RT?

No, it is all okay and the same. panic() was triggered from sysrq, which
is a threaded IRQ, and you don't see that here. It triggered an NMI
backtrace on CPU0, and what is shown is the NMI stack. After that, the
TASK stack follows, which would show panic() if the backtrace had
anything to show. The stack-trace entries for the NMI are prefixed with
"?", so they are guesses based on values found on the stack, not an
actual unwind.

And this part:

> > 15: 0f 1f 44 00 00       nopl 0x0(%rax,%rax,1)
> > 1a: 0f b6 8f c1 00 00 00 movzbl 0xc1(%rdi),%ecx
> > 21: 0f b7 57 08          movzwl 0x8(%rdi),%edx
> > 25: d3 e6                shl %cl,%esi
> > 27: 01 f2                add %esi,%edx
> > 29: ec                   in (%dx),%al
> > 2a:* 0f b6 c0            movzbl %al,%eax <-- trapping instruction
> > 2d: c3                   ret

shows where the error of some kind occurred, at 0x2a. But this
instruction is innocent: it simply zeros the upper 24 bits of the
register %eax. So this isn't a crash, it is just where the CPU was at
the time this was captured.

> > <NMI>
> > dump_stack_lvl (lib/dump_stack.c:107)
> > panic (kernel/panic.c:344)
> > nmi_panic (kernel/panic.c:203)
> > watchdog_hardlockup_check (kernel/watchdog.c:180)
> > __perf_event_overflow (kernel/events/core.c:9612 (discriminator 2))
> > handle_pmi_common (arch/x86/events/intel/core.c:3052 (discriminator 1))
> > ? set_pte_vaddr_p4d (arch/x86/mm/init_64.c:307 arch/x86/mm/init_64.c:315)
> > ? flush_tlb_one_kernel (./arch/x86/include/asm/paravirt.h:81 arch/x86/mm/tlb.c:1171 arch/x86/mm/tlb.c:1126)
> > ? native_set_fixmap (arch/x86/mm/pgtable.c:679 arch/x86/mm/pgtable.c:688)
> > ? ghes_copy_tofrom_phys (drivers/acpi/apei/ghes.c:330)
> > intel_pmu_handle_irq (./arch/x86/include/asm/msr.h:84 ./arch/x86/include/asm/msr.h:118 arch/x86/events/intel/core.c:2427 arch/x86/events/intel/core.c:3118)
> > perf_event_nmi_handler (arch/x86/events/core.c:1743 arch/x86/events/core.c:1729)
> > ? native_queued_spin_lock_slowpath (./arch/x86/include/asm/vdso/processor.h:13 ./arch/x86/include/asm/vdso/processor.h:18 kernel/locking/qspinlock.c:383)
> > nmi_handle (arch/x86/kernel/nmi.c:150)
> > ? native_queued_spin_lock_slowpath (./arch/x86/include/asm/vdso/processor.h:13 ./arch/x86/include/asm/vdso/processor.h:18 kernel/locking/qspinlock.c:383)
> > default_do_nmi (arch/x86/kernel/nmi.c:351)
> > exc_nmi (arch/x86/kernel/nmi.c:545)
> > end_repeat_nmi (arch/x86/entry/entry_64.S:1394)
>
> This actually seems to be from perf_event() used by the hardlockup
> detector. It triggers NMI.

And this stack is properly unwound; it is the third backtrace. The
second one had <NMI> in it, suggesting it came from NMI, but it also
contained stack traces from tasks due to sysrq-t. This last backtrace
looks fine: it appears that something tried to acquire a raw_spinlock_t
and took too long, at which point the hardware watchdog triggered, and
all we got was this backtrace. Sadly we don't see the other side, so we
have no idea which lock it got stuck on.

> Best Regards,
> Petr

Sebastian