Re: NMI Reported with console_blast.sh

Petr Mladek <pmladek@xxxxxxxx> · Thu, 29 Feb 2024 12:15:28 +0100

On Thu 2024-02-22 00:21:19, John B. Wyatt IV wrote:
> On Mon, Feb 12, 2024 at 04:23:04PM -0500, John B. Wyatt IV wrote:
> > 
> > Red Hat's conservers are having an issue with the machine I was testing
> > on. It may take me a while to get back to you with more test results.
> > 
> 
> Found a work-around with conserver. I was able to follow up on the printk-caller
> info you requested.
> 
> I found 2 additional NMIs for a total of 3. Number 2 is very
> large-please feel free to let me know what specific information you
> wanted if it was unnecessary.
> 
> This first one (the original) is with the real-time profile (no preemption).
> The next two are with the performance-througput (no preemption).
> 
> This is what real-time NMI reports with the caller information enabled. It
> looks like it is lacking some information compared to the other two further below.
> 
> [ T2481] Call Trace:
> [ T2477] Kernel panic - not syncing: sysrq triggered crash
> [    C0] NMI backtrace for cpu 0

This message seems to be printed by nmi_cpu_backtrace().

I am surprised. I would expect to see the backtrace printed from panic().
It calls dump_stack() directly on the panic-CPU. And this panic() should
be called from sysrq_handle_crash(). IMHO, it should be (normal)
interrupt context.

Is it different on RT?

Or have you somehow modified the kernel to use NMI for sysrq?

> [    C0] Hardware name: Intel Corporation D50DNP1SBB/D50DNP1SBB, BIOS SE5C7411.86B.9409.D04.2212261349 12/26/2022
> [ C0] RIP: 0010:io_serial_in (arch/x86/kernel/early_printk.c:100)

I guess that this RIP is from the normal context.
It looks like the NMI interrupted an early_printk() call.

> [ C0] Code: cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 0f b6 8f c1 00 00 00 0f b7 57 08 d3 e6 01 f2 ec <0f> b6 c0 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90
> All code
> ========
>    0:	cc                   	int3
>    1:	0f 1f 40 00          	nopl   0x0(%rax)
>    5:	90                   	nop
>    6:	90                   	nop
>    7:	90                   	nop
>    8:	90                   	nop
>    9:	90                   	nop
>    a:	90                   	nop
>    b:	90                   	nop
>    c:	90                   	nop
>    d:	90                   	nop
>    e:	90                   	nop
>    f:	90                   	nop
>   10:	90                   	nop
>   11:	90                   	nop
>   12:	90                   	nop
>   13:	90                   	nop
>   14:	90                   	nop
>   15:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   1a:	0f b6 8f c1 00 00 00 	movzbl 0xc1(%rdi),%ecx
>   21:	0f b7 57 08          	movzwl 0x8(%rdi),%edx
>   25:	d3 e6                	shl    %cl,%esi
>   27:	01 f2                	add    %esi,%edx
>   29:	ec                   	in     (%dx),%al
>   2a:*	0f b6 c0             	movzbl %al,%eax		<-- trapping instruction
>   2d:	c3                   	ret
>   2e:	cc                   	int3
>   2f:	cc                   	int3
>   30:	cc                   	int3
>   31:	cc                   	int3
>   32:	0f 1f 00             	nopl   (%rax)
>   35:	90                   	nop
>   36:	90                   	nop
>   37:	90                   	nop
>   38:	90                   	nop
>   39:	90                   	nop
>   3a:	90                   	nop
>   3b:	90                   	nop
>   3c:	90                   	nop
>   3d:	90                   	nop
>   3e:	90                   	nop
>   3f:	90                   	nop
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	0f b6 c0             	movzbl %al,%eax
>    3:	c3                   	ret
>    4:	cc                   	int3
>    5:	cc                   	int3
>    6:	cc                   	int3
>    7:	cc                   	int3
>    8:	0f 1f 00             	nopl   (%rax)
>    b:	90                   	nop
>    c:	90                   	nop
>    d:	90                   	nop
>    e:	90                   	nop
>    f:	90                   	nop
>   10:	90                   	nop
>   11:	90                   	nop
>   12:	90                   	nop
>   13:	90                   	nop
>   14:	90                   	nop
>   15:	90                   	nop
> [    C0] RSP: 0018:ff4507d95ec1fac0 EFLAGS: 00000002
> [    C0] RAX: ffffffffa13d6f00 RBX: 0000000000000000 RCX: 0000000000000000
> [    C0] RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffffa3ad49c0
> [    C0] RBP: 000000000000270f R08: ffffffffa39c73b3 R09: ff4507d95ec1fa1f
> [    C0] R10: 0000000031383432 R11: 0000000038343254 R12: 0000000000000020
> [    C0] R13: ffffffffa3ad49c0 R14: 0000000000000000 R15: 0000000000000000
> [    C0] FS:  00007fd6cce81740(0000) GS:ff1ef00bbf800000(0000) knlGS:0000000000000000
> [    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    C0] CR2: 00007fd6cd05d650 CR3: 000000208e5bc004 CR4: 0000000000771ef0
> [    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [    C0] PKRU: 55555554
> [    C0] Call Trace:
> [    C0]  <NMI>
> [ C0] ? nmi_cpu_backtrace (lib/nmi_backtrace.c:115) 
> [ C0] ? nmi_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:47 (discriminator 1)) 
> [ C0] ? nmi_handle (arch/x86/kernel/nmi.c:150) 
> [ C0] ? io_serial_in (arch/x86/kernel/early_printk.c:100) 
> [ C0] ? default_do_nmi (arch/x86/kernel/nmi.c:351) 
> [ C0] ? exc_nmi (arch/x86/kernel/nmi.c:545) 
> [ C0] ? end_repeat_nmi (arch/x86/entry/entry_64.S:1394) 
> [ C0] ? mem_serial_out (drivers/tty/serial/8250/8250_port.c:368) 
> [ C0] ? io_serial_in (arch/x86/kernel/early_printk.c:100) 
> [ C0] ? io_serial_in (arch/x86/kernel/early_printk.c:100) 
> [ C0] ? io_serial_in (arch/x86/kernel/early_printk.c:100) 
> [    C0]  </NMI>

I think that <NMI> and </NMI> mean that the backtrace is from
the NMI stack.

Note that all entries are prefixed by '?'. It means that they
were found on the stack but they are not part of the current
call chain.

If the above is true then I miss the backtrace from the normal
context stack which has been interrupted by the NMI.

> [    C0]  <TASK>
> 
> ------------------------------------
> 
> Compared to the two NMIs with throughput-performance (no preemption)
> 
>   <NMI>
>   cpus=0
>    .runnable_avg                  : 3072
>  kthread (kernel/kthread.c:388) 
>    .util_est_enqueued             : 0
>   stack:0     pid:1733  tgid:1733  ppid:2      flags:0x00004000
>    .min_vruntime                  : 2084315.290254
>    .removed.load_avg              : 0
>    .avg_vruntime                  : 2084315.290254
>  console_blast.s  3497     34770.405603 N     34773.405603         3.000000     34764.898340         4   120 
>    .util_avg                      : 1024
>    .util_avg                      : 1024

It looks like messages from more (many) CPUs are mixed. I guess that they
are printed by print_cfs_rq(). But the order looks random.

Also I wonder why it is printed from NMI context. Maybe, it is from
some perf event, similar to hardlockup detector?

[...]

>    .se->avg.load_avg              : 7
>    .rt_throttled                  : 0
>  ? do_syscall_64 (arch/x86/entry/common.c:102) 
>   </NMI>
> 
> ---------------------------------
> 
>  <NMI>
>  dump_stack_lvl (lib/dump_stack.c:107) 
>  panic (kernel/panic.c:344) 
>  nmi_panic (kernel/panic.c:203) 
>  watchdog_hardlockup_check (kernel/watchdog.c:180) 
>  __perf_event_overflow (kernel/events/core.c:9612 (discriminator 2)) 
>  handle_pmi_common (arch/x86/events/intel/core.c:3052 (discriminator 1)) 
>  ? set_pte_vaddr_p4d (arch/x86/mm/init_64.c:307 arch/x86/mm/init_64.c:315) 
>  ? flush_tlb_one_kernel (./arch/x86/include/asm/paravirt.h:81 arch/x86/mm/tlb.c:1171 arch/x86/mm/tlb.c:1126) 
>  ? native_set_fixmap (arch/x86/mm/pgtable.c:679 arch/x86/mm/pgtable.c:688) 
>  ? ghes_copy_tofrom_phys (drivers/acpi/apei/ghes.c:330) 
>  intel_pmu_handle_irq (./arch/x86/include/asm/msr.h:84 ./arch/x86/include/asm/msr.h:118 arch/x86/events/intel/core.c:2427 arch/x86/events/intel/core.c:3118) 
>  perf_event_nmi_handler (arch/x86/events/core.c:1743 arch/x86/events/core.c:1729) 
>  ? native_queued_spin_lock_slowpath (./arch/x86/include/asm/vdso/processor.h:13 ./arch/x86/include/asm/vdso/processor.h:18 kernel/locking/qspinlock.c:383) 
>  nmi_handle (arch/x86/kernel/nmi.c:150) 
>  ? native_queued_spin_lock_slowpath (./arch/x86/include/asm/vdso/processor.h:13 ./arch/x86/include/asm/vdso/processor.h:18 kernel/locking/qspinlock.c:383) 
>  default_do_nmi (arch/x86/kernel/nmi.c:351) 
>  exc_nmi (arch/x86/kernel/nmi.c:545) 
>  end_repeat_nmi (arch/x86/entry/entry_64.S:1394) 

This actually seems to be from perf_event() used by the hardlockup
detector. It triggers NMI.

Best Regards,
Petr