On Fri, Mar 22, 2024 at 01:29:21PM +0100, Sebastian Andrzej Siewior wrote: > On 2024-03-15 15:21:19 [-0400], John B. Wyatt IV wrote: > The thing is that console_blast.sh does this "show a backtrace on all > CPUs, please" which triggers NMIs on all CPUs for backtrace. I can't > imagine how you did obtain the backtraces without an NMI. Unless the > tuned profile disables this somehow. I may have misapplied it. This is a very manual process right now due to conserver. > > ----------------------------- > > NMI Backtrace for 6.6.20-rt25 no forced preemption with tuned realtime profile > > ----------------------------- > > > > [ T2614] Kernel panic - not syncing: sysrq triggered crash > > [ C56] NMI backtrace for cpu 56 > > [ C56] Hardware name: Intel Corporation D50DNP1SBB/D50DNP1SBB, BIOS SE5C7411.86B.9409.D04.2212261349 12/26/2022 > > [ C56] RIP: 0010:io_serial_out (arch/x86/kernel/early_printk.c:105) > > [ C56] Code: 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 0f b6 8f c1 00 00 00 89 d0 0f b7 57 08 d3 e6 01 f2 ee <c3> cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 > > All code > > ======= > … > > 12: 90 nop > > 13: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > 18: 0f b6 8f c1 00 00 00 movzbl 0xc1(%rdi),%ecx > > 1f: 89 d0 mov %edx,%eax > > 21: 0f b7 57 08 movzwl 0x8(%rdi),%edx > > 25: d3 e6 shl %cl,%esi > > 27: 01 f2 add %esi,%edx > > 29: ee out %al,(%dx) > > 2a:* c3 ret <-- trapping instruction > > where is this output from? The `ret' opcode usually does not cause a > trap. My guess is that the machine has been interrupted by an external > user at this position. Just before the sysrq that crashes the system. [ T2618] ret_from_fork_asm (arch/x86/entry/entry_64.S:312) [ T2618] </TASK> [ T2618] task:kworker/185:0 state:I stack:0 pid:1131 ppid:2 flags:0x00004000 [ T2618] Workqueue: 0x0 (events) [ T2618] Call Trace: [ T2618] <TASK> [ T2618] __schedule (kernel/sched/core.c:5404 kernel/sched/core.c:6718) [ T2618] schedule (kernel/sched/core.c:6796 kernel/sched/core.c:6810) [ T2618] worker_thread (kernel/workqueue.c:2734) [ T2618] ? __pfx_worker_thread (kernel/workqueue.c:2727) [ T2618] kthread (kernel/kthread.c:388) [ T2618] ? __pfx_kthread (kernel/kthread.c:341) [ T2618] ret_from_fork (arch/x86/kernel/process.c:147) [ T2618] ? __pfx_kthread (kernel/kthread.c:341) [ T2618] ret_from_fork_asm (arch/x86/entry/entry_64.S:312) [ T2618] </TASK> [ T2614] sysrq: Trigger a crash [ T2618] task:kworker/185:0H state:I [ T2618] stack:0 pid:1132 ppid:2 flags:0x00004000 [ T2614] Kernel panic - not syncing: sysrq triggered crash [ C56] NMI backtrace for cpu 56 [ C56] Hardware name: Intel Corporation D50DNP1SBB/D50DNP1SBB, BIOS SE5C7411.86B.9409.D04.2212261349 12/26/2022 [ C56] RIP: 0010:io_serial_out (arch/x86/kernel/early_printk.c:105) [ C56] Code: 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 0f b6 8f c1 00 00 00 89 d0 0f b7 57 08 d3 e6 01 f2 ee <c3> cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 > Side note: This is using early_printk, correct? I believe so, but it might be preempted? This is the part it stopped in. static void io_serial_out(unsigned long addr, int offset, int value) { outb(value, addr + offset); } > … > > > [ C56] Call Trace: > > [ C56] <NMI> > > [ C56] ? nmi_cpu_backtrace (lib/nmi_backtrace.c:115) > > [ C56] ? nmi_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:47 (discriminator 1)) > > [ C56] ? nmi_handle (arch/x86/kernel/nmi.c:149) > > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105) > > [ C56] ? default_do_nmi (arch/x86/kernel/nmi.c:347) > > [ C56] ? exc_nmi (arch/x86/kernel/nmi.c:538) > > [ C56] ? end_repeat_nmi (arch/x86/entry/entry_64.S:1458) > > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105) > > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105) > > [ C56] ? io_serial_out (arch/x86/kernel/early_printk.c:105) > > [ C56] </NMI> > > This looks okay. The NMI did the backtrace as expected. > > > [ C56] <TASK> > > [ C56] serial8250_console_putchar (./include/linux/serial_core.h:704 drivers/tty/serial/8250/8250_port.c:3347) > > [ C56] ? __pfx_serial8250_console_putchar (drivers/tty/serial/8250/8250_port.c:3343) > > [ C56] uart_console_write (drivers/tty/serial/serial_core.c:2134) > > [ C56] serial8250_console_write_atomic (drivers/tty/serial/8250/8250_port.c:3628) > > [ C56] nbcon_emit_next_record (kernel/printk/nbcon.c:940) > > [ C56] __nbcon_atomic_flush_all (kernel/printk/nbcon.c:1192 (discriminator 1) kernel/printk/nbcon.c:1326 (discriminator 1)) > > [ C56] vprintk_emit (kernel/printk/printk.c:2414) > > [ C56] _printk (kernel/printk/printk.c:2474) > > [ C56] panic (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/panic.c:528 kernel/panic.c:339) > > [ C56] ? _printk (kernel/printk/printk.c:2474) > > [ C56] sysrq_handle_crash (drivers/tty/sysrq.c:154) > > [ C56] __handle_sysrq (drivers/tty/sysrq.c:601) > > [ C56] write_sysrq_trigger (drivers/tty/sysrq.c:1165) > > [ C56] proc_reg_write (fs/proc/inode.c:340 fs/proc/inode.c:352) > > [ C56] ? preempt_count_add (./include/linux/ftrace.h:974 (discriminator 1) kernel/sched/core.c:5847 (discriminator 1) kernel/sched/core.c:5844 (discriminator 1) kernel/sched/core.c:5872 (discriminator 1)) > > [ C56] vfs_write (fs/read_write.c:582) > > [ C56] ksys_write (fs/read_write.c:637) > > [ C56] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:81) > > [ C56] ? do_dup2 (fs/file.c:1142) > > [ C56] ? syscall_exit_to_user_mode (kernel/entry/common.c:299) > > [ C56] ? do_syscall_64 (arch/x86/entry/common.c:88) > > [ C56] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1513 arch/x86/mm/fault.c:1561) > > [ C56] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) > > According to this, someone issued a `crash' via sysrq. Why? > This is part of the console_blast.sh script that John Ogness sent me. Please see below: #!/bin/bash set -e # This script is intended to be started with no arguments. # # It then calls itself to create a pinned process for each CPU. Those # child processes will run in infinite loops of show-task-states via # /proc/sysrq-trigger. This generates lots of contention on the console. # # Finally, it waits 10 seconds before crashing the system. if [ $# -eq 0 ]; then cpus=$(($(nproc) - 1)) for i in $(seq 0 $cpus); do $0 $i & done sleep 15 echo c > /proc/sysrq-trigger exit 0 fi # Setup the CPU mask for the provided CPU number. The real taskset(1) # could simply use -c, but the busybox(1) variant does not support this. mask=$(printf "0x%x\n" $((1 << $1))) taskset -p $mask $$ # Wait a moment before beginning blast. sleep 2 while true; do echo t > /proc/sysrq-trigger done > > NMI Backtrace for 6.6.20-rt25 no forced preemption with tuned throughput-performance profile > > ----------------------------- > > This and the following backtrace shows the same picture: The CPU is > crashing due to proc/sysrq request and does CPU-backtraces via NMI and > polls in early_printk, waiting for the UART to become idle (probably). > > I don't see an issue here so far. Luis Goncalves discussed it with me after reading your response. Thank you for your help. The NMI was needed to flush the buffers upon the system crashing itself. Does this part about NMI watchdog need to be documented? -- Sincerly, John Wyatt Software Engineer, Core Kernel