On Thu, Oct 24, 2024 at 11:23:17PM +0100, John Garry wrote:
> On 24/10/2024 22:13, Dave Chinner wrote:
> > > > BTW, can you please share logs which would contain full stacktraces that
> > > > this softlockup reports produce? The attached dmesg is just from fresh
> > > > boot... Thanks!
> > > >
> > > thanks for getting back to me.
> > >
> > > So I think that enabling /proc/sys/kernel/softlockup_all_cpu_backtrace is
> > > required there. Unfortunately my VM often just locks up without any sign of
> > > life.
> > Attach a "serial" console to the vm - add "console=ttyS0,115600" to
> > the kernel command line and add "-serial pty" to the qemu command
> > line. You can then attach something like minicom to the /dev/pts/X
> > device that qemu creates for the console output and capture
> > everything from initial boot right through to the softlockup traces
> > that are emitted...
>
> I am using an OCI instance, so I can't change the qemu command line (as far
> as I know).
>
> For this issue, the Cloud Shell locks up also. There are other console
> connection methods, which I can try.
>
> BTW, earlier today I got this once when trying to recreate this issue:
>
> [ 1549.241972] ------------[ cut here ]------------
> [ 1609.240236] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1609.240243] rcu: 5-...!: (0 ticks this GP)
> idle=a8f4/1/0x4000000000000000 softirq=71287/71287 fqs=1
> [ 1609.240249] rcu: (detected by 2, t=60004 jiffies, g=168077, q=10823
> ncpus=16)
> [ 1609.240252] Sending NMI from CPU 2 to CPUs 5:
> [ 1609.240277] NMI backtrace for cpu 5
> [ 1609.240281] CPU: 5 UID: 1002 PID: 8250 Comm: mysqld Tainted: G W
> 6.12.0-rc4-g556c97f2ecbf #40
> [ 1609.240286] Tainted: [W]=WARN
> [ 1609.240288] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.5.1 06/16/2021
> [ 1609.240289] RIP: 0010:native_halt+0xe/0x20
> [ 1609.240296] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 23 f1 17 01 f4 <e9>
> 28 c3 05 01 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90
> [ 1609.240298] RSP: 0018:ffffc0c8c71dbd20 EFLAGS: 00000046
> [ 1609.240301] RAX: 0000000000000003 RBX: ffff9ff73fab6580 RCX:
> 0000000000000008
> [ 1609.240303] RDX: ffff9ff7bffaf740 RSI: 0000000000000003 RDI:
> ffff9ff73fab6580
> [ 1609.240304] RBP: ffff9ff73f8b7440 R08: 0000000000000008 R09:
> 0000000000000074
> [ 1609.240306] R10: 0000000000000002 R11: 0000000000000000 R12:
> 0000000000000000
> [ 1609.240307] R13: 0000000000000001 R14: 0000000000000100 R15:
> 0000000000180000
> [ 1609.240311] FS: 00007f9e12600700(0000) GS:ffff9ff73f880000(0000)
> knlGS:0000000000000000
> [ 1609.240313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1609.240315] CR2: 00007f9d63e00004 CR3: 0000001a0bc04005 CR4:
> 0000000000770ef0
> [ 1609.240319] PKRU: 55555554
> [ 1609.240320] Call Trace:
> [ 1609.240322] <NMI>
> [ 1609.240325] ? nmi_cpu_backtrace+0x98/0x110
> [ 1609.240330] ? nmi_cpu_backtrace_handler+0x11/0x20
> [ 1609.240334] ? nmi_handle+0x5c/0x150
> [ 1609.240339] ? default_do_nmi+0x4e/0x120
> [ 1609.240343] ? exc_nmi+0x137/0x1d0
> [ 1609.240347] ? end_repeat_nmi+0xf/0x53
> [ 1609.240354] ? native_halt+0xe/0x20
> [ 1609.240357] ? native_halt+0xe/0x20
> [ 1609.240360] ? native_halt+0xe/0x20
> [ 1609.240363] </NMI>
> [ 1609.240364] <TASK>
> [ 1609.240366] kvm_wait+0x47/0x60
> [ 1609.240368] __pv_queued_spin_lock_slowpath+0x255/0x370
> [ 1609.240373] _raw_spin_lock+0x29/0x30
> [ 1609.240376] raw_spin_rq_lock_nested+0x1c/0x80
> [ 1609.240381] __task_rq_lock+0x3f/0xe0
> [ 1609.240384] try_to_wake_up+0x3cf/0x640
> [ 1609.240387] ? plist_del+0x63/0xc0
> [ 1609.240391] wake_up_q+0x4d/0x90
> [ 1609.240394] futex_wake+0x154/0x180
> [ 1609.240400] do_futex+0xf8/0x1d0
> [ 1609.240404] __x64_sys_futex+0x68/0x1c0
> [ 1609.240407] ? restore_fpregs_from_fpstate+0x3c/0xa0
> [ 1609.240411] do_syscall_64+0x62/0x170
> [ 1609.240416] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Yup, I'm seeing random RCU stalls as well when running a 64p VM under
hard concurrent fstests load. The serial console output is occasionally
tripping RCU stall warnings, too.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx