On Tue, Nov 21, 2023 at 11:47:26PM +0800, Chengming Zhou wrote:

> Ah yes, there is no NMI on ARM, so CPU 3 maybe running somewhere with
> interrupts disabled. I searched the full log, but still haven't a clue.
> And there is no any WARNING or BUG related to SLUB in the log.

Yeah, nor anything else particularly. I tried turning on some debug
options:

CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_WQ_WATCHDOG=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

    https://validation.linaro.org/scheduler/job/4017828

which has some additional warnings related to clock changes but AFAICT
those come from today's -next rather than the debug stuff:

    https://validation.linaro.org/scheduler/job/4017823

so that's not super helpful.

> I wonder how to reproduce it locally with a Qemu VM since I don't have
> the ARM machine.

There's sample qemu jobs available from for example KernelCI:

    https://storage.kernelci.org/next/master/next-20231120/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-qemu_arm-virt-gicv3.html

(includes the command line, though it's not using Debian testing like my
test was). Note that I'm testing a bunch of platforms with the same
kernel/rootfs combination and it was only the Raspberry Pi 3 which blew
up. It is a bit tight for memory which might have some influence?

I'm really suspecting this may have made some underlying platform bug
more obvious :/