I've been pursuing a bug in a virtual machine (KVM) that I would like to
share here. The VM gets stuck with soft lockups when running perf inside
the guest. The bug happens upstream (Linux 6.6-rc4 - 8a749fd1a8720d461).
The same kernel is being used in both the host and the guest.

The problem only happens under very specific circumstances:

1) PMU needs to be enabled in the guest

2) Libvirt/QEMU needs to use a custom CPU model:
   * Here is the QEMU line: -cpu Skylake-Server,kvm-pv-eoi=on,pmu=on
   * Any other CPU model seems to hit the problem as well
   * Even using Skylake-Server on a Skylake server
   * Using CPU passthrough works around the problem

3) You need to use 6 or more events in perf:
   * This is a line that reproduces the problem:
     # perf stat -e cpu-clock -e context-switches -e cpu-migrations -e page-faults -e cycles -e instructions -e branches ls
   * Removing any of these events (totaling 5 events) makes `perf` work again

4) This problem happens on upstream, 6.4 and 5.19
   * This problem doesn't seem to happen on 5.12

Problem
========

When running perf under the circumstances above, the VM gets stuck and
prints a lot of stack traces. These are some of the messages:

  kernel:[  400.314381] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [kworker/u68:11:6853]
  kernel:[  400.324380] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [dynoKernelMon:9781]
  kernel:[  404.368380] watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [kworker/30:2:1326]

Here is part of the stack.
The full stack is in the pastebin below:

  nmi_cpu_backtrace (lib/nmi_backtrace.c:115)
  nmi_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:47)
  nmi_handle (arch/x86/kernel/nmi.c:149)
  __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
  __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
  default_do_nmi (arch/x86/kernel/nmi.c:347)
  exc_nmi (arch/x86/kernel/nmi.c:543)
  end_repeat_nmi (arch/x86/entry/entry_64.S:1471)
  __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
  __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
  __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)

More info
=========

Soft lockup messages in the guest: https://paste.debian.net/1293888/
Full log from the guest: https://paste.debian.net/1293891/
vCPU stacks dumped from the host (cat /proc/<vcpu>/stack): https://paste.debian.net/1293887/
QEMU (version 7.1.0) command line: https://paste.debian.net/1293894/
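For anyone trying to reproduce this through libvirt rather than a raw QEMU
command line: a custom CPU model with the PMU enabled would typically be
expressed in the domain XML along these lines. This is a hypothetical
sketch matching the effect of the -cpu line above, not the actual domain
XML used here:

```xml
<!-- Hypothetical libvirt domain XML fragment (sketch, not the actual
     config used). Intended to match the effect of the QEMU options:
     -cpu Skylake-Server,kvm-pv-eoi=on,pmu=on -->
<features>
  <!-- eoi='on' enables the kvm-pv-eoi paravirt feature -->
  <apic eoi='on'/>
  <!-- expose the virtual PMU to the guest -->
  <pmu state='on'/>
</features>
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Server</model>
</cpu>
```

Per circumstance 2 above, switching this to `<cpu mode='host-passthrough'/>`
should make the problem disappear.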