https://bugzilla.kernel.org/show_bug.cgi?id=209867 Bug ID: 209867 Summary: CPU soft lockup/stall with nested KVM and SMP Product: Virtualization Version: unspecified Kernel Version: 5.9.1-arch1-1 Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: high Priority: P1 Component: kvm Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx Reporter: frantisek@xxxxxxxxx Regression: No Hello, During my systemd CI adventures I've encountered an issue with kernel 5.9.x where the boot freezes at completely random moments because of a CPU soft lockup. From my testing it seems to be reproducible with nested KVM & SMP > 1 (it does happen with SMP == 1 as well, but not always) - see[0]. Reproducer is quite straightforward - enable nested KVM on the host, create a VM, and create a nested KVM VM in that VM. During my testing I used Vagrant[1] (with libvirt backend) for the outer VM, and an image generated by mkosi[2] for the inner VM. Both VMs run the same kernel version. Hosts: * several AMD & Intel servers with RHEL 8.2 (4.18.0-193.19.1.el8_2) * AMD desktop with Fedora 32 (5.6.2-300.fc32.x86_64) The behavior was consistent on all hosts. Desktop results: # qemu-system-x86_64 -net none -smp 2 -m 512 -nographic -machine accel=kvm -enable-kvm -cpu host -kernel /boot/vmlinuz-linux -initrd /boot/initramfs-linux.img -append 'debug rw console=ttyS0 root=/dev/sda1' -drive format=raw,file=image.raw ... [ 4.602193] random: dbus-daemon: uninitialized urandom read (12 bytes read) [ 5.538763] random: crng init done [ 28.635398] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd:1] [ 28.638215] Modules linked in: drm agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi crc32_pclmul crc32c_intel serio_raw atkbd libps2 aesni_intel glue_helper crypto_simd cryptd ata_piix floppy i8042 serio [ 28.642668] CPU: 2 PID: 1 Comm: systemd Not tainted 5.9.1-arch1-1 #1 [ 28.648865] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [ 28.648865] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0 [ 28.655420] Code: c3 0d 3d 00 3b 05 61 0a 83 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 00 19 81 8b 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 60 45 d7 8b 48 89 ee 44 89 ff [ 28.655420] RSP: 0018:ffffacf800013b18 EFLAGS: 00000202 [ 28.668750] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000000 [ 28.668750] RDX: ffff91c39da333e0 RSI: 0000000000000000 RDI: 0000000000000000 [ 28.668750] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 28.668750] R10: 0000000000000140 R11: 0000000000000002 R12: 0000000000000000 [ 28.682088] R13: ffff91c39db2d340 R14: 0000000000000140 R15: ffff91c39db2d348 [ 28.682088] FS: 00007fd4cbc04340(0000) GS:ffff91c39db00000(0000) knlGS:0000000000000000 [ 28.682088] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.682088] CR2: 00007fd4cc9e8520 CR3: 000000001fa8e000 CR4: 0000000000350ee0 [ 28.695419] Call Trace: [ 28.695419] ? __flush_tlb_all+0x30/0x30 [ 28.695419] ? __flush_tlb_all+0x30/0x30 [ 28.695419] on_each_cpu+0x43/0xb0 [ 28.695419] __purge_vmap_area_lazy+0x5d/0x670 [ 28.695419] ? do_jit+0xbdf/0x1cd0 [ 28.708758] ? purge_fragmented_blocks+0xbd/0x1a0 [ 28.708758] _vm_unmap_aliases.part.0+0x110/0x140 [ 28.708758] change_page_attr_set_clr+0xb9/0x1c0 [ 28.708758] set_memory_ro+0x26/0x30 [ 28.708758] bpf_int_jit_compile+0x407/0x42b [ 28.708758] bpf_prog_select_runtime+0x101/0x1a0 [ 28.708758] bpf_prog_load+0x49a/0x8e0 [ 28.722089] __do_sys_bpf+0x2dd/0x1ea0 [ 28.722089] do_syscall_64+0x33/0x40 [ 28.722089] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 28.722089] RIP: 0033:0x7fd4cc91ed5d [ 28.722089] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48 [ 28.735427] RSP: 002b:00007ffe18ec4218 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 28.735427] RAX: ffffffffffffffda RBX: 000055cea99a9ab0 RCX: 00007fd4cc91ed5d [ 28.735427] RDX: 0000000000000070 RSI: 00007ffe18ec4220 RDI: 0000000000000005 [ 28.748752] RBP: 0000000000000000 R08: 0070756f7267632f R09: 0000000800000008 [ 28.748752] R10: 0000000000000000 R11: 0000000000000246 R12: 000055cea999fb20 [ 28.748752] R13: 0000000000000001 R14: 0000000000000001 R15: 000055cea99836a0 [ 56.635397] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd:1] [ 56.638254] Modules linked in: drm agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi crc32_pclmul crc32c_intel serio_raw atkbd libps2 aesni_intel glue_helper crypto_simd cryptd ata_piix floppy i8042 serio [ 56.642094] CPU: 2 PID: 1 Comm: systemd Tainted: G L 5.9.1-arch1-1 #1 [ 56.648798] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [ 56.648798] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0 [ 56.655444] Code: c3 0d 3d 00 3b 05 61 0a 83 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 00 19 81 8b 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 60 45 d7 8b 48 89 ee 44 89 ff [ 56.655444] RSP: 0018:ffffacf800013b18 EFLAGS: 00000202 [ 56.655444] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000000 [ 56.668871] RDX: ffff91c39da333e0 RSI: 0000000000000000 RDI: 0000000000000000 [ 56.668871] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 56.668871] R10: 0000000000000140 R11: 0000000000000002 R12: 0000000000000000 [ 56.668871] R13: ffff91c39db2d340 R14: 0000000000000140 R15: ffff91c39db2d348 [ 56.668871] FS: 00007fd4cbc04340(0000) GS:ffff91c39db00000(0000) knlGS:0000000000000000 [ 56.682244] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 56.682244] CR2: 00007fd4cc9e8520 CR3: 000000001fa8e000 CR4: 0000000000350ee0 [ 56.682244] Call Trace: [ 56.682244] ? __flush_tlb_all+0x30/0x30 [ 56.682244] ? __flush_tlb_all+0x30/0x30 [ 56.682244] on_each_cpu+0x43/0xb0 [ 56.682244] __purge_vmap_area_lazy+0x5d/0x670 [ 56.695525] ? do_jit+0xbdf/0x1cd0 [ 56.695525] ? purge_fragmented_blocks+0xbd/0x1a0 [ 56.695525] _vm_unmap_aliases.part.0+0x110/0x140 [ 56.695525] change_page_attr_set_clr+0xb9/0x1c0 [ 56.695525] set_memory_ro+0x26/0x30 [ 56.695525] bpf_int_jit_compile+0x407/0x42b [ 56.695525] bpf_prog_select_runtime+0x101/0x1a0 [ 56.708855] bpf_prog_load+0x49a/0x8e0 [ 56.708855] __do_sys_bpf+0x2dd/0x1ea0 [ 56.708855] do_syscall_64+0x33/0x40 [ 56.708855] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 56.708855] RIP: 0033:0x7fd4cc91ed5d [ 56.708855] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48 [ 56.722223] RSP: 002b:00007ffe18ec4218 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 56.722223] RAX: ffffffffffffffda RBX: 000055cea99a9ab0 RCX: 00007fd4cc91ed5d [ 56.722223] RDX: 0000000000000070 RSI: 00007ffe18ec4220 RDI: 0000000000000005 [ 56.722223] RBP: 0000000000000000 R08: 0070756f7267632f R09: 0000000800000008 [ 56.735526] R10: 0000000000000000 R11: 0000000000000246 R12: 000055cea999fb20 [ 56.735526] R13: 0000000000000001 R14: 0000000000000001 R15: 000055cea99836a0 ... [ 64.578716] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 64.578716] (detected by 2, t=18002 jiffies, g=-207, q=1973) [ 64.578716] rcu: All QSes seen, last rcu_preempt kthread activity 18002 (4294896513-4294878511), jiffies_till_next_fqs=2, root ->qsmask 0x0 [ 64.578716] rcu: rcu_preempt kthread starved for 18002 jiffies! g-207 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0 [ 64.588745] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. [ 64.588745] rcu: RCU grace-period kthread stack dump: [ 64.588745] task:rcu_preempt state:R stack: 0 pid: 11 ppid: 2 flags:0x00004000 [ 64.602163] Call Trace: [ 64.602163] __schedule+0x292/0x830 [ 64.602163] schedule+0x46/0xf0 [ 64.602163] schedule_timeout+0x99/0x170 [ 64.602163] ? __next_timer_interrupt+0x100/0x100 [ 64.602163] rcu_gp_kthread+0x5a4/0xbe0 [ 64.602163] ? __note_gp_changes+0x190/0x190 [ 64.602163] kthread+0x142/0x160 [ 64.602163] ? __kthread_bind_mask+0x60/0x60 [ 64.615482] ret_from_fork+0x22/0x30 ... Server results: ... [ 32.051205] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 32.051237] Modules linked in: [ 32.051237] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.1-arch1-1 #1 [ 32.051237] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [ 32.051237] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0 [ 32.051237] Code: c3 0d 3d 00 3b 05 61 0a 83 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 00 19 41 bb 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 60 45 97 bb 48 89 ee 44 89 ff [ 32.051237] RSP: 0018:ffffa661c0013d98 EFLAGS: 00000202 [ 32.051237] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000004 [ 32.051237] RDX: ffff9e955db320a0 RSI: 0000000000000000 RDI: 0000000000000004 [ 32.051237] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000004 [ 32.051237] R10: 0000000000000005 R11: 0000000000000005 R12: 0000000000000000 [ 32.051237] R13: ffff9e955da2d340 R14: 0000000000000140 R15: ffff9e955da2d348 [ 32.051237] FS: 0000000000000000(0000) GS:ffff9e955da00000(0000) knlGS:0000000000000000 [ 32.051237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.051237] CR2: 0000000000000000 CR3: 000000001840e000 CR4: 00000000000406f0 [ 32.051237] Call Trace: [ 32.051237] ? _raw_spin_unlock+0x16/0x30 [ 32.051237] ? text_poke_loc_init+0x160/0x160 [ 32.051237] ? text_poke_loc_init+0x160/0x160 [ 32.051237] on_each_cpu+0x43/0xb0 [ 32.051237] text_poke_bp_batch+0x1d7/0x200 [ 32.051237] text_poke_finish+0x1b/0x26 [ 32.051237] arch_jump_label_transform_apply+0x16/0x30 [ 32.051237] static_key_slow_inc_cpuslocked+0x7a/0x90 [ 32.051237] static_key_slow_inc+0x16/0x20 [ 32.051237] ? kvm_init_platform+0x16/0x16 [ 32.051237] activate_jump_labels+0x2f/0x32 [ 32.051237] do_one_initcall+0x59/0x234 [ 32.051237] kernel_init_freeable+0x1b0/0x1f5 [ 32.051237] ? rest_init+0xbf/0xbf [ 32.051237] kernel_init+0xa/0x111 [ 32.051237] ret_from_fork+0x22/0x30 ... Frankly, I'm at wits' end, as I've been noticing similar issues since kernel 5.8.x and still can pinpoint what's going on (again, see [0]), thus my aplogies if I filed this under a wrong component. Thank you. [0] https://github.com/systemd/systemd-centos-ci/pull/295#issuecomment-682519585 [1] Vagrant.configure("2") do |config| config.vm.box = "generic/arch" config.vm.provider :libvirt do |libvirt| libvirt.cpus = 4 libvirt.memory = "2048" libvirt.driver = "kvm" libvirt.nested = true libvirt.cpu_mode = "host-model" libvirt.random :model => "random" end end [2] # mkosi -b -d arch --qemu-headless -t gpt_ext4 -- You are receiving this mail because: You are watching the assignee of the bug.