On 9/8/22 17:24, František Šumšal wrote:
<snip>
>> Ya, unless there's a latent bug in KVM that's present in your L0 kernel, which is extremely unlikely given that the bug repros with 4.18 and 5.17 as the bare metal kernel, this isn't a KVM problem. The mm, ext4, and scheduler subsystems are all likely candidates. I'm not familiar enough with their gory details to point fingers though. Do you think it's possible to bisect the L1 kernel using the QEMU/TCG configuration? That'd probably be the least awful way to get a root cause.
>
> Yeah, I can try, but it might take some time. Nevertheless, it sounds like the least awful way to debug this further, as you said. I'll report back if/when I find something interesting. Thanks for the tips!
Out of curiosity I tried to reproduce it on Fedora Rawhide (as both L0 and L1 with QEMU/TCG), which currently has 6.0.0-0.rc4.20220906git53e99dcff61e.32.fc38.x86_64, and hit another soft lockup during boot:

```
[ 228.311561] watchdog: BUG: soft lockup - CPU#0 stuck for 39s! [zram-generator:614]
[ 228.312307] Modules linked in: fuse zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk serio_raw ata_generic pata_acpi qemu_fw_cfg
[ 228.312307] irq event stamp: 17888
[ 228.324256] hardirqs last enabled at (17887): [<ffffffff95000f26>] asm_sysvec_call_function_single+0x16/0x20
[ 228.324619] hardirqs last disabled at (17888): [<ffffffff94f180ca>] sysvec_apic_timer_interrupt+0xa/0xc0
[ 228.333763] softirqs last enabled at (17878): [<ffffffff940ff749>] __irq_exit_rcu+0xf9/0x170
[ 228.334446] softirqs last disabled at (17867): [<ffffffff940ff749>] __irq_exit_rcu+0xf9/0x170
[ 228.337785] CPU: 0 PID: 614 Comm: zram-generator Not tainted 6.0.0-0.rc4.20220906git53e99dcff61e.32.fc38.x86_64 #1
[ 228.348634] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-2.fc37 04/01/2014
[ 228.349359] RIP: 0010:_raw_spin_unlock_irqrestore+0x36/0x60
[ 228.356249] Code: 74 24 10 48 89 fb 48 83 c7 18 e8 b5 0d 25 ff 48 89 df e8 8d 4c 25 ff f7 c5 00 02 00 00 74 0b e8 a0 ef 32 ff fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 a0 5f 20 ff 65 8b 05 79 43 0f 6b 85 c0 74 07 5b
[ 228.363915] RSP: 0018:ff73ffef41403c28 EFLAGS: 00000206
[ 228.366966] RAX: 00000000000045db RBX: ff203b9ab2dfe140 RCX: 0000000000000006
[ 228.367334] RDX: 0000000000000000 RSI: ffffffff9597e547 RDI: ffffffff958ead16
[ 228.367334] RBP: 0000000000000282 R08: 0000000000000001 R09: 0000000000000001
[ 228.367334] R10: 0000000000000001 R11: 0000000000000000 R12: ff203b9ab2dfe140
[ 228.367334] R13: 0000000000000064 R14: ff203b9ab2dfe198 R15: ffbfc6fb04274888
[ 228.384777] FS:  00007f8c463af780(0000) GS:ff203b9ab2c00000(0000) knlGS:0000000000000000
[ 228.384777] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 228.384777] CR2: 0000556d5c18a098 CR3: 000000010bed6000 CR4: 0000000000751ef0
[ 228.384777] PKRU: 55555554
[ 228.384777] Call Trace:
[ 228.384777]  <TASK>
[ 228.384777]  __alloc_pages_bulk+0x3cb/0x730
[ 228.384777]  __vmalloc_node_range+0x253/0x820
[ 228.384777]  __vmalloc_node+0x4a/0x60
[ 228.384777]  ? disksize_store+0x7a/0x140 [zram]
[ 228.384777]  disksize_store+0x7a/0x140 [zram]
[ 228.384777]  kernfs_fop_write_iter+0x161/0x210
[ 228.384777]  vfs_write+0x22b/0x4b0
[ 228.384777]  ksys_write+0x5c/0xe0
[ 228.384777]  do_syscall_64+0x5b/0x80
[ 228.384777]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 228.384777]  ? lockdep_hardirqs_on+0x7d/0x100
[ 228.384777]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 228.384777] RIP: 0033:0x7f8c465a91f4
[ 228.384777]
[ 228.384777] ================================
[ 228.384777] WARNING: inconsistent lock state
[ 228.384777] 6.0.0-0.rc4.20220906git53e99dcff61e.32.fc38.x86_64 #1 Not tainted
[ 228.384777] --------------------------------
[ 228.384777] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 228.384777] zram-generator/614 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 228.384777] ffffffff9626f5d8 (vmap_area_lock){?.+.}-{2:2}, at: find_vmap_area+0x16/0x60
[ 228.384777] {HARDIRQ-ON-W} state was registered at:
[ 228.384777]   lock_acquire+0xce/0x2d0
[ 228.384777]   _raw_spin_lock+0x33/0x80
[ 228.384777]   alloc_vmap_area+0x3e6/0x890
[ 228.384777]   __get_vm_area_node+0xb1/0x150
[ 228.384777]   get_vm_area_caller+0x3a/0x50
[ 228.384777]   __ioremap_caller+0x145/0x320
[ 228.384777]   acpi_os_map_iomem+0x1c9/0x1e0
[ 228.384777]   acpi_tb_acquire_table+0x39/0x62
[ 228.384777]   acpi_tb_validate_table+0x44/0x7c
[ 228.384777]   acpi_tb_verify_temp_table+0x42/0x30a
[ 228.384777]   acpi_reallocate_root_table+0x129/0x143
[ 228.384777]   acpi_early_init+0x4b/0xde
[ 228.384777]   start_kernel+0x8da/0x998
[ 228.384777]   secondary_startup_64_no_verify+0xe5/0xeb
[ 228.384777] irq event stamp: 17888
[ 228.384777] hardirqs last enabled at (17887): [<ffffffff95000f26>] asm_sysvec_call_function_single+0x16/0x20
[ 228.384777] hardirqs last disabled at (17888): [<ffffffff94f180ca>] sysvec_apic_timer_interrupt+0xa/0xc0
[ 228.384777] softirqs last enabled at (17878): [<ffffffff940ff749>] __irq_exit_rcu+0xf9/0x170
[ 228.384777] softirqs last disabled at (17867): [<ffffffff940ff749>] __irq_exit_rcu+0xf9/0x170
[ 228.384777]
[ 228.384777] other info that might help us debug this:
[ 228.384777]  Possible unsafe locking scenario:
[ 228.384777]
[ 228.384777]        CPU0
[ 228.384777]        ----
[ 228.384777]   lock(vmap_area_lock);
[ 228.384777]   <Interrupt>
[ 228.384777]     lock(vmap_area_lock);
[ 228.384777]
[ 228.384777]  *** DEADLOCK ***
[ 228.384777]
[ 228.384777] 4 locks held by zram-generator/614:
[ 228.384777]  #0: ff203b9885d30498 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x5c/0xe0
[ 228.384777]  #1: ff203b98839a9890 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210
[ 228.384777]  #2: ff203b988be40a58 (kn->active#90){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11d/0x210
[ 228.384777]  #3: ff203b988b8a7298 (&zram->init_lock){+.+.}-{3:3}, at: disksize_store+0x49/0x140 [zram]
[ 228.384777]
[ 228.384777] stack backtrace:
[ 228.384777] CPU: 0 PID: 614 Comm: zram-generator Not tainted 6.0.0-0.rc4.20220906git53e99dcff61e.32.fc38.x86_64 #1
[ 228.384777] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-2.fc37 04/01/2014
[ 228.384777] Call Trace:
[ 228.384777]  <IRQ>
[ 228.384777]  dump_stack_lvl+0x5b/0x77
[ 228.384777]  mark_lock.cold+0x48/0xe1
[ 228.384777]  ? sched_clock_local+0xe/0x80
[ 228.384777]  ? __lock_acquire+0x388/0x1ef0
[ 228.384777]  ? sched_clock_local+0xe/0x80
[ 228.384777]  __lock_acquire+0x948/0x1ef0
[ 228.384777]  lock_acquire+0xce/0x2d0
[ 228.384777]  ? find_vmap_area+0x16/0x60
[ 228.384777]  ? lock_release+0x14f/0x460
[ 228.384777]  _raw_spin_lock+0x33/0x80
[ 228.384777]  ? find_vmap_area+0x16/0x60
[ 228.384777]  find_vmap_area+0x16/0x60
[ 228.384777]  __check_object_size+0x161/0x290
[ 228.384777]  copy_from_user_nmi+0x61/0x90
[ 228.384777]  show_opcodes+0x5d/0xc0
[ 228.384777]  ? irq_work_queue+0xa/0x50
[ 228.384777]  show_iret_regs+0x12/0x37
[ 228.384777]  __show_regs+0x24/0x40
[ 228.384777]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 228.384777]  ? unwind_next_frame+0x3a7/0x620
[ 228.384777]  show_trace_log_lvl+0x2e0/0x32f
[ 228.384777]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 228.384777]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 228.384777]  watchdog_timer_fn+0x224/0x290
[ 228.384777]  ? softlockup_fn+0x70/0x70
[ 228.384777]  __hrtimer_run_queues+0x217/0x530
[ 228.384777]  hrtimer_interrupt+0xfe/0x220
[ 228.384777]  __sysvec_apic_timer_interrupt+0xa2/0x2b0
[ 228.384777]  sysvec_apic_timer_interrupt+0x99/0xc0
[ 228.384777]  </IRQ>
[ 228.384777]  <TASK>
[ 228.384777]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 228.384777] RIP: 0010:_raw_spin_unlock_irqrestore+0x36/0x60
[ 228.384777] Code: 74 24 10 48 89 fb 48 83 c7 18 e8 b5 0d 25 ff 48 89 df e8 8d 4c 25 ff f7 c5 00 02 00 00 74 0b e8 a0 ef 32 ff fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 a0 5f 20 ff 65 8b 05 79 43 0f 6b 85 c0 74 07 5b
[ 228.384777] RSP: 0018:ff73ffef41403c28 EFLAGS: 00000206
[ 228.384777] RAX: 00000000000045db RBX: ff203b9ab2dfe140 RCX: 0000000000000006
[ 228.384777] RDX: 0000000000000000 RSI: ffffffff9597e547 RDI: ffffffff958ead16
[ 228.384777] RBP: 0000000000000282 R08: 0000000000000001 R09: 0000000000000001
[ 228.384777] R10: 0000000000000001 R11: 0000000000000000 R12: ff203b9ab2dfe140
[ 228.384777] R13: 0000000000000064 R14: ff203b9ab2dfe198 R15: ffbfc6fb04274888
[ 228.384777]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 228.384777]  __alloc_pages_bulk+0x3cb/0x730
[ 228.384777]  __vmalloc_node_range+0x253/0x820
[ 228.384777]  __vmalloc_node+0x4a/0x60
[ 228.384777]  ? disksize_store+0x7a/0x140 [zram]
[ 228.384777]  disksize_store+0x7a/0x140 [zram]
[ 228.384777]  kernfs_fop_write_iter+0x161/0x210
[ 228.384777]  vfs_write+0x22b/0x4b0
[ 228.384777]  ksys_write+0x5c/0xe0
[ 228.384777]  do_syscall_64+0x5b/0x80
[ 228.384777]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 228.384777]  ? lockdep_hardirqs_on+0x7d/0x100
[ 228.384777]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 228.384777] RIP: 0033:0x7f8c465a91f4
[ 228.384777] Code: 15 11 7c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d ed 03 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[ 228.384777] RSP: 002b:00007fffa0d4b658 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 228.384777] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f8c465a91f4
[ 228.384777] RDX: 000000000000000a RSI: 0000556d5c1866a0 RDI: 0000000000000004
[ 228.384777] RBP: 0000000000000019 R08: 0000556d5c1854d0 R09: fefefefefefefeff
[ 228.384777] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000556d5c1866a0
[ 228.384777] R13: 7fffffffffffffff R14: 0000000000000004 R15: 00007f8c465a91e0
[ 228.384777]  </TASK>
[ 228.384777] Code: 15 11 7c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d ed 03 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[ 228.384777] RSP: 002b:00007fffa0d4b658 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 228.384777] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f8c465a91f4
[ 228.384777] RDX: 000000000000000a RSI: 0000556d5c1866a0 RDI: 0000000000000004
[ 228.384777] RBP: 0000000000000019 R08: 0000556d5c1854d0 R09: fefefefefefefeff
[ 228.384777] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000556d5c1866a0
[ 228.384777] R13: 7fffffffffffffff R14: 0000000000000004 R15: 00007f8c465a91e0
```

Unfortunately, given the stack trace, I don't think it's related to the original issue, so the search continues.

-- 
PGP Key ID: 0xFB738CE27B634E4B