Hi Paolo, Alex, We hit a hard lockup bug in our tests while started VM with passthrough NIC. Unfortunately, we didn't get the available vmcore to dig into. And it is quite difficult To reproduce, actually, we only hit this problem once. Does anyone hit such problem ? Or any idea ? (For the complete log, please see the attachment). [93137.701139] ------------[ cut here ]------------ [93137.701146] WARNING: at kernel/workqueue.c:2131 process_one_work+0x2e8/0x470() [93137.701147] Modules linked in: ext4 jbd2 ixgbe(O) dev_connlimit(O) igb_uio(OE) uio bum(O) ip_set nfnetlink prio(O) nat(O) vport_vxlan(O) openvswitch(O) nf_defrag_ipv6 gre signo_catch(O) hotpatch(OE) kboxdriver(O) ipmi_devintf ipmi_si ipmi_msghandler kbox(O) mlx5_ib i40e(OE) ib_core ib_addr vxlan coretemp crc32_pclmul ip6_udp_tunnel crc32c_intel udp_tunnel ptp kvm_intel(O) ghash_clmulni_intel aesni_intel dca kvm(O) pps_core lrw gf128mul glue_helper sg ablk_helper cryptd pcspkr mlx5_core acpi_cpufreq shpchp acpi_power_meter acpi_pad remote_trigger(O) nf_conntrack_ipv4 nf_defrag_ipv4 vhost_net(O) tun(O) vhost(O) macvtap macvlan vfio_pci irqbypass vfio_iommu_type1 vfio xt_sctp nf_conntrack_proto_sctp nf_nat_proto_sctp nf_nat nf_conntrack sctp libcrc32c ip_tables ext3 mbcache jbd sd_mod crc_t10dif crct10dif_generic [93137.701171] crct10dif_pclmul crct10dif_common mpt3sas raid_class ahci scsi_transport_sas libahci libata dm_mod [last unloaded: ixgbe] [93137.701178] CPU: 1 PID: 119983 Comm: kworker/0:7 Tainted: G W OE ---- ------- 3.10.0-327.55.58.94_29.x86_64 #1 [93137.701179] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.19 06/22/2017 [93137.701183] 0000000000000000 000000005abbce18 ffff880225cb7dd0 ffffffff8164791f [93137.701185] ffff880225cb7e08 ffffffff8107b220 ffff88282066c4a8 ffff880868ed3c90 [93137.701186] ffff881ffec16080 ffff881ffec1c000 0000000000000000 ffff880225cb7e18 [93137.701188] ffffffff8107b36a ffff880225cb7e60 ffffffff8109de58 ffff881ffec16098 [93137.701189] ffff880868ed3c00 ffff881ffec16098 ffff880868ed3cc0 ffff880649b09980 [93137.701190] ffff880868ed3c90 ffff881ffec16080 ffff880225cb7ec0 ffffffff8109eabb [93137.701191] ffff880225cb7fd8 0000000000016840 ffff880649b09980 ffff880649b09980 [93137.701193] ffff880649b09980 ffff8809065bfd38 ffff880868ed3c90 ffffffff8109e9a0 [93137.701194] 0000000000000000 0000000000000000 ffff880225cb7f48 ffffffff810a61ff [93137.701195] 0000000000000000 ffff880225cb7f68 ffff880868ed3c90 ffff881f00000000 [93137.701196] ffff882900000000 ffff880225cb7ef8 ffff880225cb7ef8 ffff881f00000000 [93137.701198] ffff881f00000000 ffff880225cb7f18 ffff880225cb7f18 000000005abbce18 [93137.701199] ffffffff810a6130 0000000000000000 0000000000000000 ffff8809065bfd38 [93137.701200] ffffffff81657b98 0000000000000000 0000000000000000 0000000000000000 [93137.701202] 0000000000000000 ffff8809065bfd38 ffffffff810a6130 0000000000000000 [93137.701203] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [93137.701204] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [93137.701205] ffffffffffffffff 0000000000000000 0000000000000010 0000000000000202 [93137.701207] ffff880225cb7f58 0000000000000018 [93137.701208] Call Trace: [93137.701212] [<ffffffff8164791f>] dump_stack+0x19/0x1b [93137.701215] [<ffffffff8107b220>] warn_slowpath_common+0x70/0xb0 [93137.701216] [<ffffffff8107b36a>] warn_slowpath_null+0x1a/0x20 [93137.701218] [<ffffffff8109de58>] process_one_work+0x2e8/0x470 [93137.701219] [<ffffffff8109eabb>] worker_thread+0x11b/0x400 [93137.701221] [<ffffffff8109e9a0>] ? rescuer_thread+0x400/0x400 [93137.701223] [<ffffffff810a61ff>] kthread+0xcf/0xe0 [93137.701224] [<ffffffff810a6130>] ? kthread_create_on_node+0x140/0x140 [93137.701227] [<ffffffff81657b98>] ret_from_fork+0x58/0x90 [93137.701228] [<ffffffff810a6130>] ? kthread_create_on_node+0x140/0x140 [93137.701229] ---[ end trace 22f458263368a87f ]--- [93139.432279] ixgbe 0000:5e:00.1 eth7: VF Reset msg received from vf 9 [93139.442787] ixgbe 0000:5e:00.1 eth7: VF 9 requested invalid api version 3 [93139.444842] vfio-pci 0000:5e:12.3: irq 235 for MSI/MSI-X [93141.699745] vfio-pci 0000:5e:11.5: irq 237 for MSI/MSI-X [93141.699998] vfio-pci 0000:5e:11.5: irq 238 for MSI/MSI-X [93141.701442] vfio-pci 0000:5e:11.5: irq 237 for MSI/MSI-X [93141.701584] vfio-pci 0000:5e:11.5: irq 238 for MSI/MSI-X [93141.701720] vfio-pci 0000:5e:11.5: irq 242 for MSI/MSI-X [93142.279728] kvm [186763]: vcpu0 ignored wrmsr: 0x1c9 data 3 [93142.279736] kvm [186763]: vcpu0 ignored wrmsr: 0x1a6 data 11 [93142.279738] kvm [186763]: vcpu0 ignored wrmsr: 0x1a7 data 11 [93142.279740] kvm [186763]: vcpu0 ignored wrmsr: 0x3f6 data 11 [93142.279743] kvm [186763]: vcpu0 ignored wrmsr: 0x3f7 data 11 [93145.421969] ixgbe 0000:5e:00.1 eth7: VF Reset msg received from vf 20 [93145.432473] ixgbe 0000:5e:00.1 eth7: VF 20 requested invalid api version 3 [93145.434743] vfio-pci 0000:5e:15.1: irq 243 for MSI/MSI-X [93156.975002] vfio-pci 0000:5e:11.7: irq 233 for MSI/MSI-X [93156.975280] vfio-pci 0000:5e:11.7: irq 245 for MSI/MSI-X [93156.977146] vfio-pci 0000:5e:11.7: irq 233 for MSI/MSI-X [93156.977386] vfio-pci 0000:5e:11.7: irq 245 for MSI/MSI-X [93156.977654] vfio-pci 0000:5e:11.7: irq 256 for MSI/MSI-X [93160.455435] vfio-pci 0000:5e:12.1: irq 252 for MSI/MSI-X [93160.455710] vfio-pci 0000:5e:12.1: irq 262 for MSI/MSI-X [93160.457794] vfio-pci 0000:5e:12.1: irq 252 for MSI/MSI-X [93160.458043] vfio-pci 0000:5e:12.1: irq 262 for MSI/MSI-X [93160.458177] vfio-pci 0000:5e:12.1: irq 268 for MSI/MSI-X [93171.874151] vfio-pci 0000:5e:11.1: irq 230 for MSI/MSI-X [93171.874258] vfio-pci 0000:5e:11.1: irq 271 for MSI/MSI-X [93171.875976] vfio-pci 0000:5e:11.1: irq 230 for MSI/MSI-X [93171.876078] vfio-pci 0000:5e:11.1: irq 271 for MSI/MSI-X [93171.876154] vfio-pci 0000:5e:11.1: irq 272 for MSI/MSI-X [93172.575386] vfio-pci 0000:5e:12.3: irq 235 for MSI/MSI-X [93172.575518] vfio-pci 0000:5e:12.3: irq 280 for MSI/MSI-X [93172.577258] vfio-pci 0000:5e:12.3: irq 235 for MSI/MSI-X [93172.577379] vfio-pci 0000:5e:12.3: irq 280 for MSI/MSI-X [93172.577450] vfio-pci 0000:5e:12.3: irq 281 for MSI/MSI-X [93185.691658] vfio-pci 0000:5e:15.1: irq 243 for MSI/MSI-X [93185.691783] vfio-pci 0000:5e:15.1: irq 282 for MSI/MSI-X [93185.693707] vfio-pci 0000:5e:15.1: irq 243 for MSI/MSI-X [93185.693805] vfio-pci 0000:5e:15.1: irq 282 for MSI/MSI-X [93185.693875] vfio-pci 0000:5e:15.1: irq 283 for MSI/MSI-X [93599.715862] [sched_delayed] sched: RT throttling activated [93692.404740] !!kbox:catch rlock on cpu 16!starting process!! [93692.405008] [kbox] num_online_cpus: 63, cur_cpu: 16 [93692.405009] cpu 16 will send nmi, the cpumask: 0xfffffffffffefffe [93692.417396] smp_nmi_call_function is finished. wait: 1, time: -1, timeout: 996500, cpumask: 0x0 [93692.419042] kbox rlock cpu 16 :sending stop and flush finished [93698.011186] CPU: 16 PID: 154635 Comm: CPU 3/KVM Tainted: G W OE ---- ------- 3.10.0-327.55.58.94_29.x86_64 #1 [93698.011188] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.19 06/22/2017 [93698.011190] task: ffff88280451cc80 ti: ffff88292b160000 task.ti: ffff88292b160000 [93698.011216] RIP: 0010:[<ffffffffa0589ea6>] [<ffffffffa0589ea6>] vmx_vcpu_run+0x666/0x750 [kvm_intel] [93698.011217] RSP: 0018:ffff88292b163cf0 EFLAGS: 00000082 [93698.011218] RAX: 0000000080000202 RBX: ffff8825bffdb400 RCX: ffff8825bffdb400 [93698.011218] RDX: 0000000000000200 RSI: ffff8825bffdb400 RDI: ffff8825bffdb400 [93698.011218] RBP: ffff88292b163d50 R08: ffff7fffffffffff R09: 0000000000000000 [93698.011219] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [93698.011220] R13: 00007f7202374808 R14: 00007f71c2375000 R15: 000000000000007f [93698.011220] FS: 00007ff6f48d7700(0000) GS:ffff883ffca00000(0000) knlGS:ffff88013fcc0000 [93698.011221] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [93698.011221] CR2: 00007f3903526000 CR3: 0000002808a78000 CR4: 00000000003427e0 [93698.011222] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [93698.011222] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [93698.011222] Stack: [93698.011225] 0000000000000000 ffff8825bffdb400 0000000000000000 ffffffff816585c0 [93698.011226] 00029516f44e1ac2 ffff8825bffdb400 00000000dde5f3e3 ffff8825bffdb400 [93698.011226] 0000000000000000 0000000000000003 ffffffffa05855f0 0000000000000010 [93698.011227] ffff88292b163dd0 ffffffffa03ea569 000000ef00000000 ffff8825bfa62000 [93698.011227] ffff8825c2b2f2d8 ffff88292b163fd8 ffff88280451cc80 ffff88292cae0048 [93698.011228] 0000000000000001 00000000dde5f3e3 ffffffffa04103b5 ffff8825bffdb400 [93698.011229] ffff88292b163fd8 ffff88280451cc80 ffff88292cae0048 0000000000000001 [93698.011229] ffff88292b163e18 ffffffffa03f2285 ffffffee7ffbfaff 00000000dde5f3e3 [93698.011230] ffff8825bffdb400 ffff8828115bef40 0000000000000000 ffff88279d5be680 [93698.011231] ffff88280451cc80 ffff88292b163eb0 ffffffffa03d9b51 0000000000000000 [93698.011231] 0000000100000e40 0000000000000001 ffff882800f19b90 ffff88280451cc80 [93698.011232] ffff882800f19b98 ffff882800f19ba0 0000000000000000 0000000000000000 [93698.011233] 0000000000000001 ffff8827ff05bc10 00000000dde5f3e3 ffff88279d5be680 [93698.011233] ffff883fcceff590 0000000000000000 0000000000000000 0000000000000001 [93698.011234] ffff88292b163f28 ffffffff811fdf65 00000000dde5f3e3 ffff8827ff05bc00 [93698.011234] 0000000000000008 0000000000000002 ffff88292b163f10 ffffffffa03e7dc5 [93698.011235] 0000000000000000 ffff88292b163f58 00000000dde5f3e3 ffff88279d5be680 [93698.011236] 000000000000001e 000000000000ae80 0000000000000000 ffff88292b163f78 [93698.011236] ffffffff811fe1e1 0000000000000000 0000000102d4d010 00000000dde5f3e3 [93698.011237] 0000000000000000 0000000002d4d010 000000000000ae80 000000000082b4e0 [93698.011238] 0000000000000000 00007ff6f48d69f0 ffffffff81657c49 0000000000000246 [93698.011238] 00007ff7012c7000 00000000000000ff 00000000008456f0 0000000000000010 [93698.011239] ffffffffffffffff 0000000000000000 000000000000ae80 000000000000001e [93698.011239] 0000000000000010 00007ff6fbdcb7b7 0000000000000033 0000000000000246 [93698.011240] Call Trace: [93698.011246] [<ffffffff816585c0>] ? irq_entries_start+0x300/0x400 [93698.011249] [<ffffffffa05855f0>] ? vmx_inject_irq+0xf0/0xf0 [kvm_intel] [93698.011289] [<ffffffffa03ea569>] vcpu_enter_guest+0x639/0x1160 [kvm] [93698.011306] [<ffffffffa04103b5>] ? kvm_apic_local_deliver+0x65/0x70 [kvm] [93698.011314] [<ffffffffa03f2285>] kvm_arch_vcpu_ioctl_run+0xd5/0x440 [kvm] [93698.011319] [<ffffffffa03d9b51>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm] [93698.011322] [<ffffffff811fdf65>] do_vfs_ioctl+0x2e5/0x4c0 [93698.011328] [<ffffffffa03e7dc5>] ? kvm_on_user_return+0x75/0xb0 [kvm] [93698.011329] [<ffffffff811fe1e1>] SyS_ioctl+0xa1/0xc0 [93698.011331] [<ffffffff81657c49>] system_call_fastpath+0x16/0x1b [93698.011338] Code: 8b 80 3c 02 00 00 a8 10 0f 84 2f fa ff ff e9 12 ff ff ff 66 90 85 c0 0f 89 c1 fc ff ff 48 8b 5d a8 48 89 df e8 1c 2c e5 ff cd 02 <48> 89 df e8 32 2c e5 ff e9 a6 fc ff ff 49 89 f8 49 c1 e8 0d 41 [93698.011804] kbox:catch rlock!end process! [93698.014242] collected_len = 893933, LOG_BUF_LEN_LOCAL = 1048576 [93700.363926] !!kbox:catch rlock on cpu 25!starting process!! [93700.363926] rlock:another rlock on another cpu 25 [93700.363928] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 25 [93700.363933] CPU: 25 PID: 155622 Comm: CPU 3/KVM Tainted: G W OE ---- ------- 3.10.0-327.55.58.94_29.x86_64 #1 [93700.363934] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.19 06/22/2017 [93700.363936] ffffffff8187c808 000000001c178ae5 ffff883ffcc45ae8 ffffffff8164791f [93700.363937] ffff883ffcc45b68 ffffffff8164105f 0000000000000010 ffff883ffcc45b78 [93700.363938] ffff883ffcc45b18 000000001c178ae5 0000000000000000 0000000000000019 [93700.363938] 0000000000000003 0000000000000000 ffffffff81cc8bfc ffffffff81cff00d [93700.363939] 0000000000000019 0000000000000000 ffff883ffcc45c40 0000000000000000 [93700.363940] ffff883ffcc45b80 ffffffff8111e871 ffff883fd07944a8 ffff883ffcc45bf8 [93700.363940] ffffffff81162091 000000001c178ae5 ffff883fd07944a8 fffffff9f2462404 [93700.363941] 00000000f2462404 0000000000000001 ffff883ffcc4bbe0 ffff883ffcc45bf0 [93700.363941] 000000001c178ae5 0000000000000001 ffff883ffcc4b9e0 ffff883fd07944a8 [93700.363942] 0000000000000021 ffff883ffcc4bbe0 ffff883ffcc45c08 ffffffff81162b64 [93700.363943] ffff883ffcc45e40 ffffffff810327f8 ffff883ffcc4c330 00000064fcc45c36 [93700.363943] ffff883ffcc45ef8 00000000fcc45c68 0000000200000000 ffffc90000000000 [93700.363944] ffffc90029494800 ffff883ffcc45d00 ffffffffa08689e9 0000000000000000 [93700.363944] ffff883ffcc45c88 ffffffff8130ab22 ffffc900294932f8 000000060dba31b0 [93700.363945] 0000000005080021 ffffffff8130bb13 0000000000000000 0000000000000000 [93700.363946] 0000003f00000000 3235ffff8130dbf4 000000000001ffff 0000000000000000 [93700.363946] 0000000000000000 ffffc900294932d8 0000000000001528 ffff883fd44d9f80 [93700.363947] 00000000000557ec ffff883ffcc45d90 ffffffff813029d1 ffffc900181f0fff [93700.363947] ffffc900181f1000 ffffc900181f0fff ffffc900181f1000 ffffffff81962c98 [93700.363948] ffffc900181f0fff ffff881ffe8d8008 00003ffffffff000 ffffc900181f1000 [93700.363949] ffffc900181f0fff ffffc900181f1000 0000000000000000 00000000557ec02c [93700.363949] ffff883fcac74534 0000000004000000 ffffc900181f0000 ffff883ffcc45d90 [93700.363950] ffffffff811a97c1 ffff883ffcc45de8 ffffffff813a4954 0000000000000014 [93700.363950] 0000000000000014 0000000000000000 00000001813a27ff ffff883fca557758 [93700.363951] Call Trace: [93700.363960] <NMI> [<ffffffff8164791f>] dump_stack+0x19/0x1b [93700.363964] [<ffffffff8164105f>] panic+0xd8/0x214 [93700.363968] [<ffffffff8111e871>] watchdog_overflow_callback+0xd1/0xe0 [93700.363972] [<ffffffff81162091>] __perf_event_overflow+0xa1/0x250 [93700.363974] [<ffffffff81162b64>] perf_event_overflow+0x14/0x20 [93700.363977] [<ffffffff810327f8>] intel_pmu_handle_irq+0x1e8/0x470 [93700.363992] [<ffffffff8130ab22>] ? put_dec+0x72/0x90 [93700.363993] [<ffffffff8130bb13>] ? number.isra.2+0x323/0x360 [93700.363995] [<ffffffff813029d1>] ? ioremap_page_range+0x241/0x320 [93700.363999] [<ffffffff811a97c1>] ? unmap_kernel_range_noflush+0x11/0x20 [93700.364003] [<ffffffff813a4954>] ? ghes_copy_tofrom_phys+0x124/0x210 [93700.364005] [<ffffffff813a4ae0>] ? ghes_read_estatus+0xa0/0x190 [93700.364009] [<ffffffff81650f9b>] perf_event_nmi_handler+0x2b/0x50 [93700.364010] [<ffffffff81650619>] nmi_handle.isra.0+0x69/0xb0 [93700.364012] [<ffffffff81650831>] do_nmi+0x1d1/0x410 [93700.364013] [<ffffffff8164fa53>] end_repeat_nmi+0x1e/0x2e [93700.364027] [<ffffffffa0589ea6>] ? vmx_vcpu_run+0x666/0x750 [kvm_intel] [93700.364029] [<ffffffffa0589ea6>] ? vmx_vcpu_run+0x666/0x750 [kvm_intel] [93700.364031] [<ffffffffa0589ea6>] ? vmx_vcpu_run+0x666/0x750 [kvm_intel] [93700.364033] <<EOE>> [<ffffffff81658830>] ? uv_bau_message_intr1+0x80/0x80 [93700.364035] [<ffffffffa05855f0>] ? vmx_inject_irq+0xf0/0xf0 [kvm_intel] [93700.364074] [<ffffffffa03ea569>] vcpu_enter_guest+0x639/0x1160 [kvm] [93700.364087] [<ffffffffa04103b5>] ? kvm_apic_local_deliver+0x65/0x70 [kvm] [93700.364095] [<ffffffffa03f2285>] kvm_arch_vcpu_ioctl_run+0xd5/0x440 [kvm] [93700.364100] [<ffffffffa03d9b51>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm] [93700.364103] [<ffffffff810e7b72>] ? do_futex+0x122/0x5b0 [93700.364105] [<ffffffff811fdf65>] do_vfs_ioctl+0x2e5/0x4c0 [93700.364107] [<ffffffff81235833>] ? anon_inode_getfile+0xd3/0x170 [93700.364113] [<ffffffffa03e7dc5>] ? kvm_on_user_return+0x75/0xb0 [kvm] [93700.364113] [<ffffffff811fe1e1>] SyS_ioctl+0xa1/0xc0 [93700.364115] [<ffffffff81657c49>] system_call_fastpath+0x16/0x1b [93701.422730] Shutting down cpus with NMI [93701.608323] rlock even has been record! [93701.608324] kbox: notify die begin [93701.608324] kbox: no notify die func register. no need to notify [93720.962506] kbox rlock callback end! Regards, -Gonglei