Re: [help] host kernel panic in kvm's wakeup_handler()

Hi Paolo,

We have a reproducer now. It shows that the *blocked_vcpu_on_cpu* list is
corrupted and that an entry is being added twice.

Do you have any suggestions?

[231298.241923] WARNING: at lib/list_debug.c:36 __list_add+0x8a/0xc0()
[231298.241925] list_add double add: new=ffff881b8bc48050,
prev=ffff881b8bc48050, next=ffff881fffa576f0.
[231298.241926] Modules linked in: guest_kbox_ram(O) igb(OVE) mlx4_ib(OVE)
ib_sa(OVE) ib_mad(OVE) mlx4_en(OVE) mlx4_core(OVE) ib_uverbs(OVE) vhost_scsi(OE)
target_core_pscsi target_core_file target_core_iblock target_core_mod dm_mod
kbox_pci(OVE) ib_core(OVE) ib_addr(OVE) ib_netlink(OVE) compat(OVE) ixgbe(O)
ext3 mbcache jbd signo_catch(O) bum(O) ip_set nfnetlink prio(O) nat(O)
vport_vxlan(O) openvswitch(O) nf_defrag_ipv6 gre libcrc32c kbox(O) pmcint(O)
vxlan ip6_udp_tunnel udp_tunnel sd_mod crc_t10dif crct10dif_generic sg
ipmi_devintf kvm_intel(O) kvm(O) coretemp crct10dif_pclmul crct10dif_common ahci
libahci mpt2sas i2c_i801 i2c_algo_bit libata dca i2c_core raid_class ptp
scsi_transport_sas pps_core ipmi_si ipmi_msghandler nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack vhost_net(O) tun(O) vhost(O) macvtap
[231298.241986]  macvlan vfio_pci irqbypass vfio_iommu_type1 vfio ip_tables
[last unloaded: guest_kbox_ram]
[231298.241994] CPU: 1 PID: 12431 Comm: CPU 0/KVM Tainted: G        W  OE
----V-------   3.10.0-327.49.58.52_13.x86_64 #1
[231298.241996] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. CH80GPUB8/CH80GPUB8,
BIOS GPUBV201 06/18/2015
[231298.241997]  ffff881fa372fc60 00000000054b553c ffff881fa372fc18 ffffffff81644aaf
[231298.242002]  ffff881fa372fc50 ffffffff8107b1c0 ffff881b8bc48050 ffff881fffa576f0
[231298.242006]  ffff881b8bc48050 000000000000a022 000000000000001b ffff881fa372fcb8
[231298.242011]  ffffffff8107b25c ffffffff818a9ce8 ffff881b00000030 ffff881fa372fcc8
[231298.242015]  ffff881fa372fc88 00000000054b553c 0000000000000001 ffffffff8107b205
[231298.242020]  ffffffff81a38960 ffff881b8bc48050 ffff881b8bc48050 ffff881fffa576f0
[231298.242025]  ffff881fa372fce0 ffffffff8131a41a ffff881b8bc48000 00000000000176e0
[231298.242029]  0000000000000292 ffff881fa372fdd0 ffffffffa10de6d0 ffff881b8bc48050
[231298.242036]  ffff881fa372ffd8 ffff881bb4c70000 ffff88176dfd8048 0000000000000001
[231298.242043]  ffff881fa372fe18 ffffffff81656a31 ffffffffa10d9360 ffffffffa10fc140
[231298.242048]  ffff881c23580100 0000000000000000 0000000000000000 ffff881b8bc48000
[231298.242052]  ffff881fa372fd88 ffffffffa10dad65 0000000000000000 ffffffffa10d9360
[231298.242057]  00000000054b553c ffff881c23580200 0000000000000000 ffff881c23580000
[231298.242061]  ffff881b8bc48000 ffff881fa372fdb8 ffff881b8bc48000 ffff881fa372ffd8
[231298.242066]  ffff881bb4c70000 ffff88176dfd8048 0000000000000001 ffff881fa372fe18
[231298.242070]  ffffffffa05ed1e8 ffffffee7ffbfaff 00000000054b553c ffff881b8bc48000
[231298.242075]  ffff883fb857b600 0000000000000000 ffff881ae737dc00 ffff881bb4c70000
[231298.242079]  ffff881fa372feb0 ffffffffa05d4b31 0000000000000000 0000000000008000
[231298.242084]  ffff881fa372fe70 ffffffff8112f643 000000000000ffff ffff881ae737dc38
[231298.242088]  ffffffffa05d4880 0000000000000000 0000000000000000 000000000000ae80
[231298.242093]  ffff881ae737dc00 00000000054b553c ffff881ae737dc00 ffff883fd2a0a500
[231298.242097]  0000000000000000 0000000000000000 0000000000000001 ffff881fa372ff28
[231298.242102]  ffffffff811fd9d5 000000000000ffff ffff881ae737dc38 0000000000000000
[231298.242106]  0000000000000000 000000000000ae80 0000000000000018 ffff881ae737dc00
[231298.242111] Call Trace:
[231298.242115]  [<ffffffff81644aaf>] dump_stack+0x19/0x1b
[231298.242118]  [<ffffffff8107b1c0>] warn_slowpath_common+0x70/0xb0
[231298.242122]  [<ffffffff8107b25c>] warn_slowpath_fmt+0x5c/0x80
[231298.242126]  [<ffffffff8107b205>] ? warn_slowpath_fmt+0x5/0x80
[231298.242130]  [<ffffffff8131a41a>] __list_add+0x8a/0xc0
[231298.242136]  [<ffffffffa10de6d0>] vmx_pre_block+0xe0/0x220 [kvm_intel]
[231298.242140]  [<ffffffff81656a31>] ? ftrace_call+0x5/0x2f
[231298.242145]  [<ffffffffa10d9360>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[231298.242151]  [<ffffffffa10dad65>] ? vmx_sync_pir_to_irr+0x5/0x30 [kvm_intel]
[231298.242156]  [<ffffffffa10d9360>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[231298.242167]  [<ffffffffa05ed1e8>] kvm_arch_vcpu_ioctl_run+0x178/0x440 [kvm]
[231298.242176]  [<ffffffffa05d4b31>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
[231298.242180]  [<ffffffff8112f643>] ? ftrace_ops_list_func+0x83/0x110
[231298.242189]  [<ffffffffa05d4880>] ? vcpu_put+0x30/0x30 [kvm]
[231298.242193]  [<ffffffff811fd9d5>] do_vfs_ioctl+0x2e5/0x4c0
[231298.242197]  [<ffffffff811fdc51>] SyS_ioctl+0xa1/0xc0
[231298.242201]  [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b


[231298.245626] WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
[231298.245628] list_add corruption. prev->next should be next
(ffff881fffa576f0), but was dead000000100100. (prev=ffff881b8bc48050).
[231298.245629] Modules linked in: guest_kbox_ram(O) igb(OVE) mlx4_ib(OVE)
ib_sa(OVE) ib_mad(OVE) mlx4_en(OVE) mlx4_core(OVE) ib_uverbs(OVE) vhost_scsi(OE)
target_core_pscsi target_core_file target_core_iblock target_core_mod dm_mod
kbox_pci(OVE) ib_core(OVE) ib_addr(OVE) ib_netlink(OVE) compat(OVE) ixgbe(O)
ext3 mbcache jbd signo_catch(O) bum(O) ip_set nfnetlink prio(O) nat(O)
vport_vxlan(O) openvswitch(O) nf_defrag_ipv6 gre libcrc32c kbox(O) pmcint(O)
vxlan ip6_udp_tunnel udp_tunnel sd_mod crc_t10dif crct10dif_generic sg
ipmi_devintf kvm_intel(O) kvm(O) coretemp crct10dif_pclmul crct10dif_common ahci
libahci mpt2sas i2c_i801 i2c_algo_bit libata dca i2c_core raid_class ptp
scsi_transport_sas pps_core ipmi_si ipmi_msghandler nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack vhost_net(O) tun(O) vhost(O) macvtap
[231298.245711]  macvlan vfio_pci irqbypass vfio_iommu_type1 vfio ip_tables
[last unloaded: guest_kbox_ram]
[231298.245725] CPU: 1 PID: 12431 Comm: CPU 0/KVM Tainted: G        W  OE
----V-------   3.10.0-327.49.58.52_13.x86_64 #1
[231298.245729] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. CH80GPUB8/CH80GPUB8,
BIOS GPUBV201 06/18/2015
[231298.245732]  ffff881fa372fc60 00000000054b553c ffff881fa372fc18 ffffffff81644aaf
[231298.245740]  ffff881fa372fc50 ffffffff8107b1c0 ffff881b8bc48050 ffff881fffa576f0
[231298.245748]  ffff881b8bc48050 000000000000a022 000000000000001b ffff881fa372fcb8
[231298.245756]  ffffffff8107b25c ffffffff818a9c98 ffff881f00000030 ffff881fa372fcc8
[231298.245765]  ffff881fa372fc88 00000000054b553c 0000000000000001 ffffffff8107b205
[231298.245773]  ffffffff81a38960 ffff881fffa576f0 dead000000100100 ffff881b8bc48050
[231298.245781]  ffff881fa372fce0 ffffffff8131a43c ffff881b8bc48000 00000000000176e0
[231298.245791]  0000000000000292 ffff881fa372fdd0 ffffffffa10de6d0 ffff881b8bc48050
[231298.245799]  ffff881fa372ffd8 ffff881bb4c70000 ffff88176dfd8048 0000000000000001
[231298.245808]  ffff881fa372fe18 ffffffff81656a31 ffffffffa10d9360 ffffffffa10fc140
[231298.245816]  ffff881c23580100 0000000000000000 0000000000000000 ffff881b8bc48000
[231298.245826]  ffff881fa372fd88 ffffffffa10dad65 0000000000000000 ffffffffa10d9360
[231298.245834]  00000000054b553c ffff881c23580200 0000000000000000 ffff881c23580000
[231298.245842]  ffff881b8bc48000 ffff881fa372fdb8 ffff881b8bc48000 ffff881fa372ffd8
[231298.245847]  ffff881bb4c70000 ffff88176dfd8048 0000000000000001 ffff881fa372fe18
[231298.245851]  ffffffffa05ed1e8 ffffffee7ffbfaff 00000000054b553c ffff881b8bc48000
[231298.245856]  ffff883fb857b600 0000000000000000 ffff881ae737dc00 ffff881bb4c70000
[231298.245861]  ffff881fa372feb0 ffffffffa05d4b31 0000000000000000 0000000000008000
[231298.245866]  ffff881fa372fe70 ffffffff8112f643 000000000000ffff ffff881ae737dc38
[231298.245870]  ffffffffa05d4880 0000000000000000 0000000000000000 000000000000ae80
[231298.245875]  ffff881ae737dc00 00000000054b553c ffff881ae737dc00 ffff883fd2a0a500
[231298.245879]  0000000000000000 0000000000000000 0000000000000001 ffff881fa372ff28
[231298.245883]  ffffffff811fd9d5 000000000000ffff ffff881ae737dc38 0000000000000000
[231298.245888]  0000000000000000 000000000000ae80 0000000000000018 ffff881ae737dc00
[231298.245893] Call Trace:
[231298.245898]  [<ffffffff81644aaf>] dump_stack+0x19/0x1b
[231298.245902]  [<ffffffff8107b1c0>] warn_slowpath_common+0x70/0xb0
[231298.245906]  [<ffffffff8107b25c>] warn_slowpath_fmt+0x5c/0x80
[231298.245910]  [<ffffffff8107b205>] ? warn_slowpath_fmt+0x5/0x80
[231298.245913]  [<ffffffff8131a43c>] __list_add+0xac/0xc0
[231298.245920]  [<ffffffffa10de6d0>] vmx_pre_block+0xe0/0x220 [kvm_intel]
[231298.245924]  [<ffffffff81656a31>] ? ftrace_call+0x5/0x2f
[231298.245930]  [<ffffffffa10d9360>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[231298.245936]  [<ffffffffa10dad65>] ? vmx_sync_pir_to_irr+0x5/0x30 [kvm_intel]
[231298.245941]  [<ffffffffa10d9360>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[231298.245953]  [<ffffffffa05ed1e8>] kvm_arch_vcpu_ioctl_run+0x178/0x440 [kvm]
[231298.245962]  [<ffffffffa05d4b31>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
[231298.245967]  [<ffffffff8112f643>] ? ftrace_ops_list_func+0x83/0x110
[231298.245976]  [<ffffffffa05d4880>] ? vcpu_put+0x30/0x30 [kvm]
[231298.245980]  [<ffffffff811fd9d5>] do_vfs_ioctl+0x2e5/0x4c0
[231298.245985]  [<ffffffff811fdc51>] SyS_ioctl+0xa1/0xc0
[231298.245989]  [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b


On 2017/5/26 18:40, Paolo Bonzini wrote:

> 
> 
> On 24/05/2017 07:04, Longpeng (Mike) wrote:
>>>> It crashed at *1ec1*, and %rax got a wrong value (0xdead000000100100) at *1e92*.
>>>> It seems the *blocked_vcpu_on_cpu* list is corrupted, but KVM only accesses this
>>>> list in pre_block/post_block/wakeup_handler, and these three functions look correct.
>>>>
>>>> kvm version is 4.4-stable.
>>>>
>>>> Do you have any ideas? Any suggestion would be greatly appreciated, thanks!
>>>>
>>> Is this only seen with posted interrupt support enabled?  Booting with
>>> intremap=nopost on the kernel commandline would disable it.  Thanks,
>>
>> We tested with PI support enabled, but we are not yet sure whether it only
>> occurs with PI enabled.
> 
> This code should not run at all with PI disabled, since the handler is
> only reachable through an IRTE.
> 
> As you said, the list manipulation in those functions is fairly simple.
> If you have a reproducer, you can try running it with CONFIG_DEBUG_LIST
> and see what you get.
> 
> Thanks,
> 
> Paolo
> 
> .
> 


-- 
Regards,
Longpeng(Mike)



