Re: [help] host kernel panic in kvm's wakeup_handler()

On 2017/5/24 12:34, Alex Williamson wrote:

> On Wed, 24 May 2017 11:57:34 +0800
> "Longpeng (Mike)" <longpeng2@xxxxxxxxxx> wrote:
> 
>> Hi guys,
>>
>> We powered 20 VMs (4 of them with vfio passthrough NICs) on and off
>> concurrently many times, and then hit a host panic:
>>
>> [152878.870508] general protection fault: 0000 [#1] SMP
>> [152878.878710] collected_len = 1048576, LOG_BUF_LEN_LOCAL = 1048576
>> [152878.886921] kbox current status: maintain, do not flush regions to devices.
>> [152878.893952] kbox: notify die begin
>> [152878.897453] kbox: no notify die func register. no need to notify
>> [152878.903533] do nothing after die!
>> [152878.906929] Modules linked in: ib_uverbs(OVE) vhost_scsi(OE)
>> target_core_pscsi target_core_file target_core_iblock target_core_mod
>> guest_kbox_ram(O) kbox_pci(OVE) igb(OVE) mlx4_ib(OVE) ib_sa(OVE) ib_mad(OVE)
>> ib_core(OVE) ib_addr(OVE) ib_netlink(OVE) mlx4_en(OVE) mlx4_core(OVE)
>> compat(OVE) vfio_pci vfio_iommu_type1 vfio(OVE) prio(O) nat(O) vport_vxlan(O)
>> openvswitch(O) nf_defrag_ipv6 gre libcrc32c ixgbe(O) ext3 mbcache jbd kbox(O)
>> pmcint(O) signo_catch(O) dm_mod vxlan ip6_udp_tunnel udp_tunnel sd_mod
>> crc_t10dif crct10dif_generic sg ipmi_devintf iTCO_wdt iTCO_vendor_support
>> kvm_intel(O) kvm(O) coretemp crct10dif_pclmul crct10dif_common crc32_pclmul
>> crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul
>> ablk_helper cryptd mpt2sas ahci i2c_algo_bit ptp libahci raid_class pps_core
>> i2c_i801 libata scsi_transport_sas dca lpc_ich i2c_core mfd_core shpchp ipmi_si
>> ipmi_msghandler nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vhost_net(O)
>> tun(O) vhost(O) macvtap macvlan irqbypass ip_tables [last unloaded: igb]
>> [152878.998665] CPU: 10 PID: 0 Comm: swapper/10 Tainted: G        W  OE
>> ----V-------   3.10.0-327.49.58.45_12.x86_64 #1
>> [152879.009245] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. CH80GPUB8/CH80GPUB8,
>> BIOS GPUBV201 06/18/2015
>> [152879.018881] task: ffff881fd2ce7300 ti: ffff881fd2d10000 task.ti:
>> ffff881fd2d10000
>> [152879.026803] RIP: 0010:[<ffffffffa1767ec1>]  [<ffffffffa1767ec1>]
>> wakeup_handler+0x71/0xb0 [kvm_intel]
>> [152879.036460] RSP: 0018:ffff883fff003f70  EFLAGS: 00010083
>> [152879.042024] RAX: dead000000100100 RBX: dead0000001000b0 RCX: ffff883fff0176f0
>> [152879.049595] RDX: ffff883fff000000 RSI: 0000000000000082 RDI: ffff881c9c7f0000
>> [152879.057139] RBP: ffff883fff003f90 R08: ffff881e522dfd90 R09: 0000000000000018
>> [152879.061675] mlx4_en: eth1: Port:2: removing fa:29:3e:2e:68:80
>> [152879.070720] R10: 000000000000039f R11: ffff881cfbf278f6 R12: 00000000000176e0
>> [152879.078282] R13: 000000000000000a R14: 00000000000176f0 R15: ffffffff81a13538
>> [152879.085845] FS:  0000000000000000(0000) GS:ffff883fff000000(0000)
>> knlGS:0000000000000000
>> [152879.094361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [152879.100378] CR2: 0000000000605168 CR3: 000000000195e000 CR4: 00000000003427e0
>> [152879.107921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [152879.115478] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [152879.123019] Stack:
>> [152879.125313]  0000000000000000 0000000000000004 00008b2da04a3938 0000000000000004
>> [152879.133227]  ffff883fff003fa8 ffffffff81016a28 ffffe8ffff800500 ffff881fd2d13e78
>> [152879.141121]  ffffffff81655cdd ffff881fd2d13dc8 <EOI>  ffff881fd2d13e78
>> 00000000000003e8
>> [152879.149702]  ffff881cfbf278f6 000000000000039f 0000000000000018 00000000000003e8
>> [152879.157647]  00008b2da04f9b8e 0000000000000018 0000000225c17d03 ffff881fd2d13fd8
>> [152879.165597]  00008b2da04f9b8e ffffffffffffff0e ffffffff814e2b72 0000000000000010
>> [152879.173560]  0000000000000206 ffff881fd2d13e50 0000000000000018 ffffe8ffff800500
>> [152879.181401]  0000000000000004 0000000000000004 ffffffff81a133c0 0000000000000000
>> [152879.189297]  ffff881fd2d13eb8 ffffffff814e2cb9 0000000a00000000 ffff881fd2d10000
>> [152879.197183]  ffffffff81a7de20 ffff881fd2d10000 ffff881fd2d10000 0000000000000000
>> [152879.205069]  ffff881fd2d13ec8 ffffffff8101e68e ffff881fd2d13f20 ffffffff810d7535
>> [152879.212968]  ffff881fd2d13fd8 ffff881fd2d10000 a960cc5a1933ed1c ef90c751bae26ef0
>> [152879.220892]  ffff881fd2d13f30 0000000000000000 0000000000000000 0000000000000000
>> [152879.228792]  0000000000000000 ffff881fd2d13f48 ffffffff81047c1a ef90c751bae26ef0
>> [152879.236675]  f26ae3384c8900f4 0000000000000000 0000000000000000 0000000000000000
>> [152879.244597]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [152879.252505]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [152879.260490]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [152879.268393]  0000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
>> [152879.276269]  0000000000000000 0000000000000010 0000000000000202 ffff881fd2d13f58
>> [152879.284171]  0000000000000018
>> [152879.287489] Call Trace:
>> [152879.290205]  <IRQ>
>> [152879.292219]  [<ffffffff81016a28>] smp_kvm_posted_intr_wakeup_ipi+0x48/0x60
>> [152879.299762]  [<ffffffff81655cdd>] kvm_posted_intr_wakeup_ipi+0x6d/0x80
>> [152879.306567]  <EOI>
>> [152879.308598]  [<ffffffff814e2b72>] ? cpuidle_enter_state+0x52/0xc0
>> [152879.315359]  [<ffffffff814e2cb9>] cpuidle_idle_call+0xd9/0x210
>> [152879.321481]  [<ffffffff8101e68e>] arch_cpu_idle+0xe/0x30
>> [152879.327058]  [<ffffffff810d7535>] cpu_startup_entry+0x245/0x290
>> [152879.333224]  [<ffffffff81047c1a>] start_secondary+0x1ba/0x230
>> [152879.339222] Code: 4a 8d 0c 32 48 39 c8 48 8d 58 b0 75 1e eb 3b 0f 1f 00 4a
>> 8b 14 ed a0 14 a7 81 48 8b 43 50 49 8d 0c 16 48 8d 58 b0 48 39 c8 74 1f <48> 8b
>> 83 e0 30 00 00 a8 01 74 dc 48 89 df e8 1c 6d e5 fe eb d2
>> [152879.360254] RIP  [<ffffffffa1767ec1>] wakeup_handler+0x71/0xb0 [kvm_intel]
>> [152879.367436]  RSP <ffff883fff003f70>
>> [152879.371668] ---[ end trace 382c2b1701889417 ]---
>>
>> There's no vmcore for some reason, but we disassembled wakeup_handler():
>>     ......
>>     1e92:       4a 8b 04 32             mov    (%rdx,%r14,1),%rax <-- *Here*
>>     1e96:       4a 8d 0c 32             lea    (%rdx,%r14,1),%rcx
>>     1e9a:       48 39 c8                cmp    %rcx,%rax
>>     1e9d:       48 8d 58 b0             lea    -0x50(%rax),%rbx
>>     1ea1:       75 1e                   jne    1ec1 <wakeup_handler+0x71>
>>     1ea3:       eb 3b                   jmp    1ee0 <wakeup_handler+0x90>
>>     1ea5:       0f 1f 00                nopl   (%rax)
>>     1ea8:       4a 8b 14 ed 00 00 00    mov    0x0(,%r13,8),%rdx
>>     1eaf:       00
>>     1eb0:       48 8b 43 50             mov    0x50(%rbx),%rax
>>     1eb4:       49 8d 0c 16             lea    (%r14,%rdx,1),%rcx
>>     1eb8:       48 8d 58 b0             lea    -0x50(%rax),%rbx
>>     1ebc:       48 39 c8                cmp    %rcx,%rax
>>     1ebf:       74 1f                   je     1ee0 <wakeup_handler+0x90>
>>     1ec1:       48 8b 83 e0 30 00 00    mov    0x30e0(%rbx),%rax <-- *Here*
>>     ......
>> It crashed at *1ec1*, and %rax got a bad value (0xdead000000100100) at *1e92*.
>> It seems the *blocked_vcpu_on_cpu* list is corrupted, but kvm only accesses this
>> list in pre_block/post_block/wakeup_handler, and these three functions seem fine.
>>
>> kvm version is 4.4-stable.
>>
>> Do you have any ideas? Any suggestion would be greatly appreciated, thanks!
>>
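The register dump and the disassembly quoted above can be cross-checked with a little pointer arithmetic. The sketch below is only that, a sketch: the register values and the 0x50/0x30e0 offsets are copied from the oops, while the LIST_POISON1 value is an assumption based on a 3.10-era include/linux/poison.h on x86_64 (0x00100100 plus a POISON_POINTER_DELTA of 0xdead000000000000). If that assumption holds, %rax is exactly LIST_POISON1, which would suggest the entry being walked had already gone through list_del().

```python
# Values copied from the register dump in the quoted oops.
RAX = 0xdead000000100100   # "next" pointer loaded while walking the list
RBX = 0xdead0000001000b0
RCX = 0xffff883fff0176f0
RDX = 0xffff883fff000000   # same value as the GS base in the dump
R14 = 0x00000000000176f0

# Assumption: 3.10-era include/linux/poison.h on x86_64.
POISON_POINTER_DELTA = 0xdead000000000000
LIST_POISON1 = 0x00100100 + POISON_POINTER_DELTA

LIST_NODE_OFFSET = 0x50    # from "lea -0x50(%rax),%rbx"
FIELD_OFFSET = 0x30e0      # from "mov 0x30e0(%rbx),%rax" at 1ec1


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit x86_64 canonical form)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


list_head = RDX + R14                  # lea (%rdx,%r14,1),%rcx
entry = RAX - LIST_NODE_OFFSET         # container_of-style adjustment
fault_addr = entry + FIELD_OFFSET      # address dereferenced at 1ec1

print(f"list head     = {list_head:#018x} (matches RCX: {list_head == RCX})")
print(f"entry         = {entry:#018x} (matches RBX: {entry == RBX})")
print(f"RAX is LIST_POISON1 (assumed value): {RAX == LIST_POISON1}")
print(f"fault address = {fault_addr:#018x}, canonical: {is_canonical(fault_addr)}")
```

A non-canonical address at 1ec1 would be consistent with the report of a general protection fault rather than a page fault. This does not say which path removed the entry, only that the pointer looks poisoned rather than random.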
> 
> Is this only seen with posted interrupt support enabled?  Booting with
> intremap=nopost on the kernel commandline would disable it.  Thanks,
> 
> Alex
> 


Hi Alex,

We tested with PI support enabled, but we are not yet sure whether the problem
occurs only with PI enabled.
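When re-running the test with and without posted interrupts, it may help to record which mode each host was actually booted in. Below is a minimal sketch, assuming the option is passed as intremap=nopost (possibly combined with other comma-separated intremap values) and that the boot command line is readable from /proc/cmdline. The helper name is hypothetical, and it only reports what was requested at boot, not whether the hardware supports PI.

```python
def posted_interrupts_disabled(cmdline: str) -> bool:
    """Return True if an intremap= option on the command line includes 'nopost'."""
    prefix = "intremap="
    for token in cmdline.split():
        if token.startswith(prefix):
            # Assumption: values after intremap= may be comma-separated.
            options = token[len(prefix):].split(",")
            if "nopost" in options:
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            booted_cmdline = f.read()
    except OSError:
        print("could not read /proc/cmdline")
    else:
        print("intremap=nopost requested:",
              posted_interrupts_disabled(booted_cmdline))
```

Logging this next to each reproduction attempt would make it easier to tell later whether a panic happened on a host with or without intremap=nopost.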

*lscpu:*
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2618L v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1452.085
BogoMIPS:              4405.88
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

We will try to reproduce the problem again. Thanks :)



-- 
Regards,
Longpeng(Mike)



