RE: Hit a hard lockup BUG while started VM with passthrough NIC

"Gonglei (Arei)" <arei.gonglei@xxxxxxxxxx> · Sat, 23 Sep 2017 02:46:33 +0000

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@xxxxxxxxxx]
> Sent: Thursday, September 21, 2017 11:11 PM
> To: Alex Williamson; Gonglei (Arei)
> Cc: kvm@xxxxxxxxxxxxxxx; Zhanghailiang; Huangweidong (C)
> Subject: Re: Hit a hard lockup BUG while started VM with passthrough NIC
> 
> On 21/09/2017 17:00, Alex Williamson wrote:
> >>
> >> We hit a hard lockup bug in our tests while started VM with passthrough
> NIC.
> >> Unfortunately, we didn't get the available vmcore to dig into. And it is quite
> difficult
> >> To reproduce, actually, we only hit this problem once.
> >> Does anyone hit such problem ? Or any idea ?
> >> (For the complete log, please see the attachment).
> > This is not an upstream or a RHEL kernel, so I don't have the sources
> > to do much analysis.  Based on the kernel version number, I'm guessing
> > this is some derivative of a RHEL-7.2 kernel.  Can it be reproduced on
> > upstream (this is an upstream list)?  The ghes functions in the dump
> > might indicate might indicate a hardware error was triggered and the
> > firmware logs might provide more indication of the problem.  Thanks,
> 
> I don't recall anything with passthrough, but it looks like this was
> caused by perf running at the same time as KVM (perf_event_overflow in
> the second backtrace, "cd 02" aka NMI in the first backtrace's code
> hexdump).
> 
> There have been some perf bugs related to nested in RHEL7.2 and RHEL7.3
> (start L1, do "perf top" in another terminal, boom when you start L2)
> and they seem to be fixed in RHEL7.4.  However, we never tried bisecting
> for the fix.
> 
Thanks for your feedback. 

1) From the backtrace, we can see the current vcpu is of non-root mode and then vm exit
because an NMI. 

  <NMI>  [<ffffffff8164791f>] dump_stack+0x19/0x1b
  [<ffffffff8164105f>] panic+0xd8/0x214
  [<ffffffff8111e871>] watchdog_overflow_callback+0xd1/0xe0
  [<ffffffff81162091>] __perf_event_overflow+0xa1/0x250
  [<ffffffff81162b64>] perf_event_overflow+0x14/0x20
  [<ffffffff810327f8>] intel_pmu_handle_irq+0x1e8/0x470
  [<ffffffff8130ab22>] ? put_dec+0x72/0x90
  [<ffffffff8130bb13>] ? number.isra.2+0x323/0x360
  [<ffffffff813029d1>] ? ioremap_page_range+0x241/0x320
  [<ffffffff811a97c1>] ? unmap_kernel_range_noflush+0x11/0x20
  [<ffffffff813a4954>] ? ghes_copy_tofrom_phys+0x124/0x210
  [<ffffffff813a4ae0>] ? ghes_read_estatus+0xa0/0x190
  [<ffffffff81650f9b>] perf_event_nmi_handler+0x2b/0x50
  [<ffffffff81650619>] nmi_handle.isra.0+0x69/0xb0
  [<ffffffff81650831>] do_nmi+0x1d1/0x410
  [<ffffffff8164fa53>] end_repeat_nmi+0x1e/0x2e
  [<ffffffffa0589ea6>] ? vmx_vcpu_run+0x666/0x750 [kvm_intel]
  [<ffffffffa0589ea6>] ? vmx_vcpu_run+0x666/0x750 [kvm_intel]
  [<ffffffffa0589ea6>] ? vmx_vcpu_run+0x666/0x750 [kvm_intel]
  <<EOE>>  [<ffffffff81658830>] ? uv_bau_message_intr1+0x80/0x80
  [<ffffffffa05855f0>] ? vmx_inject_irq+0xf0/0xf0 [kvm_intel]
  [<ffffffffa03ea569>] vcpu_enter_guest+0x639/0x1160 [kvm]
  [<ffffffffa04103b5>] ? kvm_apic_local_deliver+0x65/0x70 [kvm]
  [<ffffffffa03f2285>] kvm_arch_vcpu_ioctl_run+0xd5/0x440 [kvm]
  [<ffffffffa03d9b51>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
  [<ffffffff810e7b72>] ? do_futex+0x122/0x5b0
  [<ffffffff811fdf65>] do_vfs_ioctl+0x2e5/0x4c0
  [<ffffffff81235833>] ? anon_inode_getfile+0xd3/0x170
  [<ffffffffa03e7dc5>] ? kvm_on_user_return+0x75/0xb0 [kvm]
  [<ffffffff811fe1e1>] SyS_ioctl+0xa1/0xc0
  [<ffffffff81657c49>] system_call_fastpath+0x16/0x1b
Shutting down cpus with NMI

2) As we all know, the kmod is of irq_disabled status in non-root mode (vcpu_enter_guest),
 But if any interrupts occur or pending, the guest will vmexit to handle the interrupt. 

So, we can infer that there isn't interrupt pending during 20s (hard lockup period) according
above two reasons.

But, what about timer interrupt? Our host OS is CentOS 7.2, whose HZ is 1000. That
means timer interrupt will be occurred per second...

So we guess that host's timer maybe the root cause. And we found some clues:  

###num:1, record len:203

console region: kernel reboot time:20170911201235-43cfdkernel reboot time:20170911201235-43cfd
[15141.708423] collected_len = 354176, LOG_BUF_LEN_LOCAL = 1048576
kernel reboot time:20170911201235-43cfd   ---------- 2017 

After kernel panic:

###num:2, record len:114

console region: rlock time:20571104194044-c5c73
rlock time:20571104194044-c5c73        ------------ 2057
rlock time:20571104194044-c5c73

Why the system timer is changed to 40 years later after panic ? Is integer overflow of hrtimer? 
And then cause the hrtimer expired? 

Thanks,
-Gonglei