Re: [Bug 218267] New: [Sapphire Rapids][Upstream]Boot up multiple Windows VMs hang

On Fri, Dec 15, 2023, bugzilla-daemon@xxxxxxxxxx wrote:
> Platform: Sapphire Rapids Platform
> 
> Host OS: CentOS Stream 9
> 
> Kernel:6.7.0-rc1 (commit:8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)

...

> Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4)
> (commit:039afc5ef7367fbc8fb475580c291c2655e856cb)
> 
> Host Kernel cmdline:BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root
> ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on
> disable_mtrr_cleanup
> 
> Bug detailed description
> =======
> We boot up 8 Windows VMs (total vCPUs > pCPUs) in host, random run application
> on each VM such as WPS editing etc, and wait for a moment, then Some of the
> Windows Guest hang and console reports "KVM internal error. Suberror: 3".

...

> Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58
> 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
> 
> KVM internal error. Suberror: 3
> extra data[0]: 0x000000008000002f  <= Vectoring IRQ 47 (decimal)
> extra data[1]: 0x0000000000000020  <= WRMSR VM-Exit
> extra data[2]: 0x0000000000000f82
> extra data[3]: 0x000000000000004b

KVM exits with an internal error because the CPU indicates that IRQ 47 was being
delivered/vectored when the VM-Exit occurred, but the VM-Exit is due to WRMSR.
A WRMSR VM-Exit is supposed to only occur on an instruction boundary, i.e. can't
occur while delivering an IRQ (or any exception/event), and so KVM kicks out to
userspace because something has gone off the rails.
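As a rough illustration of how the two annotated values above decode, here is
a minimal sketch.  It assumes the bit layout of the IDT-vectoring information
field as described in the Intel SDM (bits 7:0 vector, bits 10:8 type, bit 11
error-code valid, bit 31 valid) and that the low 16 bits of the exit reason
hold the basic exit reason.  The helper names are made up for this example and
are not KVM functions.

```python
# Hypothetical helpers for illustration only; not part of KVM or QEMU.
# Type encodings assumed from the Intel SDM's IDT-vectoring information field.
INTR_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}

EXIT_REASON_MSR_WRITE = 32  # basic exit reason for WRMSR


def decode_idt_vectoring(info):
    """Split an IDT-vectoring information value into its fields."""
    return {
        "valid": bool((info >> 31) & 1),
        "vector": info & 0xFF,
        "type": INTR_TYPES.get((info >> 8) & 0x7, "reserved"),
        "error_code_valid": bool((info >> 11) & 1),
    }


def basic_exit_reason(reason):
    """Return the basic exit reason (low 16 bits)."""
    return reason & 0xFFFF


info = decode_idt_vectoring(0x8000002F)   # extra data[0] from the report
reason = basic_exit_reason(0x20)          # extra data[1] from the report

print(info)
print("WRMSR exit:", reason == EXIT_REASON_MSR_WRITE)
```

With the values from the report, the valid bit is set, the vector is 0x2f (47
decimal), the type is external interrupt, and the exit reason is WRMSR, which
is the combination KVM treats as unexpected.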

   b9 70 00 00 40          mov    ecx, 0x40000070
   0f ba 32 00             btr    DWORD PTR [rdx], 0x0
   72 06                   jb     0x16
   33 c0                   xor    eax, eax
   8b d0                   mov    edx, eax
   0f 30                   wrmsr

FWIW, the MSR in question is Hyper-V's synthetic EOI, a.k.a. HV_X64_MSR_EOI, though
I doubt the exact MSR matters.
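For anyone wanting to double check the MSR index, it can be read straight out
of the Code= bytes in the report: opcode 0xb9 is "mov ecx, imm32", and the
immediate follows in little-endian order.  A small sketch (variable names are
arbitrary, and only the bytes from the MOV through the WRMSR are used):

```python
# Bytes copied from the Code= line in the report, starting at the MOV
# and ending at the WRMSR (0f 30).
code = bytes.fromhex("b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 0f 30")

HV_X64_MSR_EOI = 0x40000070  # Hyper-V synthetic EOI MSR index

# 0xb9 is "mov ecx, imm32"; the next four bytes are the little-endian immediate.
assert code[0] == 0xB9
msr_index = int.from_bytes(code[1:5], "little")

print(hex(msr_index))
print("is HV_X64_MSR_EOI:", msr_index == HV_X64_MSR_EOI)
```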

Have you tried an older host kernel?  If not, can you try something like v6.1?
Note, if you do, use base v6.1, *not* the stable tree, in case a bug was backported.

There was a recent change to relevant code, commit 50011c2a2457 ("KVM: VMX: Refresh
available regs and IDT vectoring info before NMI handling"), though I don't see
any obvious bugs.  But I'm pretty sure the only alternative explanation is a
CPU/ucode bug, so it's definitely worth checking older versions of KVM.
