Re: PROBLEM: Regression of MMU causing guest VM application errors

On 10/22/19 4:28 PM, Sean Christopherson wrote:
On Thu, Oct 17, 2019 at 07:57:35PM -0400, Derek Yerger wrote:
On 10/16/19 1:49 PM, Sean Christopherson wrote:
On Wed, Oct 16, 2019 at 11:28:57AM -0600, Alex Williamson wrote:
On Wed, 16 Oct 2019 00:49:51 -0400
Derek Yerger <derek@xxxxxxx> wrote:

In at least Linux 5.2.7 via Fedora, up to 5.2.18, guest OS applications
repeatedly crash with segfaults. The problem does not occur on 5.1.16.

System is running Fedora 29 with kernel 5.2.18. Guest OS is Windows 10 with an
AMD Radeon 540 GPU passthrough. When on 5.2.7 or 5.2.18, specific Windows
applications frequently and repeatedly crash, throwing exceptions in random
libraries. Going back to 5.1.16, the issue does not occur.

The host system is unaffected by the regression.

Keywords: kvm mmu pci passthrough vfio vfio-pci amdgpu

Possibly related: Unmerged [PATCH] KVM: x86/MMU: Zap all when removing memslot
if VM has assigned device
That was never merged because it was superseded by:

d012a06ab1d2 Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"

That revert also induced this commit:

002c5f73c508 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot

Both of these were merged to stable, showing up in 5.2.11 and 5.2.16
respectively, so seeing these sorts of issues might be considered a
known issue on 5.2.7, but not 5.2.18 afaik.  Do you have a specific
test that reliably reproduces the issue?  Thanks,
Test case 1: Kernel 5.2.18, PCI passthrough, Windows 10 guest, error condition.
Error 1: Application error in Firefox; restarting Firefox and restoring tabs
reliably causes an application crash with a stack overflow error.
Error 2: Guest BSOD by the morning if left idle
Error 3: Guest BSOD within 1 minute of using SolidWorks CAD software

Test case 2: Kernel 5.2.18, no PCI passthrough, same environment. Guest BSOD
encountered.

Test case 3: Kernel 5.1.16, no PCI passthrough, same environment. Worked in
SolidWorks for 10 minutes without BSOD. Opened Firefox and restored tabs, no
crash.

Test case 4: Kernel 5.1.16, with PCI passthrough, same environment. Worked
in SolidWorks for a half hour. Opened Firefox and restored tabs, no crash.

Other factors: The guest does not change between tests. Same drivers,
software, etc. I have reliably switched between 5.2.x and 5.1.x multiple
times in the past month and repeatably see issues with 5.2.x. At this point
I'm unsure if it's PCI passthrough causing the problem.
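
The four test cases above can be tabulated to check which factor lines up
with the failures. This is only a rough sketch: the outcomes are transcribed
from the test cases as reported, and the field names and the helper
`factor_predicts_failure` are made up for illustration, not taken from any
tool.

```python
# Outcomes transcribed from test cases 1-4 above.
# Field names are illustrative assumptions, not from any real tool.
results = [
    {"kernel": "5.2.18", "passthrough": True,  "failed": True},   # test case 1
    {"kernel": "5.2.18", "passthrough": False, "failed": True},   # test case 2
    {"kernel": "5.1.16", "passthrough": False, "failed": False},  # test case 3
    {"kernel": "5.1.16", "passthrough": True,  "failed": False},  # test case 4
]


def factor_predicts_failure(results, factor):
    """Return True if every value of `factor` maps to exactly one outcome."""
    outcomes = {}
    for r in results:
        outcomes.setdefault(r[factor], set()).add(r["failed"])
    return all(len(seen) == 1 for seen in outcomes.values())


kernel_explains = factor_predicts_failure(results, "kernel")
passthrough_explains = factor_predicts_failure(results, "passthrough")

print("kernel version explains failures:", kernel_explains)
print("PCI passthrough explains failures:", passthrough_explains)
```

With only four short runs this is suggestive rather than conclusive, but in
this data the kernel version separates the failing runs from the passing ones
and the passthrough setting does not.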

I know I should probably start from fresh host and guest, but time isn't
really permitting.
Also, does the failure reproduce on 5.2.1 - 5.2.6?  The memslot debacle
exists on all flavors of 5.2.x, so if the errors first showed up in 5.2.7
then they are being caused by something else.
After experiencing the issue in absence of PCI passthrough, I believe the
problem is unrelated to the memslot debacle.
Heh, should've checked from the get go...  It's definitely not the memslot
issue, because the memslot bug is in 5.1.16 as well.  :-)
I didn't pick up on that, nice catch. The memslot thread was the closest
thing I could find to an educated guess.
I'm stuck on 5.1.x for now, maybe I'll give up and get a dedicated Windows
machine /s
What hardware are you running on?  I was thinking this was AMD specific,
but then realized you said "AMD Radeon 540 GPU" and not "AMD CPU".
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)
        Subsystem: Gigabyte Technology Co., Ltd Device 22fe
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
(plus related audio device)

I can't think of any other data points that would help in solving system
instability in a guest OS. But given my earlier troubleshooting, it looks
like the presence or absence of a PCI passthrough device has no bearing on
whether the problem occurs.

I may have to try out other VMs or a fresh Windows guest.


