Re: PROBLEM: Regression of MMU causing guest VM application errors

On 10/16/19 1:49 PM, Sean Christopherson wrote:
On Wed, Oct 16, 2019 at 11:28:57AM -0600, Alex Williamson wrote:
On Wed, 16 Oct 2019 00:49:51 -0400
Derek Yerger <derek@xxxxxxx> wrote:

On at least Linux 5.2.7 through 5.2.18 (Fedora kernels), guest OS applications
repeatedly crash with segfaults. The problem does not occur on 5.1.16.

The system is running Fedora 29 with kernel 5.2.18. The guest OS is Windows 10 with an
AMD Radeon 540 GPU passed through. On 5.2.7 or 5.2.18, specific Windows
applications frequently and repeatedly crash, throwing exceptions in random
libraries. After going back to 5.1.16, the issue does not occur.

The host system is unaffected by the regression.

Keywords: kvm mmu pci passthrough vfio vfio-pci amdgpu

Possibly related: unmerged [PATCH] KVM: x86/MMU: Zap all when removing memslot if VM has assigned device
That was never merged because it was superseded by:

d012a06ab1d2 Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"

That revert also prompted this commit:

002c5f73c508 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot

Both of these were merged to stable, showing up in 5.2.11 and 5.2.16
respectively, so these sorts of issues might be considered a known
problem on 5.2.7, but not on 5.2.18 AFAIK.  Do you have a specific
test that reliably reproduces the issue?  Thanks,

Test case 1: Kernel 5.2.18, PCI passthrough, Windows 10 guest. Errors occur:
Error 1: Application error in Firefox; restarting Firefox and restoring tabs reliably causes an application crash with a stack overflow error.
Error 2: Guest BSODs by morning if left idle overnight.
Error 3: Guest BSODs within 1 minute of using SolidWorks CAD software.

Test case 2: Kernel 5.2.18, no PCI passthrough, same environment. Guest BSOD encountered.

Test case 3: Kernel 5.1.16, no PCI passthrough, same environment. Worked in SolidWorks for 10 minutes without a BSOD. Opened Firefox and restored tabs, no crash.

Test case 4: Kernel 5.1.16, with PCI passthrough, same environment. Worked in SolidWorks for half an hour. Opened Firefox and restored tabs, no crash.

Other factors: The guest does not change between tests (same drivers, software, etc.). I have switched between 5.2.x and 5.1.x multiple times over the past month and consistently see issues with 5.2.x. At this point I'm unsure whether PCI passthrough is causing the problem.

I know I should probably start from a fresh host and guest, but time doesn't really permit that.

Also, does the failure reproduce on 5.2.1 - 5.2.6?  The memslot debacle
exists on all flavors of 5.2.x, so if the errors first showed up in 5.2.7 then they
are being caused by something else.

After experiencing the issue without PCI passthrough, I believe the problem is unrelated to the memslot debacle. I'm stuck on 5.1.x for now; maybe I'll give up and get a dedicated Windows machine /s


