Re: PROBLEM: Regression of MMU causing guest VM application errors

On Thu, Oct 17, 2019 at 07:57:35PM -0400, Derek Yerger wrote:
> On 10/16/19 1:49 PM, Sean Christopherson wrote:
> >On Wed, Oct 16, 2019 at 11:28:57AM -0600, Alex Williamson wrote:
> >>On Wed, 16 Oct 2019 00:49:51 -0400
> >>Derek Yerger<derek@xxxxxxx>  wrote:
> >>
> >>>In at least Linux 5.2.7 via Fedora, up to 5.2.18, guest OS applications
> >>>repeatedly crash with segfaults. The problem does not occur on 5.1.16.
> >>>
> >>>System is running Fedora 29 with kernel 5.2.18. Guest OS is Windows 10 with an
> >>>AMD Radeon 540 GPU passthrough. When on 5.2.7 or 5.2.18, specific Windows
> >>>applications frequently and repeatedly crash, throwing exceptions in random
> >>>libraries. Going back to 5.1.16, the issue does not occur.
> >>>
> >>>The host system is unaffected by the regression.
> >>>
> >>>Keywords: kvm mmu pci passthrough vfio vfio-pci amdgpu
> >>>
> >>>Possibly related: Unmerged [PATCH] KVM: x86/MMU: Zap all when removing memslot
> >>>if VM has assigned device
> >>That was never merged because it was superseded by:
> >>
> >>d012a06ab1d2 Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"
> >>
> >>That revert also induced this commit:
> >>
> >>002c5f73c508 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
> >>
> >>Both of these were merged to stable, showing up in 5.2.11 and 5.2.16
> >>respectively, so seeing these sorts of issues might be considered a
> >>known issue on 5.2.7, but not 5.2.18 afaik.  Do you have a specific
> >>test that reliably reproduces the issue?  Thanks,
> Test case 1: Kernel 5.2.18, PCI passthrough, Windows 10 guest, error condition.
> Error 1: Application error in Firefox: restarting Firefox and restoring tabs
> reliably causes an application crash with a stack overflow error.
> Error 2: Guest BSOD by the morning if left idle
> Error 3: Guest BSOD within 1 minute of using SolidWorks CAD software
> 
> Test case 2: Kernel 5.2.18, no PCI passthrough, same environment. Guest BSOD
> encountered.
> 
> Test case 3: Kernel 5.1.16, no PCI passthrough, same environment. Worked in
> SolidWorks for 10 minutes without BSOD. Opened Firefox and restored tabs, no
> crash.
> 
> Test case 4: Kernel 5.1.16, with PCI passthrough, same environment. Worked
> in SolidWorks for a half hour. Opened Firefox and restored tabs, no crash.
> 
> Other factors: The guest does not change between tests. Same drivers,
> software, etc. I have reliably switched between 5.2.x and 5.1.x multiple
> times in the past month and repeatably see issues with 5.2.x. At this point
> I'm unsure if it's PCI passthrough causing the problem.
> 
> I know I should probably start from fresh host and guest, but time isn't
> really permitting.
> >Also, does the failure reproduce on 5.2.1 - 5.2.6?  The memslot debacle
> >exists on all flavors of 5.2.x, if the errors showed up in 5.2.7 then they
> >are being caused by something else.
> After experiencing the issue in absence of PCI passthrough, I believe the
> problem is unrelated to the memslot debacle.

Heh, should've checked from the get go...  It's definitely not the memslot
issue, because the memslot bug is in 5.1.16 as well.  :-)

> I'm stuck on 5.1.x for now, maybe I'll give up and get a dedicated Windows
> machine /s

What hardware are you running on?  I was thinking this was AMD specific,
but then realized you said "AMD Radeon 540 GPU" and not "AMD CPU".


