On Tue, 13 Aug 2019 11:57:37 -0600
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> On Tue, 13 Aug 2019 10:04:41 -0700
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> 
> > On Tue, Aug 13, 2019 at 10:04:58AM -0600, Alex Williamson wrote:
> > > On Tue, 5 Feb 2019 13:01:21 -0800
> > > Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> > > 
> > > > Modify kvm_mmu_invalidate_zap_pages_in_memslot(), a.k.a. the x86 MMU's
> > > > handler for kvm_arch_flush_shadow_memslot(), to zap only the pages/PTEs
> > > > that actually belong to the memslot being removed.  This improves
> > > > performance, especially when the deleted memslot has only a few shadow
> > > > entries, or even no entries.  E.g. a microbenchmark to access regular
> > > > memory while concurrently reading PCI ROM to trigger memslot deletion
> > > > showed a 5% improvement in throughput.
> > > > 
> > > > Cc: Xiao Guangrong <guangrong.xiao@xxxxxxxxx>
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > > ---
> > > >  arch/x86/kvm/mmu.c | 33 ++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 32 insertions(+), 1 deletion(-)
> > > 
> > > A number of vfio users are reporting VM instability issues since v5.1,
> > > some have traced it back to this commit 4e103134b862 ("KVM: x86/mmu: Zap
> > > only the relevant pages when removing a memslot"), which I've confirmed
> > > via bisection of the 5.1 merge window KVM pull (636deed6c0bc) and
> > > re-verified on current 5.3-rc4 using the below patch to toggle the
> > > broken behavior.
> > > 
> > > My reproducer is a Windows 10 VM with an assigned GeForce GPU running a
> > > variety of tests, including FurMark and PassMark Performance Test.
> > > With the code enabled as it currently exists upstream, PassMark will
> > > generally introduce graphics glitches or hangs.  Sometimes it's
> > > necessary to reboot the VM to see these issues.
> > 
> > As in, the issue only shows up when the VM is rebooted?  Just want to
> > double check that that's not a typo.
> 
> No, it can occur on the first boot as well, it's just that the recipe
> to induce a failure is not well understood and manifests itself in
> different ways.  I generally run the tests, then if it still hasn't
> reproduced, I reboot the VM a couple times, running a couple apps in
> between to try to trigger/notice bad behavior.
> 
> > > Flipping the 0/1 in the below patch appears to resolve the issue.
> > > 
> > > I'd appreciate any insights into further debugging this block of code
> > > so that we can fix this regression.  Thanks,
> > 
> > If it's not too painful to reproduce, I'd say start by determining whether
> > it's a problem with the basic logic or if the cond_resched_lock() handling
> > is wrong.  I.e. comment/ifdef out this chunk:
> > 
> > 	if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> > 		kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
> > 		flush = false;
> > 		cond_resched_lock(&kvm->mmu_lock);
> > 	}
> 

If anything, removing this chunk seems to make things worse.  Could it
be something with the gfn test:

		if (sp->gfn != gfn)
			continue;

If I remove it, I can't trigger the misbehavior.
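For reference, here's roughly the context of that check in
kvm_mmu_invalidate_zap_pages_in_memslot() (paraphrased from memory, so
names and details may be slightly off), with the sort of debug print I
mean shown inline:

	for (i = 0; i < slot->npages; i++) {
		gfn = slot->base_gfn + i;

		/* walk the shadow pages hashed at this gfn */
		for_each_valid_sp(kvm, sp, gfn) {
			if (sp->gfn != gfn) {
				/* debug only, not in the tree */
				pr_info("gfn %llx != %llx\n", sp->gfn, gfn);
				continue;
			}

			/* zap only pages keyed to the memslot being removed */
			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
			flush = true;
		}

		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
			kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
			flush = false;
			cond_resched_lock(&kvm->mmu_lock);
		}
	}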
If I log it, I only get hits on VM boot/reboot and some of the gfns
look suspiciously like they could be the assigned GPU BARs and maybe
MSI mappings:

               (sp->gfn) != (gfn)
[ 71.829450] gfn fec00 != c02c4
[ 71.835554] gfn ffe00 != c046f
[ 71.841664] gfn 0 != c0720
[ 71.847084] gfn 0 != c0720
[ 71.852489] gfn 0 != c0720
[ 71.857899] gfn 0 != c0720
[ 71.863306] gfn 0 != c0720
[ 71.868717] gfn 0 != c0720
[ 71.874122] gfn 0 != c0720
[ 71.879531] gfn 0 != c0720
[ 71.884937] gfn 0 != c0720
[ 71.890349] gfn 0 != c0720
[ 71.895757] gfn 0 != c0720
[ 71.901163] gfn 0 != c0720
[ 71.906569] gfn 0 != c0720
[ 71.911980] gfn 0 != c0720
[ 71.917387] gfn 0 != c0720
[ 71.922808] gfn fee00 != c0edc
[ 71.928915] gfn fee00 != c0edc
[ 71.935018] gfn fee00 != c0edc
[ 71.941730] gfn c1000 != 8002d7
[ 71.948039] gfn 80000 != 8006d5
[ 71.954328] gfn 80000 != 8006d5
[ 71.960600] gfn 80000 != 8006d5
[ 71.966874] gfn 80000 != 8006d5
[ 71.992272] gfn 0 != c0720
[ 71.997683] gfn 0 != c0720
[ 72.003725] gfn 80000 != 8006d5
[ 72.044333] gfn 0 != c0720
[ 72.049743] gfn 0 != c0720
[ 72.055846] gfn 80000 != 8006d5
[ 72.177341] gfn ffe00 != c046f
[ 72.183453] gfn 0 != c0720
[ 72.188864] gfn 0 != c0720
[ 72.194290] gfn 0 != c0720
[ 72.200308] gfn 80000 != 8006d5
[ 82.539023] gfn fec00 != c02c4
[ 82.545142] gfn 40000 != c0377
[ 82.551343] gfn ffe00 != c046f
[ 82.557466] gfn 100000 != c066f
[ 82.563839] gfn 800000 != c06ec
[ 82.570133] gfn 800000 != c06ec
[ 82.576408] gfn 0 != c0720
[ 82.581850] gfn 0 != c0720
[ 82.587275] gfn 0 != c0720
[ 82.592685] gfn 0 != c0720
[ 82.598131] gfn 0 != c0720
[ 82.603552] gfn 0 != c0720
[ 82.608978] gfn 0 != c0720
[ 82.614419] gfn 0 != c0720
[ 82.619844] gfn 0 != c0720
[ 82.625291] gfn 0 != c0720
[ 82.630791] gfn 0 != c0720
[ 82.636208] gfn 0 != c0720
[ 82.641635] gfn 80a000 != c085e
[ 82.647929] gfn fee00 != c0edc
[ 82.654062] gfn fee00 != c0edc
[ 82.660504] gfn 100000 != c066f
[ 82.666800] gfn 0 != c0720
[ 82.672211] gfn 0 != c0720
[ 82.677635] gfn 0 != c0720
[ 82.683060] gfn 0 != c0720
[ 82.689209] gfn c1000 != 8002d7
[ 82.695501] gfn 80000 != 8006d5
[ 82.701796] gfn 80000 != 8006d5
[ 82.708092] gfn 100000 != 80099b
[ 82.714547] gfn 0 != 800a4c
[ 82.720154] gfn 0 != 800a4c
[ 82.725752] gfn 0 != 800a4c
[ 82.731370] gfn 0 != 800a4c
[ 82.738705] gfn 100000 != 80099b
[ 82.745201] gfn 0 != 800a4c
[ 82.750793] gfn 0 != 800a4c
[ 82.756381] gfn 0 != 800a4c
[ 82.761979] gfn 0 != 800a4c
[ 82.768122] gfn 100000 != 8083a4
[ 82.774605] gfn 0 != 8094aa
[ 82.780196] gfn 0 != 8094aa
[ 82.785796] gfn 0 != 8094aa
[ 82.791412] gfn 0 != 8094aa
[ 82.797523] gfn 100000 != 8083a4
[ 82.803977] gfn 0 != 8094aa
[ 82.809576] gfn 0 != 8094aa
[ 82.815193] gfn 0 != 8094aa
[ 82.820809] gfn 0 != 8094aa

(GPU has a BAR mapped at 0x80000000)

Is this gfn optimization correct?  Overzealous?  Doesn't account
correctly for something about MMIO mappings?  Thanks,

Alex
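P.S. In case it helps anyone following along: as I understand it,
for_each_valid_sp() walks the kvm->arch.mmu_page_hash bucket selected by
kvm_page_table_hashfn(gfn), so a sp->gfn != gfn hit should just be another
shadow page that happens to share the bucket, roughly (a paraphrased
sketch, not the literal macro):

	struct hlist_head *bucket =
		&kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
	struct kvm_mmu_page *sp;

	hlist_for_each_entry(sp, bucket, hash_link) {
		if (sp->gfn != gfn)
			continue;	/* hash collision, different gfn */

		/* only pages keyed at exactly this gfn are zapped */
		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	}

If that's right, the skips themselves are expected, and the question above
is really whether something that does map the removed slot can be keyed at
a gfn outside it and end up skipped.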