Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot

Alex Williamson <alex.williamson@xxxxxxxxxx> · Thu, 15 Aug 2019 09:23:24 -0600

On Tue, 13 Aug 2019 13:19:14 -0700
Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

> On Tue, Aug 13, 2019 at 01:33:16PM -0600, Alex Williamson wrote:
> > On Tue, 13 Aug 2019 11:57:37 -0600
> > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:  
> 
> > Could it be something with the gfn test:
> > 
> >                         if (sp->gfn != gfn)
> >                                 continue;
> > 
> > If I remove it, I can't trigger the misbehavior.  If I log it, I only
> > get hits on VM boot/reboot and some of the gfns look suspiciously like
> > they could be the assigned GPU BARs and maybe MSI mappings:
> > 
> >                (sp->gfn) != (gfn)  
> 
> Hits at boot/reboot make sense, memslots get zapped when userspace
> removes a memory region/slot, e.g. remaps BARs and whatnot.
> 
> ...
>  
> > Is this gfn optimization correct?  Overzealous?  Doesn't account
> > correctly for something about MMIO mappings?  Thanks,  
> 
> Yes?  Shadow pages are stored in a hash table, for_each_valid_sp() walks
> all entries for a given gfn.  The sp->gfn check is there to skip entries
> that hashed to the same list but for a completely different gfn.
> 
> Skipping the gfn check would be sort of a lightweight zap all in the
> sense that it would zap shadow pages that happened to collide with the
> target memslot/gfn but are otherwise unrelated.
> 
> What happens if you give just the GPU BAR at 0x80000000 a pass, i.e.:
> 
> 	if (sp->gfn != gfn && sp->gfn != 0x80000)
> 		continue;
> 
> If that doesn't work, it might be worth trying other gfns to see if you
> can pinpoint which sp is being zapped as collateral damage.
> 
> It's possible there is a pre-existing bug somewhere else that was being
> hidden because KVM was effectively zapping all SPTEs during (re)boot,
> and the hash collision is also hiding the bug by zapping the stale entry.
> 
> Of course it's also possible this code is wrong, :-)
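
For anyone following along, the bucket walk described above can be
modeled with a toy chained hash table.  This is only a sketch to
illustrate the collision behavior, not the KVM code: the toy_* names,
the modulo hash, and the bucket count are all made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical toy model, not the KVM data structures. */
#define TOY_HASH_BUCKETS 16

struct toy_sp {
	uint64_t gfn;
	struct toy_sp *next;
	int zapped;
};

/* Assumed hash: plain modulo, chosen only so collisions are easy to see. */
static unsigned int toy_hash(uint64_t gfn)
{
	return (unsigned int)(gfn % TOY_HASH_BUCKETS);
}

/* Push a shadow page onto the head of its bucket's list. */
static void toy_insert(struct toy_sp **table, struct toy_sp *sp)
{
	unsigned int b = toy_hash(sp->gfn);

	sp->next = table[b];
	table[b] = sp;
}

/*
 * Walk the bucket that gfn hashes to and mark entries as zapped.
 * With check_gfn set, entries for other gfns that merely collided
 * are skipped; without it, every entry in the bucket is zapped.
 * Returns the number of entries marked on this call.
 */
static int toy_zap_gfn(struct toy_sp **table, uint64_t gfn, int check_gfn)
{
	struct toy_sp *sp;
	int zapped = 0;

	for (sp = table[toy_hash(gfn)]; sp; sp = sp->next) {
		if (check_gfn && sp->gfn != gfn)
			continue;
		sp->zapped = 1;
		zapped++;
	}
	return zapped;
}
```

With the check in place only the exact gfn is zapped; dropping the
check also zaps whatever else landed in the same bucket, which is the
"lightweight zap all" effect described above.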

Ok, fun day of trying to figure out which ranges are relevant.  I've
narrowed it down to all of these sp->gfn values, which need to be zapped:

0xffe00
0xfee00
0xfec00
0xc1000
0x80a000
0x800000
0x100000

i.e. I can effectively only say that sp->gfn values of 0x0, 0x40000, and
0x80000 can take the continue branch without seeing bad behavior in the
VM.

The assigned GPU has BARs at GPAs:

0xc0000000-0xc0ffffff
0x800000000-0x808000000
0x808000000-0x809ffffff

And the assigned companion audio function is at GPA:

0xc1080000-0xc1083fff

Only one of those seems to align very well with a gfn base involved
here.  The virtio ethernet has an MMIO range at GPA 0x80a000000;
otherwise I don't find any other I/O devices coincident with the gfns
above.
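
For comparing the two lists, a gfn is just the GPA shifted right by the
page shift, so the check can be sketched as below.  This assumes 4KB
base pages (shift of 12); the struct and helper names are made up for
illustration, and the table holds only the BAR ranges exactly as listed
above (the virtio range is left out since I only have its base).

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumes 4KB base pages */

/* Hypothetical helper type, both bounds inclusive. */
struct gpa_range {
	const char *name;
	uint64_t start;
	uint64_t end;
};

/* BAR ranges copied as listed in this mail. */
static const struct gpa_range ranges[] = {
	{ "GPU BAR",   0xc0000000ULL,  0xc0ffffffULL  },
	{ "GPU BAR",   0x800000000ULL, 0x808000000ULL },
	{ "GPU BAR",   0x808000000ULL, 0x809ffffffULL },
	{ "audio BAR", 0xc1080000ULL,  0xc1083fffULL  },
};

/* Convert a guest frame number to the guest physical address of its page. */
static uint64_t gfn_to_gpa(uint64_t gfn)
{
	return gfn << PAGE_SHIFT;
}

/* Return the index of the first range containing gpa, or -1 if none does. */
static int find_range(uint64_t gpa)
{
	size_t i;

	for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
		if (gpa >= ranges[i].start && gpa <= ranges[i].end)
			return (int)i;
	}
	return -1;
}
```

For example, gfn 0x800000 converts to GPA 0x800000000, the base of the
second GPU BAR, while gfn 0xc1000 converts to GPA 0xc1000000, which
falls between the first GPU BAR and the audio BAR.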

I'm running the VM with 2MB hugepages, but I believe the issue still
occurs with standard pages.  When run with standard pages I see more
hits to gfn values 0, 0x40000, 0x80000, but the same number of hits to
the set above that cannot take the continue branch.  I don't know if
that means anything.

Any further ideas what to look for?  Thanks,

Alex

PS - I see the posted workaround patch, I'll test that in the interim.