Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Tue, 13 Aug 2019 13:19:14 -0700

On Tue, Aug 13, 2019 at 01:33:16PM -0600, Alex Williamson wrote:
> On Tue, 13 Aug 2019 11:57:37 -0600
> Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> Could it be something with the gfn test:
> 
>                         if (sp->gfn != gfn)
>                                 continue;
> 
> If I remove it, I can't trigger the misbehavior.  If I log it, I only
> get hits on VM boot/reboot and some of the gfns look suspiciously like
> they could be the assigned GPU BARs and maybe MSI mappings:
> 
>                (sp->gfn) != (gfn)

Hits at boot/reboot make sense: memslots get zapped when userspace
removes a memory region/slot, e.g. when it remaps BARs and whatnot.

...

> Is this gfn optimization correct?  Overzealous?  Doesn't account
> correctly for something about MMIO mappings?  Thanks,

Yes?  Shadow pages are stored in a hash table, for_each_valid_sp() walks
all entries for a given gfn.  The sp->gfn check is there to skip entries
that hashed to the same list but for a completely different gfn.

Skipping the gfn check would be sort of a lightweight zap all in the
sense that it would zap shadow pages that happened to collide with the
target memslot/gfn but are otherwise unrelated.
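
To make the collision case concrete, here is a rough userspace sketch of
a bucketed table with and without the gfn check.  Everything in it is a
stand-in for illustration (toy_sp, toy_hash(), toy_insert(),
toy_zap_gfn(), the 8-bucket size and the modulo hash), it is not the
actual KVM code or its hash function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 8	/* deliberately tiny so collisions are easy to hit */

typedef uint64_t gfn_t;

/* Hypothetical stand-in for a shadow page; only what the sketch needs. */
struct toy_sp {
	gfn_t gfn;
	bool zapped;
	struct toy_sp *next;	/* next entry in the same hash bucket */
};

static struct toy_sp *buckets[NR_BUCKETS];

/* Stand-in hash (assumption): plain modulo, not what KVM uses. */
static unsigned int toy_hash(gfn_t gfn)
{
	return (unsigned int)(gfn % NR_BUCKETS);
}

static void toy_insert(struct toy_sp *sp)
{
	unsigned int b = toy_hash(sp->gfn);

	sp->zapped = false;
	sp->next = buckets[b];
	buckets[b] = sp;
}

/*
 * Walk the bucket that @gfn hashes to and zap entries.  With @check_gfn
 * set, entries for other gfns that merely share the bucket are skipped;
 * without it, every not-yet-zapped entry in the bucket is zapped.
 * Returns the number of entries zapped by this call.
 */
static int toy_zap_gfn(gfn_t gfn, bool check_gfn)
{
	struct toy_sp *sp;
	int nr_zapped = 0;

	for (sp = buckets[toy_hash(gfn)]; sp; sp = sp->next) {
		if (check_gfn && sp->gfn != gfn)
			continue;
		if (!sp->zapped) {
			sp->zapped = true;
			nr_zapped++;
		}
	}
	return nr_zapped;
}
```

With entries for gfns 0x80000 and 0x80008 in the table (same bucket
under this toy hash), toy_zap_gfn(0x80000, true) touches only the first,
while toy_zap_gfn(0x80000, false) also takes out the second as
collateral damage.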

What happens if you give just the GPU BAR at 0x80000000 a pass, i.e.:

	if (sp->gfn != gfn && sp->gfn != 0x80000)
		continue;
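
The 0x80000 is just the BAR's guest physical address with the page
offset shifted out, assuming 4 KiB pages.  As a tiny sketch (the
TOY_PAGE_SHIFT and toy_gpa_to_gfn() names are made up for this mail):

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB base pages, as on x86 */

/* Convert a guest physical address to a guest frame number. */
static inline uint64_t toy_gpa_to_gfn(uint64_t gpa)
{
	return gpa >> TOY_PAGE_SHIFT;
}
```

So toy_gpa_to_gfn(0x80000000) gives 0x80000, and any other BAR can be
converted the same way if you want to try a different gfn.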

If that doesn't work, it might be worth trying other gfns to see if you
can pinpoint which sp is being zapped as collateral damage.

It's possible there is a pre-existing bug somewhere else that was being
hidden because KVM was effectively zapping all SPTEs during (re)boot,
and the hash collision is also hiding the bug by zapping the stale entry.

Of course it's also possible this code is wrong, :-)