Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Tue, 13 Aug 2019 22:37:14 +0200

On 13/08/19 22:19, Sean Christopherson wrote:
> Yes?  Shadow pages are stored in a hash table, for_each_valid_sp() walks
> all entries for a given gfn.  The sp->gfn check is there to skip entries
> that hashed to the same list but for a completely different gfn.
> 
> Skipping the gfn check would be sort of a lightweight zap all in the
> sense that it would zap shadow pages that happend to collide with the
> target memslot/gfn but are otherwise unrelated.
> 
> What happens if you give just the GPU BAR at 0x80000000 a pass, i.e.:
> 
> 	if (sp->gfn != gfn && sp->gfn != 0x80000)
> 		continue;
> 
> If that doesn't work, it might be worth trying other gfns to see if you
> can pinpoint which sp is being zapped as collateral damage.
> 
> It's possible there is a pre-existing bug somewhere else that was being
> hidden because KVM was effectively zapping all SPTEs during (re)boot,
> and the hash collision is also hiding the bug by zapping the stale entry.
> 
> Of course it's also possible this code is wrong, :-)

Also, can you reproduce it with one vCPU?  This could (though not really
100%) distinguish a missing invalidation from a race condition.

Do we even need the call to slot_handle_all_level?  The rmap update
should be done already by kvm_mmu_prepare_zap_page (via
kvm_mmu_page_unlink_children -> mmu_page_zap_pte -> drop_spte).

Alex, can you replace it with just "flush = false;"?

Thanks,

Paolo