On Tue, 13 Aug 2019 22:37:14 +0200 Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > On 13/08/19 22:19, Sean Christopherson wrote: > > Yes? Shadow pages are stored in a hash table, for_each_valid_sp() walks > > all entries for a given gfn. The sp->gfn check is there to skip entries > > that hashed to the same list but for a completely different gfn. > > > > Skipping the gfn check would be sort of a lightweight zap all in the > > sense that it would zap shadow pages that happend to collide with the > > target memslot/gfn but are otherwise unrelated. > > > > What happens if you give just the GPU BAR at 0x80000000 a pass, i.e.: > > > > if (sp->gfn != gfn && sp->gfn != 0x80000) > > continue; Not having any luck with this yet. Tried 0x80000, 0x8xxxxx, 0. > > If that doesn't work, it might be worth trying other gfns to see if you > > can pinpoint which sp is being zapped as collateral damage. > > > > It's possible there is a pre-existing bug somewhere else that was being > > hidden because KVM was effectively zapping all SPTEs during (re)boot, > > and the hash collision is also hiding the bug by zapping the stale entry. > > > > Of course it's also possible this code is wrong, :-) > > Also, can you reproduce it with one vCPU? This could (though not really > 100%) distinguish a missing invalidation from a race condition. That's a pretty big change, I'll give it a shot, but not sure how conclusive it would be. > Do we even need the call to slot_handle_all_level? The rmap update > should be done already by kvm_mmu_prepare_zap_page (via > kvm_mmu_page_unlink_children -> mmu_page_zap_pte -> drop_spte). > > Alex, can you replace it with just "flush = false;"? Replace the continue w/ flush = false? I'm not clear on this suggestion. Thanks, Alex