On 13/08/19 22:19, Sean Christopherson wrote: > Yes? Shadow pages are stored in a hash table, for_each_valid_sp() walks > all entries for a given gfn. The sp->gfn check is there to skip entries > that hashed to the same list but for a completely different gfn. > > Skipping the gfn check would be sort of a lightweight zap all in the > sense that it would zap shadow pages that happend to collide with the > target memslot/gfn but are otherwise unrelated. > > What happens if you give just the GPU BAR at 0x80000000 a pass, i.e.: > > if (sp->gfn != gfn && sp->gfn != 0x80000) > continue; > > If that doesn't work, it might be worth trying other gfns to see if you > can pinpoint which sp is being zapped as collateral damage. > > It's possible there is a pre-existing bug somewhere else that was being > hidden because KVM was effectively zapping all SPTEs during (re)boot, > and the hash collision is also hiding the bug by zapping the stale entry. > > Of course it's also possible this code is wrong, :-) Also, can you reproduce it with one vCPU? This could (though not really 100%) distinguish a missing invalidation from a race condition. Do we even need the call to slot_handle_all_level? The rmap update should be done already by kvm_mmu_prepare_zap_page (via kvm_mmu_page_unlink_children -> mmu_page_zap_pte -> drop_spte). Alex, can you replace it with just "flush = false;"? Thanks, Paolo