On Thu, Aug 15, 2019 at 12:16:07PM -0600, Alex Williamson wrote:
> On Thu, 15 Aug 2019 09:00:06 -0700
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> If I print out the memslot base_gfn, it seems pretty evident that only
> the assigned device mappings are triggering this branch. The base_gfns
> exclusively include:
>
>  0x800000
>  0x808000
>  0xc0089
>
> Where the first two clearly match the 64bit BARs and the last is the
> result of a page that we need to emulate within the BAR @0xc0000000 at
> offset 0x88000, so the base_gfn is the remaining direct mapping.

That's consistent with my understanding of userspace, e.g. normal memory
regions aren't deleted until the VM is shut down (barring hot unplug).

> I don't know if this implies we're doing something wrong for assigned
> device slots, but maybe a more targeted workaround would be if we could
> specifically identify these slots, though there's no special
> registration of them versus other slots.

What is triggering the memslot removal/update? Is it possible that
whatever action is occurring is modifying multiple memslots? E.g. KVM's
memslot-only zapping is allowing the guest to access stale entries for
the unzapped-but-related memslots, whereas the full zap does not.

FYI, my VFIO/GPU/PCI knowledge is abysmal, please speak up if any of my
ideas are nonsensical.

> Did you have any non-device
> assignment test cases that took this branch when developing the series?

The primary testing was performance oriented, using a slightly modified
version of a synthetic benchmark[1] from a previous series[2] that touched
the memslot flushing flow. From a functional perspective, I highly doubt
that test would have been able to expose an improper zapping bug.

We do have some amount of coverage via kvm-unit-tests, as an EPT test was
triggering a slab bug due to not actually zapping the collected SPTEs[3].

[1] http://lkml.iu.edu/hypermail/linux/kernel/1305.2/00277/mmtest.tar.bz2
[2] https://lkml.kernel.org/r/1368706673-8530-1-git-send-email-xiaoguangrong@xxxxxxxxxxxxxxxxxx
[3] https://patchwork.kernel.org/patch/10899283/

> > One other thought would be to force a call to kvm_flush_remote_tlbs(kvm),
> > e.g. set flush=true just before the final kvm_mmu_remote_flush_or_zap().
> > Maybe it's a case where there are no SPTEs for the memslot, but the TLB
> > flush is needed for some reason.
>
> This doesn't work. Thanks,
>
> Alex
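
For reference, the 0xc0089 base_gfn quoted above can be checked with a
little arithmetic. This is only a minimal sketch, assuming 4 KiB pages and
assuming that exactly one page at offset 0x88000 is emulated, so that the
direct mapping resumes at the very next page. gfn_after_emulated_page() is
a hypothetical helper made up for illustration, and gpa_to_gfn() is defined
locally here rather than pulled from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Guest frame number of a guest physical address. */
static inline uint64_t gpa_to_gfn(uint64_t gpa)
{
	return gpa >> PAGE_SHIFT;
}

/*
 * Hypothetical helper: base_gfn of the directly mapped region that
 * starts right after a single emulated page located at 'offset'
 * within a BAR whose guest physical base is 'bar_base'.
 */
static inline uint64_t gfn_after_emulated_page(uint64_t bar_base,
					       uint64_t offset)
{
	/* Both inputs are expected to be page aligned. */
	assert((bar_base & (PAGE_SIZE - 1)) == 0);
	assert((offset & (PAGE_SIZE - 1)) == 0);

	return gpa_to_gfn(bar_base + offset + PAGE_SIZE);
}
```

With bar_base = 0xc0000000 and offset = 0x88000 the helper returns 0xc0089.
Going the other way, base_gfns 0x800000 and 0x808000 correspond to guest
physical addresses 0x800000000 and 0x808000000, i.e. above 4 GiB, which
fits the 64bit BARs.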