On Wed, Aug 21, 2019 at 01:30:41PM -0700, Sean Christopherson wrote: > On Wed, Aug 21, 2019 at 01:35:04PM -0600, Alex Williamson wrote: > > On Wed, 21 Aug 2019 13:08:59 -0600 > > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > > Does this suggests something is still fundamentally wrong with the > > > premise of this change or have I done something stupid? > > > > Seems the latter, particularly your comment that we're looking for > > pages pointing to the gfn range to be removed, not just those in the > > range. Slot gfn ranges like ffe00-ffe1f are getting reduced to 0-0 or > > c0000-c0000, zapping zero or c0000, and I think one of the ones you > > were looking for c1080-c1083 is reduce to c1000-c1000 and therefore > > zaps sp->gfn c1000. I'll keep looking. Thanks, > > Ya. As far as where to look, at this point I don't think it's an issue of > incorrect zapping. Not because I'm 100% confident the zapping logic is > correct, but because many of the tests, e.g. removing 'sp->gfn != gfn' and > not being able to exclude APIC/IOAPIC ranges, suggest that the badness is > 'fixed' by zapping seemingly unrelated sps. > > In other words, it may be fundamentally wrong to zap only the memslot > being removed, but I really want to know why. History isn't helpful as > KVM has always zapped all pages when removing a memslot (on x86), and the > introduction of the per-memslot flush hook in commit > > 2df72e9bc4c5 ("KVM: split kvm_arch_flush_shadow") > > was all about refactoring generic code, and doesn't have any information > on whether per-memslot flushing was actually tried for x86. One semi-random idea would be to zap mmio pages, i.e. don't skip pages for which sp->mmio_cached is true, regardless of their gfn or level. I don't expect it to make a difference, but it would shrink the haystack on the off change it does "fix" the issues.