On Thu, 15 Aug 2019 12:25:31 -0700
Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

> On Thu, Aug 15, 2019 at 12:16:07PM -0600, Alex Williamson wrote:
> > On Thu, 15 Aug 2019 09:00:06 -0700
> > Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > If I print out the memslot base_gfn, it seems pretty evident that
> > only the assigned device mappings are triggering this branch.  The
> > base_gfns exclusively include:
> >
> >   0x800000
> >   0x808000
> >   0xc0089
> >
> > Where the first two clearly match the 64bit BARs and the last is the
> > result of a page that we need to emulate within the BAR @0xc0000000
> > at offset 0x88000, so the base_gfn is the remaining direct mapping.
>
> That's consistent with my understanding of userspace, e.g. normal
> memory regions aren't deleted until the VM is shut down (barring hot
> unplug).
>
> > I don't know if this implies we're doing something wrong for
> > assigned device slots, but maybe a more targeted workaround would
> > be possible if we could specifically identify these slots, though
> > there's no special registration of them versus other slots.
>
> What is triggering the memslot removal/update?  Is it possible that
> whatever action is occurring is modifying multiple memslots?  E.g.
> KVM's memslot-only zapping is allowing the guest to access stale
> entries for the unzapped-but-related memslots, whereas the full zap
> does not.
>
> FYI, my VFIO/GPU/PCI knowledge is abysmal, please speak up if any of
> my ideas are nonsensical.

The memory bit in the PCI command register of config space for each
device controls whether the device's decoders are active for the MMIO
BAR ranges.  These bits get flipped as both the guest firmware and the
guest OS enumerate and assign resources to the PCI subsystem.
Generally they are not manipulated further while the guest OS is
running, except for hotplug operations.  The guest OS device driver
will generally perform the final enable of these ranges and they'll
remain enabled until the guest is rebooted.

I recall somewhere in this thread you referenced reading the ROM as
part of the performance testing of this series.  The ROM has its own
enable bit within the ROM BAR, as the PCI spec allows devices to share
decoders between the standard BARs and the ROM BAR.  Toggling the
enable bit in the ROM BAR should therefore look very similar, in terms
of memslot behavior, to toggling the overall memory enable bit that
covers the other BARs of the device.  (A rough sketch of both toggles
as config space accesses is appended below my sign-off.)

Note that the faults I'm seeing often occur long after the BAR
mappings are finalized; usually (though not always) the VM boots to a
fully functional desktop and it's only as I run various tests that the
glitches start to appear.  For instance, when I allowed sp->gfn
0xfec00 to take the continue branch, I got an OpenCL error.  For
either 0xffee00 or 0xc1000 I got graphics glitches, for example stray
geometric artifacts flashed on the screen.  For 0x100000 and 0x800000
I'd get a black screen or a blank 3D graphics window.  For 0x80a000
the VM froze (apparently).  I can't say whether each of these is a
consistent failure mode; I only tested to the point of determining
whether a range generates an error.

> > Did you have any non-device assignment test cases that took this
> > branch when developing the series?
>
> The primary testing was performance oriented, using a slightly
> modified version of a synthetic benchmark[1] from a previous
> series[2] that touched the memslot flushing flow.  From a functional
> perspective, I highly doubt that test would have been able to expose
> an improper zapping bug.
:-\  It seems like there's probably some sort of inflection point where
it becomes faster to zap all pages versus paying the overhead of
walking every page in a memory slot; was that evaluated?  Not sure if
that's relevant here, but curious.  Thanks,

Alex
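---

For anyone following along, here's a rough userspace-style sketch of
the two config space toggles described above.  This is illustrative
only, not code from QEMU or from this series; it assumes config_offset
has already been discovered via VFIO_DEVICE_GET_REGION_INFO for the
PCI config region of a VFIO device fd, and it omits all error handling.

#include <stdint.h>
#include <unistd.h>
#include <linux/pci_regs.h>

/* Flip the memory space enable bit that gates MMIO decode for all BARs. */
static void pci_set_mem_enable(int device_fd, off_t config_offset, int enable)
{
	uint16_t cmd;

	pread(device_fd, &cmd, sizeof(cmd), config_offset + PCI_COMMAND);
	if (enable)
		cmd |= PCI_COMMAND_MEMORY;
	else
		cmd &= ~PCI_COMMAND_MEMORY;
	pwrite(device_fd, &cmd, sizeof(cmd), config_offset + PCI_COMMAND);
}

/* Flip the separate enable bit carried in the expansion ROM BAR itself. */
static void pci_set_rom_enable(int device_fd, off_t config_offset, int enable)
{
	uint32_t rom;

	pread(device_fd, &rom, sizeof(rom), config_offset + PCI_ROM_ADDRESS);
	if (enable)
		rom |= PCI_ROM_ADDRESS_ENABLE;
	else
		rom &= ~PCI_ROM_ADDRESS_ENABLE;
	pwrite(device_fd, &rom, sizeof(rom), config_offset + PCI_ROM_ADDRESS);
}

When the guest performs the equivalent config writes, userspace traps
them and updates its memory regions accordingly, which is presumably
what produces the memslot add/remove operations whose zapping behavior
is being discussed in this thread.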