On Wed, Jan 23, 2019 at 12:07:19PM +0100, Cédric Le Goater wrote:
> On 1/23/19 11:30 AM, Paul Mackerras wrote:
> > On Wed, Jan 23, 2019 at 05:45:24PM +1100, Benjamin Herrenschmidt wrote:
> >> On Tue, 2019-01-22 at 16:26 +1100, Paul Mackerras wrote:
> >>> On Mon, Jan 07, 2019 at 08:10:05PM +0100, Cédric Le Goater wrote:
> >>>> Clear the ESB pages from the VMA of the IRQ being pass through to the
> >>>> guest and let the fault handler repopulate the VMA when the ESB pages
> >>>> are accessed for an EOI or for a trigger.
> >>>
> >>> Why do we want to do this?
> >>>
> >>> I don't see any possible advantage to removing the PTEs from the
> >>> userspace mapping. You'll need to explain further.
> >>
> >> Afaik bcs we change the mapping to point to the real HW irq ESB page
> >> instead of the "IPI" that was there at VM init time.
>
> yes exactly. You need to clean up the pages each time.
>
> > So that makes it sound like there is a whole lot going on that hasn't
> > even been hinted at in the patch descriptions... It sounds like we
> > need a good description of how all this works and fits together
> > somewhere under Documentation/.
>
> OK. I have started doing so for the models merged in QEMU but not yet
> for KVM. I will work on it.
>
> > In any case we need much more informative patch descriptions. I
> > realize that it's all currently in Cedric's head, but I bet that in
> > two or three years' time when we come to try to debug something, it
> > won't be in anyone's head...
>
> I agree.
>
>
> So, storing the ESB VMA under the KVM device is not shocking anyone ?

Actually, now that I think of it, why can't userspace (QEMU) manage
this using mmap()? Based on what Ben has said, I assume there would
be a pair of pages for each interrupt that a PCI pass-through device
has. Would we end up with too many VMAs if we just used mmap() to
change the mappings from the software-generated pages to the
hardware-generated interrupt pages? Are the necessary pages for a PCI
passthrough device contiguous in both host real space and guest real
space? If so we'd only need one mmap() for all the device's interrupt
pages.

Paul.
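
For illustration only, here is a minimal sketch of the userspace approach
asked about above: replacing an existing mapping in place with mmap() and
MAP_FIXED. Everything in it is an assumption made for the example. Two
temporary files stand in for the software-generated ("IPI") ESB pages and
the hardware ESB pages, and make_backing() and remap_esb_pair_demo() are
hypothetical names. The thread does not say which file descriptors or
offsets QEMU would actually use.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for an object exposing ESB pages: an unlinked
 * temporary file of `len` bytes filled with `fill`. In the real case
 * this would be a device or KVM file descriptor. */
static int make_backing(size_t len, unsigned char fill)
{
    unsigned char buf[256];
    FILE *f = tmpfile();
    int fd;

    if (!f)
        return -1;
    fd = dup(fileno(f));        /* keep the file open after fclose() */
    fclose(f);
    if (fd < 0)
        return -1;

    memset(buf, fill, sizeof(buf));
    for (size_t off = 0; off < len; off += sizeof(buf)) {
        if (pwrite(fd, buf, sizeof(buf), (off_t)off) != (ssize_t)sizeof(buf)) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Map one pair of pages from the "software" backing, then replace the
 * mapping at the same address with the pair from the "hardware" backing.
 * Returns 0 if the address shows the first pattern before the remap and
 * the second pattern after it. */
int remap_esb_pair_demo(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t len;
    int sw_fd, hw_fd, before_ok, after_ok;
    unsigned char *p, *q;

    assert(psz > 0 && psz % 256 == 0);
    len = 2 * (size_t)psz;      /* one pair of pages per interrupt */

    sw_fd = make_backing(len, 0x11);
    hw_fd = make_backing(len, 0x22);
    if (sw_fd < 0 || hw_fd < 0)
        return -1;

    p = mmap(NULL, len, PROT_READ, MAP_SHARED, sw_fd, 0);
    if (p == MAP_FAILED)
        return -1;
    before_ok = (p[0] == 0x11 && p[len - 1] == 0x11);

    /* MAP_FIXED replaces whatever is currently mapped at [p, p + len). */
    q = mmap(p, len, PROT_READ, MAP_SHARED | MAP_FIXED, hw_fd, 0);
    if (q == MAP_FAILED)
        return -1;
    after_ok = (q == p && p[0] == 0x22 && p[len - 1] == 0x22);

    munmap(p, len);
    close(sw_fd);
    close(hw_fd);
    return (before_ok && after_ok) ? 0 : -1;
}
```

Calling remap_esb_pair_demo() from a small main() and checking for a zero
return is enough to try it. Regarding the VMA question: each MAP_FIXED
call that switches part of a larger mapping to a different backing object
generally splits the existing VMA, so remapping interrupt by interrupt
would tend to add VMAs, whereas one call over a contiguous range adds one.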