Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:

> On Thu, May 14, 2020 at 07:22:50PM -0400, Peter Xu wrote:
>> On Thu, May 14, 2020 at 03:56:24PM -0700, Sean Christopherson wrote:
>> > On Thu, May 14, 2020 at 06:05:16PM -0400, Peter Xu wrote:
>> > > E.g., shm_open() with a handle and fill one 0xff page, then remap it
>> > > to anywhere needed in QEMU?
>> >
>> > Mapping that 4k page over and over is going to get expensive, e.g. each
>> > duplicate will need a VMA and a memslot, plus any PTE overhead.  If the
>> > total sum of the holes is >2mb it'll even overflow the number of
>> > allowed memslots.
>>
>> What's the PTE overhead you mentioned?  We need to fill PTEs one by one
>> on fault even if the page is allocated in the kernel, am I right?
>
> It won't require host PTEs for every page if it's a kernel page.  I doubt
> PTEs are a significant overhead, especially compared to memslots, but it's
> still worth considering.
>
> My thought was to skimp on both host PTEs _and_ KVM SPTEs by always
> sending the PCI hole accesses down the slow MMIO path[*].
>
> [*] https://lkml.kernel.org/r/20200514194624.GB15847@xxxxxxxxxxxxxxx

If we drop the 'aggressive' patch from this patchset, we can probably get
away with KVM_MEM_READONLY and userspace VMAs, but that will only save some
memory; it won't speed things up.

>> 4K is only an example - we can also use more pages as the template.
>> However I guess the kvm memslot count could be a limit..  Could I ask
>> what's the normal size of this 0xff region, and its distribution?

Julia/Michael, could you please provide a 'normal' configuration for a Q35
machine and its PCIe config space?

-- 
Vitaly
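
For illustration only, a minimal user-space sketch of the shm_open() + remap
idea Peter Xu describes above: a single 0xff-filled template page backs every
mapping, so host memory stays at one page, but each mapping still costs a VMA
(and, for a guest, a memslot), which is the overhead Sean points out. The shm
name, the anonymous-address mappings, and the trivial error handling are
assumptions made for the example and are not taken from the patchset; QEMU
would instead place the mappings with MAP_FIXED at the host virtual addresses
of the PCI holes.

	/* Sketch: one shared 0xff page mapped at multiple addresses. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		unsigned char page[4096];
		void *hole1, *hole2;
		int fd;

		/* Hypothetical shm name, used only for this example. */
		fd = shm_open("/pci-hole-ff", O_CREAT | O_RDWR, 0600);
		if (fd < 0)
			return 1;
		shm_unlink("/pci-hole-ff");	/* keep only the fd alive */

		/* Fill the single template page with 0xff and back the object with it. */
		memset(page, 0xff, sizeof(page));
		if (write(fd, page, sizeof(page)) != (ssize_t)sizeof(page))
			return 1;

		/*
		 * Map the same backing page twice, read-only.  Each mmap()
		 * creates a new VMA even though no new memory is allocated.
		 */
		hole1 = mmap(NULL, sizeof(page), PROT_READ, MAP_SHARED, fd, 0);
		hole2 = mmap(NULL, sizeof(page), PROT_READ, MAP_SHARED, fd, 0);
		if (hole1 == MAP_FAILED || hole2 == MAP_FAILED)
			return 1;

		/* Both mappings read back 0xff from the shared template page. */
		printf("%#x %#x\n", *(unsigned char *)hole1,
		       *(unsigned char *)hole2);
		return 0;
	}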