On Fri, Mar 17, 2023, Oliver Upton wrote:
> On Wed, Mar 15, 2023 at 02:17:33AM +0000, Anish Moorthy wrote:
> > Add documentation, memslot flags, useful helper functions, and the
> > actual new capability itself.
> >
> > Memory fault exits on absent mappings are particularly useful for
> > userfaultfd-based live migration postcopy. When many vCPUs fault upon a
> > single userfaultfd the faults can take a while to surface to userspace
> > due to having to contend for uffd wait queue locks. Bypassing the uffd
> > entirely by triggering a vCPU exit avoids this contention and can improve
> > the fault rate by as much as 10x.
> > ---
> >  Documentation/virt/kvm/api.rst | 37 +++++++++++++++++++++++++++++++---
> >  include/linux/kvm_host.h       |  6 ++++++
> >  include/uapi/linux/kvm.h       |  3 +++
> >  tools/include/uapi/linux/kvm.h |  2 ++
> >  virt/kvm/kvm_main.c            |  7 ++++++-
> >  5 files changed, 51 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index f9ca18bbec879..4932c0f62eb3d 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
> >    /* for kvm_userspace_memory_region::flags */
> >    #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
> >    #define KVM_MEM_READONLY (1UL << 1)
> > +  #define KVM_MEM_ABSENT_MAPPING_FAULT (1UL << 2)
>
> call it KVM_MEM_EXIT_ABSENT_MAPPING

Ooh, look, a bikeshed!  :-)

I don't think it should have "EXIT" in the name.  The exit to userspace is a
side effect, e.g. KVM already exits to userspace on unresolved userfaults.  The
only thing this knob _directly_ controls is whether or not KVM attempts the slow
path.  If we give the flag a name like "exit on absent userspace mappings", then
KVM will appear to do the wrong thing when KVM exits on a truly absent userspace
mapping.

And as I argued in the last version[*], I am _strongly_ opposed to KVM
speculating on why KVM is exiting to userspace.  I.e. KVM should not set a
special flag if the memslot has "fast only" behavior.  The only thing the flag
should do is control whether or not KVM tries slow paths; what KVM does in
response to an unresolved fault should be an orthogonal thing.

E.g. if KVM encounters an unmapped page while prefetching SPTEs, KVM will
(correctly) not exit to userspace and instead simply terminate the prefetch.
Obviously we could solve that through documentation, but I don't see any benefit
in making this more complex than it needs to be.

[*] https://lkml.kernel.org/r/Y%2B0RYMfw6pHrSLX4%40google.com

> > +7.35 KVM_CAP_MEMORY_FAULT_NOWAIT
> > +--------------------------------
> > +
> > +:Architectures: x86, arm64
> > +:Returns: -EINVAL.
> > +
> > +The presence of this capability indicates that userspace may pass the
> > +KVM_MEM_ABSENT_MAPPING_FAULT flag to KVM_SET_USER_MEMORY_REGION to cause KVM_RUN
> > +to exit to populate 'kvm_run.memory_fault' and exit to userspace (*) in response
> > +to page faults for which the userspace page tables do not contain present
> > +mappings. Attempting to enable the capability directly will fail.
> > +
> > +The 'gpa' and 'len' fields of kvm_run.memory_fault will be set to the starting
> > +address and length (in bytes) of the faulting page. 'flags' will be set to
> > +KVM_MEMFAULT_REASON_ABSENT_MAPPING.
> > +
> > +Userspace should determine how best to make the mapping present, then take
> > +appropriate action. For instance, in the case of absent mappings this might
> > +involve establishing the mapping for the first time via UFFDIO_COPY/CONTINUE or
> > +faulting the mapping in using MADV_POPULATE_READ/WRITE. After establishing the
> > +mapping, userspace can return to KVM to retry the previous memory access.
> > +
> > +(*) NOTE: On x86, KVM_CAP_X86_MEMORY_FAULT_EXIT must be enabled for the
> > +KVM_MEMFAULT_REASON_ABSENT_MAPPING_reason: otherwise userspace will only receive
> > +a -EFAULT from KVM_RUN without any useful information.
>
> I'm not a fan of this architecture-specific dependency. Userspace is already
> explicitly opting in to this behavior by way of the memslot flag. These sort
> of exits are entirely orthogonal to the -EFAULT conversion earlier in the
> series.

Ya, yet another reason not to speculate on why KVM wasn't able to resolve a
fault.
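
To make the "flag controls the slow path, the response is orthogonal" split
concrete, here is a toy userspace C model.  It is only a sketch, not kernel
code and not from the series: every type, enum and function name below is
hypothetical, and the only thing taken from the quoted patch is the flag's bit
position (1UL << 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed memslot flag (bit 2 in the patch). */
#define TOY_MEM_ABSENT_MAPPING_FAULT (1UL << 2)

enum toy_fault_result { FAULT_RESOLVED, FAULT_UNRESOLVED };
enum toy_fault_ctx    { CTX_VCPU_ACCESS, CTX_SPTE_PREFETCH };
enum toy_action       { ACT_MAP, ACT_EXIT_TO_USERSPACE, ACT_STOP_PREFETCH };

/* Toy description of the host-side state of one page. */
struct toy_page {
	bool present_in_page_tables;  /* the fast path can succeed */
	bool resolvable_by_slow_path; /* the slow path could fault it in */
};

/*
 * Step 1: try to resolve the fault.  The flag's only direct effect is to
 * skip the slow path; it says nothing about what happens afterwards.
 */
static enum toy_fault_result toy_resolve(unsigned long slot_flags,
					 const struct toy_page *pg)
{
	assert(pg != NULL);

	if (pg->present_in_page_tables)
		return FAULT_RESOLVED;

	if (slot_flags & TOY_MEM_ABSENT_MAPPING_FAULT)
		return FAULT_UNRESOLVED; /* "fast only": do not try the slow path */

	return pg->resolvable_by_slow_path ? FAULT_RESOLVED : FAULT_UNRESOLVED;
}

/*
 * Step 2: react to the result.  Note that this never looks at the memslot
 * flags; it depends only on the outcome and on who triggered the fault.
 */
static enum toy_action toy_handle(enum toy_fault_ctx ctx,
				  enum toy_fault_result res)
{
	if (res == FAULT_RESOLVED)
		return ACT_MAP;

	/* An unresolved prefetch is simply abandoned, no exit to userspace. */
	return ctx == CTX_SPTE_PREFETCH ? ACT_STOP_PREFETCH
					: ACT_EXIT_TO_USERSPACE;
}
```

In this model the same unresolved result leads to an exit for a vCPU access and
to a quietly terminated prefetch, with or without the flag set, which is why
putting "EXIT" in the flag's name would be misleading.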