Async PF [1] allows other processes to run on a vCPU while the host
handles a stage-2 fault caused by a process on that vCPU. When using
VM-exit-based stage-2 fault handling [2], async PF functionality is lost
because KVM does not run the vCPU while a fault is being handled, so no
other process can execute on the vCPU. This patch series extends
VM-exit-based stage-2 fault handling with async PF support by letting
userspace handle faults instead of the kernel, hence the "async PF user"
name.

I circulated the idea with Paolo, Sean, David H, and James H at LPC,
and the only concern I heard was about injecting the "page not present"
event via a #PF exception in the CoCo case, where it may not work. My
implementation reuses the existing code for that, so async PF user is on
par with the existing async PF implementation in this regard, and
support for the CoCo case can be added separately.

Please note that this series is applied on top of the VM-exit-based
stage-2 fault handling RFC [2].

Implementation

The following workflow is implemented:
 - A process in the guest causes a stage-2 fault.
 - KVM checks whether the fault can be handled asynchronously. If it
   can, KVM prepares the VM exit info, which has a newly added "async PF
   flag" set and carries an async PF token value corresponding to the
   fault.
 - Userspace reads the VM exit info and resumes the vCPU immediately.
   Meanwhile, it processes the fault.
 - When the fault is resolved, userspace calls a new async ioctl using
   the token to notify KVM.
 - KVM communicates to the guest that the process can be resumed.

Notes:
 - No changes to the x86 async PF PV interface are required.
 - The series does not introduce new dependencies on x86 compared to the
   existing async PF.

Testing

Inspired by [3], I built a Firecracker-based setup, where Firecracker
implemented the VM-exit-based fault handling.
I observed that a workload consisting of a CPU-bound thread and a
memory-bound thread running concurrently executed faster with async PF
user enabled: with 10 ms-long fault processing, it was 26% faster.

It is difficult to provide an objective performance comparison between
async PF kernel and async PF user, because async PF user can only work
with VM-exit-based fault handling, which has its own performance
characteristics compared to in-kernel fault handling or UserfaultFD.

The patch series is built on top of the VM-exit-based stage-2 fault
handling RFC [2].

Patch 1 updates documentation to reflect [2] changes.
Patches 2-6 add the implementation of async PF user.

Questions:
 - Are there any general concerns about the approach?
 - Can we leave the CoCo use case aside for now, or do we need to
   support it straight away?
 - What is the desired level of coupling between async PF and async PF
   user? For now, I kept the coupling to the bare minimum (only the
   PV-related data structure is shared between the two).
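
For readers who want something concrete to reason about, the token
lifecycle from the workflow under "Implementation" can be modelled in a
few lines of plain C. This is only a toy sketch and not code from the
series: every name in it (apf_user_vcpu, apf_user_queue, apf_user_ready,
APF_USER_MAX_PENDING) is hypothetical, it makes no KVM calls, and the
fixed-size table and the "token 0 means fall back to synchronous
handling" convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on outstanding faults per vCPU (assumption). */
#define APF_USER_MAX_PENDING 8

/* One outstanding fault that userspace is still resolving. */
struct apf_user_fault {
	uint32_t token;	/* value reported in the VM exit info */
	uint64_t gpa;	/* faulting guest physical address */
	bool in_use;
};

/* Per-vCPU bookkeeping; zero-initialise before use. */
struct apf_user_vcpu {
	struct apf_user_fault pending[APF_USER_MAX_PENDING];
	uint32_t next_token;
};

/*
 * Fault path: record the fault and return a non-zero token for the VM
 * exit info. Returns 0 when no slot is free, in which case the caller
 * would handle the fault synchronously.
 */
static uint32_t apf_user_queue(struct apf_user_vcpu *v, uint64_t gpa)
{
	assert(v != NULL);

	for (size_t i = 0; i < APF_USER_MAX_PENDING; i++) {
		struct apf_user_fault *f = &v->pending[i];

		if (f->in_use)
			continue;
		if (++v->next_token == 0)	/* 0 is reserved for "not async" */
			++v->next_token;
		f->token = v->next_token;
		f->gpa = gpa;
		f->in_use = true;
		return f->token;
	}
	return 0;
}

/*
 * Notification path: userspace reports that the fault identified by
 * 'token' is resolved. Returns true and stores the address in *gpa if
 * the token was outstanding, which is the point where the guest would
 * be told that the process can be resumed.
 */
static bool apf_user_ready(struct apf_user_vcpu *v, uint32_t token,
			   uint64_t *gpa)
{
	assert(v != NULL && gpa != NULL);

	if (token == 0)
		return false;
	for (size_t i = 0; i < APF_USER_MAX_PENDING; i++) {
		struct apf_user_fault *f = &v->pending[i];

		if (f->in_use && f->token == token) {
			*gpa = f->gpa;
			f->in_use = false;	/* slot can be reused */
			return true;
		}
	}
	return false;
}
```

In this model a caller would invoke apf_user_queue() when a fault
qualifies for asynchronous handling, let the vCPU continue running, and
later invoke apf_user_ready() with the same token once the page is in
place. A second notification for the same token is rejected, which
mirrors the expectation that each fault is completed exactly once.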

[1] https://kvm-forum.qemu.org/2021/sdei_apf_for_arm64_gavin.pdf
[2] https://lore.kernel.org/kvm/CADrL8HUHRMwUPhr7jLLBgD9YLFAnVHc=N-C=8er-x6GUtV97pQ@xxxxxxxxxxxxxx/T/
[3] https://lore.kernel.org/all/20200508032919.52147-1-gshan@xxxxxxxxxx/

Nikita

Nikita Kalyazin (6):
  Documentation: KVM: add userfault KVM exit flag
  Documentation: KVM: add async pf user doc
  KVM: x86: add async ioctl support
  KVM: trace events: add type argument to async pf
  KVM: x86: async_pf_user: add infrastructure
  KVM: x86: async_pf_user: hook to fault handling and add ioctl

 Documentation/virt/kvm/api.rst  |  35 ++++++
 arch/x86/include/asm/kvm_host.h |  12 +-
 arch/x86/kvm/Kconfig            |   7 ++
 arch/x86/kvm/lapic.c            |   2 +
 arch/x86/kvm/mmu/mmu.c          |  68 ++++++++++-
 arch/x86/kvm/x86.c              | 101 +++++++++++++++-
 arch/x86/kvm/x86.h              |   2 +
 include/linux/kvm_host.h        |  30 +++++
 include/linux/kvm_types.h       |   1 +
 include/trace/events/kvm.h      |  50 +++++---
 include/uapi/linux/kvm.h        |  12 +-
 virt/kvm/Kconfig                |   3 +
 virt/kvm/Makefile.kvm           |   1 +
 virt/kvm/async_pf.c             |   2 +-
 virt/kvm/async_pf_user.c        | 197 ++++++++++++++++++++++++++++++++
 virt/kvm/async_pf_user.h        |  24 ++++
 virt/kvm/kvm_main.c             |  14 +++
 17 files changed, 535 insertions(+), 26 deletions(-)
 create mode 100644 virt/kvm/async_pf_user.c
 create mode 100644 virt/kvm/async_pf_user.h


base-commit: 15f01813426bf9672e2b24a5bac7b861c25de53b
-- 
2.40.1