On Fri, Aug 09, 2024, Anish Moorthy wrote:
> The initial paragraph of the documentation here makes it sound like a
> KVM_EXIT_MEMORY_FAULT will always accompany an EFAULT from KVM_RUN, but
> that's not a guarantee.
>
> Also, define zero to be a special value for the "size" field. This
> allows memory faults exits to be set up in spots where KVM_RUN must
> EFAULT, but is not able to supply an accurate size.
>
> Signed-off-by: Anish Moorthy <amoorthy@xxxxxxxxxx>
> ---
>  Documentation/virt/kvm/api.rst | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8e5dad80b337..c5ce7944005c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7073,7 +7073,8 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
>  KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
>  could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
> -guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
> +guest physical address range [gpa, gpa + size) of the fault: when zero, it
> +indicates that the size of the fault could not be determined. The 'flags' field
>  describes properties of the faulting access that are likely pertinent:
>  
>  - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
> @@ -8131,7 +8132,7 @@ unavailable to host or other VMs.
>  :Architectures: x86
>  :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
>  
> -The presence of this capability indicates that KVM_RUN will fill
> +The presence of this capability indicates that KVM_RUN *may* fill

I would prefer to fix KVM than to change the documentation. The "will fill" is
specifically scoped to guest page fault VM-Exits, so it should be a fully
solvable problem. I don't want to leave wriggle room for KVM, because then it
will be quite difficult for userspace to do anything useful with memory_fault. E.g.
for x86, convert all -EFAULTs that are returned when KVM is hosed to -EIO and
KVM_BUG_ON, and then there's only one -EFAULT that doesn't fill memory_fault.

Completely untested...

---
 arch/x86/kvm/mmu/mmu.c         | 13 +++++++------
 arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 928cf84778b0..cb4e3a1041ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3225,8 +3225,8 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 					     fault->req_level >= it.level);
 	}
 
-	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+	if (KVM_BUG_ON(it.level != fault->goal_level, vcpu->kvm))
+		return -EIO;
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
 			   base_gfn, fault->pfn, fault);
@@ -3264,6 +3264,7 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 		return RET_PF_RETRY;
 	}
 
+	kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 	return -EFAULT;
 }
 
@@ -4597,8 +4598,8 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 
 #ifndef CONFIG_X86_64
 	/* A 64-bit CR2 should be impossible on 32-bit KVM. */
-	if (WARN_ON_ONCE(fault_address >> 32))
-		return -EFAULT;
+	if (KVM_BUG_ON(fault_address >> 32, vcpu->kvm))
+		return -EIO;
 #endif
 	/*
 	 * Legacy #PF exception only have a 32-bit error code. Simply drop the
@@ -5988,8 +5989,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 
 	r = RET_PF_INVALID;
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
-		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
-			return -EFAULT;
+		if (KVM_BUG_ON(error_code & PFERR_PRIVATE_ACCESS, vcpu->kvm))
+			return -EIO;
 
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
 		if (r == RET_PF_EMULATE)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 69941cebb3a8..4f4704c65c40 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -745,8 +745,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 					     fault->req_level >= it.level);
 	}
 
-	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+	if (KVM_BUG_ON(it.level != fault->goal_level, vcpu->kvm))
+		return -EIO;
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access,
 			   base_gfn, fault->pfn, fault);

base-commit: 12ac7b9981ff30f0deffe6331bb742c71b279300
--
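To illustrate why userspace needs memory_fault to be guaranteed and accurate, here is a minimal, hypothetical sketch of how a VMM might act on the exit. It is not taken from any real VMM. struct memory_fault_info and MEMORY_EXIT_FLAG_PRIVATE are local stand-ins assumed to mirror run->memory_fault and KVM_MEMORY_EXIT_FLAG_PRIVATE from <linux/kvm.h>, and classify_fault() is a made-up helper. Note that under the proposed "size == 0 means unknown" semantics, the only safe choice is to give up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for run->memory_fault; the real definition lives in
 * <linux/kvm.h>. Field order (flags, gpa, size) is assumed to match it.
 */
struct memory_fault_info {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/* Assumed to equal KVM_MEMORY_EXIT_FLAG_PRIVATE from the uapi header. */
#define MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)

enum fault_action {
	ACTION_RESOLVE_PRIVATE,	/* e.g. convert [start, end) to private */
	ACTION_RESOLVE_SHARED,	/* e.g. convert [start, end) to shared */
	ACTION_GIVE_UP,		/* no usable range: nothing precise to do */
};

/*
 * Hypothetical helper: decide what a VMM could do with a memory fault exit
 * and, when possible, report the half-open range [*start, *end).
 */
static enum fault_action classify_fault(const struct memory_fault_info *mf,
					uint64_t *start, uint64_t *end)
{
	assert(mf && start && end);

	/* With the proposed semantics, size == 0 means "size unknown". */
	if (mf->size == 0)
		return ACTION_GIVE_UP;

	/* Reject a range whose end would wrap past 2^64. */
	if (mf->gpa + mf->size < mf->gpa)
		return ACTION_GIVE_UP;

	*start = mf->gpa;
	*end = mf->gpa + mf->size;

	/* Resolve the range to match the type of the faulting access. */
	return (mf->flags & MEMORY_EXIT_FLAG_PRIVATE) ? ACTION_RESOLVE_PRIVATE
						      : ACTION_RESOLVE_SHARED;
}
```

A VMM would only call such a helper after KVM_RUN fails with errno == EFAULT (or EHWPOISON) and run->exit_reason is KVM_EXIT_MEMORY_FAULT. That check is only meaningful if KVM fills memory_fault on every -EFAULT from a guest page fault, which is the guarantee argued for above.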