On Fri, Jun 02, 2023, Anish Moorthy wrote: > Implement KVM_CAP_MEMORY_FAULT_INFO for efaults generated by > kvm_handle_error_pfn(). > > Signed-off-by: Anish Moorthy <amoorthy@xxxxxxxxxx> > --- > arch/x86/kvm/mmu/mmu.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index c8961f45e3b1..cb71aae9aaec 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -3291,6 +3291,10 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn) > > static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > { > + uint64_t rounded_gfn; gfn_t, and probably no need to specificy "rounded", let the code do the talking. > + uint64_t fault_size; > + uint64_t fault_flags; With a few exceptions that we should kill off, KVM uses "u64", not "uint64_t". Though arguably the "size" should be gfn_t too. And these can all go on a single line, e.g. u64 fault_size, fault_flags; Though since the kvm_run.memory_fault field and the param to the helper are "len", a simple "len" here is better IMO. And since this is not remotely performance sensitive, I wouldn't bother deferring the initialization, e.g. gfn_t gfn = gfn_round_for_level(fault->gfn, fault->goal_level); gfn_t len = KVM_HPAGE_SIZE(fault->goal_level); u64 fault_flags; All that said, consuming fault->goal_level is unnecessary, and not be coincidence. The *only* time KVM should bail to userspace is if KVM failed to faultin a 4KiB page. Providing a hugepage is done opportunistically, it's never a hard requirement. So throw away all of the above and see below. > + > if (is_sigpending_pfn(fault->pfn)) { > kvm_handle_signal_exit(vcpu); > return -EINTR; > @@ -3309,6 +3313,15 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa > return RET_PF_RETRY; > } > > + fault_size = KVM_HPAGE_SIZE(fault->goal_level); > + rounded_gfn = round_down(fault->gfn * PAGE_SIZE, fault_size); We have a helper for this too, gfn_round_for_level(). Ooh, but this isn't storing a gfn, it's storing a gpa. Naughty, naughty. > + > + fault_flags = 0; > + if (fault->write) > + fault_flags |= KVM_MEMORY_FAULT_FLAG_WRITE; > + if (fault->exec) exec and write are mutually exclusive. There's even documented precedent for this in page_fault_can_be_fast(). So with a READ flag, this can be if (fault->write) fault_flags = KVM_MEMORY_FAULT_FLAG_WRITE; else if (fault->exec) fault_flags = KVM_MEMORY_FAULT_FLAG_EXEC; else fault_flags = KVM_MEMORY_FAULT_FLAG_READ; Or as Paolo would probably write it ;-) fault_flags = (fault->write & 1) << KVM_MEMORY_FAULT_FLAG_WRITE_SHIFT | (fault->exec & 1) << KVM_MEMORY_FAULT_FLAG_EXEC_SHIFT | (!fault->write && !fault->exec) << KVM_MEMORY_FAULT_FLAG_READ_SHIFT; (that was a joke, don't actually do that) > + fault_flags |= KVM_MEMORY_FAULT_FLAG_EXEC; > + kvm_populate_efault_info(vcpu, rounded_gfn, fault_size, fault_flags); This is where passing a "gfn" variable as a "gpa" looks broken. > return -EFAULT; All in all, something like this? u64 fault_flags; <other error handling> /* comment goes here */ WARN_ON_ONCE(fault->goal_level != PG_LEVEL_4K); if (fault->write) fault_flags = KVM_MEMORY_FAULT_FLAG_WRITE; else if (fault->exec) fault_flags = KVM_MEMORY_FAULT_FLAG_EXEC; else fault_flags = KVM_MEMORY_FAULT_FLAG_READ; kvm_handle_blahblahblah_fault(vcpu, gfn_to_gpa(fault->gfn), PAGE_SIZE, fault_flags);