On Tue, Oct 10, 2023, David Matlack wrote: > On Fri, Sep 8, 2023 at 3:30 PM Anish Moorthy <amoorthy@xxxxxxxxxx> wrote: > > > > KVM_CAP_MEMORY_FAULT_INFO allows kvm_run to return useful information > > besides a return value of -1 and errno of EFAULT when a vCPU fails an > > access to guest memory which may be resolvable by userspace. > > > > Add documentation, updates to the KVM headers, and a helper function > > (kvm_handle_guest_uaccess_fault()) for implementing the capability. > > > > Mark KVM_CAP_MEMORY_FAULT_INFO as available on arm64 and x86, even > > though EFAULT annotation are currently totally absent. Picking a point > > to declare the implementation "done" is difficult because > > > > 1. Annotations will be performed incrementally in subsequent commits > > across both core and arch-specific KVM. > > 2. The initial series will very likely miss some cases which need > > annotation. Although these omissions are to be fixed in the future, > > userspace thus still needs to expect and be able to handle > > unannotated EFAULTs. > > > > Given these qualifications, just marking it available here seems the > > least arbitrary thing to do. > > > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx> > > Signed-off-by: Anish Moorthy <amoorthy@xxxxxxxxxx> > > --- > [...] > > +:: > > + union { > > + /* KVM_SPEC_EXIT_MEMORY_FAULT */ > > + struct { > > + __u64 flags; > > + __u64 gpa; > > + __u64 len; /* in bytes */ > > I wonder if `gpa` and `len` should just be replaced with `gfn`. > > - We don't seem to care about returning an exact `gpa` out to > userspace since this series just returns gpa = gfn * PAGE_SIZE out to > userspace. > - The len we return seems kind of arbitrary. PAGE_SIZE on x86 and > vma_pagesize on ARM64. But at the end of the day we're not asking the > kernel to fault in any specific length of mapping. We're just asking > for gfn-to-pfn for a specific gfn. > - I'm not sure userspace will want to do anything with this information. Extending ABI is tricky. E.g. if a use case comes along that needs/wants to return a range, then we'd need to add a flag and also update userspace to actually do the right thing. The page fault path doesn't need such information because hardware gives a very precise faulting address. But if we ever get to a point where KVM provides info for uaccess failures, then we'll likely want to provide the range. E.g. if a uaccess splits a page, on x86, we'd either need to register our own exception fixup and use custom uaccess macros (eww), or convice the world that extending ex_handler_uaccess() and all of the uaccess macros that they need to provide the exact address that failed. And for SNP and TDX, I believe the range will be used when the guest uses a hardware-vendor-defined hypercall to request conversions between private and shared. Or maybe the plan is to funnel those into KVM_HC_MAP_GPA_RANGE?