On Mon, Oct 16, 2023, David Matlack wrote:
> On Tue, Oct 10, 2023 at 4:40 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Tue, Oct 10, 2023, David Matlack wrote:
> > > On Fri, Sep 8, 2023 at 3:30 PM Anish Moorthy <amoorthy@xxxxxxxxxx> wrote:
> > > >
> > > > +::
> > > > +  union {
> > > > +          /* KVM_SPEC_EXIT_MEMORY_FAULT */
> > > > +          struct {
> > > > +                  __u64 flags;
> > > > +                  __u64 gpa;
> > > > +                  __u64 len; /* in bytes */
> > >
> > > I wonder if `gpa` and `len` should just be replaced with `gfn`.
> > >
> > > - We don't seem to care about returning an exact `gpa` out to
> > >   userspace since this series just returns gpa = gfn * PAGE_SIZE out to
> > >   userspace.
> > > - The len we return seems kind of arbitrary. PAGE_SIZE on x86 and
> > >   vma_pagesize on ARM64. But at the end of the day we're not asking the
> > >   kernel to fault in any specific length of mapping. We're just asking
> > >   for gfn-to-pfn for a specific gfn.
> > > - I'm not sure userspace will want to do anything with this information.
> >
> > Extending ABI is tricky. E.g. if a use case comes along that needs/wants to
> > return a range, then we'd need to add a flag and also update userspace to actually
> > do the right thing.
> >
> > The page fault path doesn't need such information because hardware gives a very
> > precise faulting address. But if we ever get to a point where KVM provides info
> > for uaccess failures, then we'll likely want to provide the range. E.g. if a
> > uaccess splits a page, on x86, we'd either need to register our own exception
> > fixup and use custom uaccess macros (eww), or convince the world that
> > ex_handler_uaccess() and all of the uaccess macros need to be extended to
> > provide the exact address that failed.
>
> I wonder if userspace might need a precise fault address in some
> situations? e.g. If KVM returns -EHWPOISON for an access that spans a
> page boundary, userspace won't know which is poisoned.

As things currently stand, the -EHWPOISON case is guaranteed to be precise because
uaccess failures only ever return -EFAULT. The resulting BUS_MCEERR_AR from the
kernel's #MC handler will provide the necessary precision to userspace.

Though even if -EHWPOISON were imprecise, userspace should be able to figure out
which page is poisoned, e.g. by probing each possible page (gross, but doable).

Ah, and a much more concrete reason to report gpa+len is that it's possible that
KVM may someday support faults at sub-page granularity, e.g. if something like
HEKI[*] wants to use Intel's Sub-Page Write Permissions to make a minimal amount
of guest code writable when the guest kernel is doing code patching.

> Maybe SNP/TDX need precise fault addresses as well? I don't know enough about
> how SNP and TDX plan to use this UAPI.

FWIW, SNP and TDX usage are limited to the KVM page fault path, i.e. always do
precise, single-page reporting.

[*] https://lore.kernel.org/all/20230505152046.6575-1-mic@xxxxxxxxxxx
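
To make the gpa+len point concrete, here is a rough userspace-side sketch of the "probe each possible page" idea: given a reported range, enumerate the page-aligned GPAs it touches. Everything here is an assumption for illustration only. `struct mem_fault_info` merely mirrors the three fields quoted above and is not the real UAPI struct, `SKETCH_PAGE_SIZE` is a hard-coded 4KiB stand-in for the real page size, and `fault_pages()` is a hypothetical helper. The sketch also assumes gpa + len does not wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size for illustration; real code should query it. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical mirror of the fields quoted above, NOT the real UAPI struct. */
struct mem_fault_info {
	uint64_t flags;
	uint64_t gpa;
	uint64_t len;	/* in bytes */
};

/*
 * Store the page-aligned base GPA of every page touched by
 * [gpa, gpa + len) into out[], writing at most max entries.
 * Returns the total number of pages touched, which may exceed max.
 * Assumes gpa + len does not overflow.
 */
static size_t fault_pages(const struct mem_fault_info *f,
			  uint64_t *out, size_t max)
{
	uint64_t first, last, page;
	size_t n = 0;

	assert(out != NULL || max == 0);

	/* An empty range touches no pages. */
	if (f->len == 0)
		return 0;

	first = f->gpa & ~(SKETCH_PAGE_SIZE - 1);
	last = (f->gpa + f->len - 1) & ~(SKETCH_PAGE_SIZE - 1);

	/* Compare against 'last' before advancing so the loop cannot wrap. */
	for (page = first; ; page += SKETCH_PAGE_SIZE) {
		if (n < max)
			out[n] = page;
		n++;
		if (page == last)
			break;
	}
	return n;
}
```

With a report like gpa = 0x1ff8, len = 16, this yields two candidate pages (0x1000 and 0x2000) for userspace to probe, whereas a sub-page report such as gpa = 0x3080, len = 128 yields only 0x3000. A gfn-only report could express neither case.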