Re: [PATCH v5 04/17] KVM: Add KVM_CAP_MEMORY_FAULT_INFO

On Mon, Oct 16, 2023, David Matlack wrote:
> On Tue, Oct 10, 2023 at 4:40 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Tue, Oct 10, 2023, David Matlack wrote:
> > > On Fri, Sep 8, 2023 at 3:30 PM Anish Moorthy <amoorthy@xxxxxxxxxx> wrote:
> > > >
> > > > +::
> > > > +       union {
> > > > +               /* KVM_SPEC_EXIT_MEMORY_FAULT */
> > > > +               struct {
> > > > +                       __u64 flags;
> > > > +                       __u64 gpa;
> > > > +                       __u64 len; /* in bytes */
> > >
> > > I wonder if `gpa` and `len` should just be replaced with `gfn`.
> > >
> > > - We don't seem to care about returning an exact `gpa` out to
> > > userspace since this series just returns gpa = gfn * PAGE_SIZE out to
> > > userspace.
> > > - The len we return seems kind of arbitrary. PAGE_SIZE on x86 and
> > > vma_pagesize on ARM64. But at the end of the day we're not asking the
> > > kernel to fault in any specific length of mapping. We're just asking
> > > for gfn-to-pfn for a specific gfn.
> > > - I'm not sure userspace will want to do anything with this information.
> >
> > Extending ABI is tricky.  E.g. if a use case comes along that needs/wants to
> > return a range, then we'd need to add a flag and also update userspace to actually
> > do the right thing.
> >
> > The page fault path doesn't need such information because hardware gives a very
> > precise faulting address.  But if we ever get to a point where KVM provides info
> > for uaccess failures, then we'll likely want to provide the range.  E.g. if a
> > uaccess splits a page, on x86, we'd either need to register our own exception
> > fixup and use custom uaccess macros (eww), or convice the world that extending
> > ex_handler_uaccess() and all of the uaccess macros that they need to provide the
> > exact address that failed.
> 
> I wonder if userspace might need a precise fault address in some
> situations? e.g. If KVM returns -EHWPOISON for an access that spans a
> page boundary, userspace won't know which is poisoned.

As things currently stand, the -EHWPOISON case is guaranteed to be precise because
uaccess failures only ever return -EFAULT.  The resulting BUS_MCEERR_AR from the
kernel's #MC handler will provide the necessary precision to userspace.

Though even if -EHWPOISON were imprecise, userspace should be able to figure out
which page is poisoned, e.g. by probing each possible page (gross, but doable).
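
To illustrate the probing idea, a rough userspace sketch.  Everything here is
hypothetical and for illustration only: pages_spanned(), find_bad_gfn(), the
probe callback and the 4KiB page size are assumptions, not part of this series.
A real VMM would supply its own probe; probe_matches() is just a stand-in that
flags one known-bad gfn.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Hypothetical callback: returns non-zero if @gfn is found to be bad. */
typedef int (*probe_fn)(uint64_t gfn, void *opaque);

/* Number of pages touched by the byte range [gpa, gpa + len). */
static uint64_t pages_spanned(uint64_t gpa, uint64_t len)
{
	if (!len)
		return 0;

	return ((gpa + len - 1) >> SKETCH_PAGE_SHIFT) -
	       (gpa >> SKETCH_PAGE_SHIFT) + 1;
}

/* Probe each page in the range; return the first bad gfn, or -1 if none. */
static int64_t find_bad_gfn(uint64_t gpa, uint64_t len, probe_fn probe,
			    void *opaque)
{
	uint64_t first = gpa >> SKETCH_PAGE_SHIFT;
	uint64_t nr = pages_spanned(gpa, len);
	uint64_t i;

	assert(probe);

	for (i = 0; i < nr; i++) {
		if (probe(first + i, opaque))
			return (int64_t)(first + i);
	}
	return -1;
}

/* Stand-in probe for demonstration: @opaque points at the one bad gfn. */
static int probe_matches(uint64_t gfn, void *opaque)
{
	return gfn == *(uint64_t *)opaque;
}
```

E.g. a 16-byte access at gpa 0x1ff8 spans gfns 1 and 2, so at most two probes
are needed to pin down the poisoned page.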

Ah, and a much more concrete reason to report gpa+len is that it's possible that
KVM may someday support faults at sub-page granularity, e.g. if something like
HEKI[*] wants to use Intel's Sub-Page Write Permissions to make a minimal amount
of guest code writable when the guest kernel is doing code patching.
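
As a sketch of why a bare gfn wouldn't be enough in that case (hypothetical
struct and helper names, 4KiB pages assumed; the struct only mirrors the
flags/gpa/len fields quoted above and is not the actual uAPI definition):

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Illustrative mirror of the fields in the quoted patch. */
struct mem_fault_info {
	uint64_t flags;
	uint64_t gpa;
	uint64_t len;	/* in bytes */
};

/* True if the reported range is not a whole number of aligned pages. */
static bool fault_is_subpage(const struct mem_fault_info *f)
{
	return f->len && ((f->gpa | f->len) & (SKETCH_PAGE_SIZE - 1));
}

/* What a gfn-only ABI would report: the offset and length are lost. */
static uint64_t fault_gfn(const struct mem_fault_info *f)
{
	return f->gpa >> SKETCH_PAGE_SHIFT;
}
```

E.g. a 128-byte range at gpa 0x1080 is reported as-is with gpa+len, whereas a
gfn-only ABI would collapse it to gfn 0x1.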

> Maybe SNP/TDX need precise fault addresses as well? I don't know enough about
> how SNP and TDX plan to use this UAPI.

FWIW, SNP and TDX usage are limited to the KVM page fault path, i.e. always do
precise, single-page reporting.

[*] https://lore.kernel.org/all/20230505152046.6575-1-mic@xxxxxxxxxxx



