Re: [PATCH v7 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO and annotate fault in the stage-2 fault handler

On Mon, Mar 04, 2024 at 02:49:07PM -0800, Sean Christopherson wrote:

[...]

> The presence of MTE stuff shouldn't affect the fundamental access information,

  "When FEAT_MTE is implemented, for a synchronous Data Abort on an
  instruction that directly accesses Allocation Tags, ISV is 0."

If there is no instruction syndrome, there's insufficient fault context
to determine if the guest was doing a read or a write.
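
To make the point concrete, here is a minimal sketch of how the syndrome
might be classified. The bit positions for ISV (bit 24) and WnR (bit 6)
follow the ESR_ELx data abort encoding; the enum and the function name are
hypothetical and not KVM code. It assumes, per the argument above, that the
access direction is not reported when ISV is 0.

```c
#include <stdint.h>

/* ESR_ELx data abort syndrome bits. */
#define ESR_ISV (UINT64_C(1) << 24) /* instruction syndrome valid */
#define ESR_WNR (UINT64_C(1) << 6)  /* write, not read */

/* Hypothetical classification for illustration; not a KVM type. */
enum access_kind {
	ACCESS_UNKNOWN,
	ACCESS_READ,
	ACCESS_WRITE,
};

static enum access_kind classify_data_abort(uint64_t esr)
{
	/*
	 * Assumption from the discussion: with ISV == 0 there is not
	 * enough fault context to report a direction, so don't guess.
	 */
	if (!(esr & ESR_ISV))
		return ACCESS_UNKNOWN;

	return (esr & ESR_WNR) ? ACCESS_WRITE : ACCESS_READ;
}
```

A reporting path built on something like this would have to leave the
read/write flags clear (or use a separate indication) for the
ACCESS_UNKNOWN case.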

> e.g. if the guest was attempting to write, then KVM should set KVM_MEMORY_EXIT_FLAG_WRITE
> irrespective of whether or not MTE is in play.

When the MMU generates such an abort, it *is not* read, write, or execute.
It is a NoTagAccess fault. There is no sane way to describe this in
terms of RWX.

> The one thing we may want to squeak in before 6.8 is released is a placeholder
> in memory_fault, though I don't think that's strictly necessary since the union
> as a whole is padded to 256 bytes.  I suppose userspace could allocate based on
> sizeof(kvm_run.memory_fault), but that's a bit of a stretch.

Strictly speaking, that isn't an ABI problem any more, but compile-time
brittleness to header changes. IOW, old userspace could still run on a new
kernel because it was compiled against the old structure size and only
knows about the fields present at that time.
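
A rough illustration of why the padding makes this safe. The structures
below are hypothetical stand-ins, not the real UAPI definitions, and the
"extra" field is invented for the example. The only assumption carried
over from the thread is that the exit union is padded to 256 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" layout that existing userspace was built against. */
struct memory_fault_old {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/* Hypothetical "new" layout with a field appended later. */
struct memory_fault_new {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
	uint64_t extra; /* invented for illustration */
};

union exit_old {
	struct memory_fault_old memory_fault;
	char padding[256];
};

union exit_new {
	struct memory_fault_new memory_fault;
	char padding[256];
};

/* The union size, and so the layout of whatever follows it, is unchanged. */
_Static_assert(sizeof(union exit_old) == 256, "old union is padded");
_Static_assert(sizeof(union exit_new) == 256, "new union is padded");

/* Fields that old userspace knows about stay at the same offsets. */
_Static_assert(offsetof(struct memory_fault_new, size) ==
	       offsetof(struct memory_fault_old, size),
	       "existing fields do not move");
```

Only sizeof() on the member itself differs between the two (24 vs. 32
bytes here), which is the compile-time brittleness rather than a runtime
incompatibility.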

> > > E.g. on the x86 side, KVM intentionally sets reserved bits in SPTEs for
> > > "caching" emulated MMIO accesses, and the resulting fault captures the
> > > "reserved bits set" information in register state.  But that's purely an
> > > (optional) implementation detail of KVM that should never be exposed to
> > > userspace.
> > 
> > MMIO accesses would show up elsewhere though, right?
> 
> Yes, but I don't see how that's relevant.  Maybe I'm just misunderstanding what
> you're saying/asking.

If "reserved" EPT violations found their way to userspace via the
"memory fault" exit structure then that'd likely be due to a KVM bug.
The only expected flows in the near term are this and CoCo crap.

> > Either way, I have no issues whatsoever if the direction for x86 is to
> > provide abstracted fault information.
> 
> I don't understand how ARM can get away with NOT providing a layer of abstraction.
> Copying fault state verbatim to userspace will bleed KVM implementation details
> into userspace,

The memslot flag already bleeds KVM implementation detail into userspace
to a degree. The event we're trying to let userspace handle is at the
intersection of a specific hardware/software state.

> Abstracting gory hardware details from userspace is one of the main roles of the
> kernel.

Where it can be accomplished without a loss (or misrepresentation) of
information, agreed. But KVM UAPI is so architecture-specific that it
seems arbitrary to draw the line here.

> A concrete example of hardware throwing a wrench in things is AMD's upcoming
> "encrypted" flag (in the stage-2 page fault error code), which is set by SNP-capable
> CPUs for *any* VM that supports guest-controlled encrypted memory.  If KVM reported
> the page fault error code directly to userspace, then running the same VM on
> different hardware generations, e.g. after live migration, would generate different
> error codes.
>  
> Are we talking past each other?  I'm genuinely confused by the pushback on
> capturing RWX information.  Yes, the RWX info may be insufficient in some cases,
> but its existence doesn't preclude KVM from providing more information as needed.

My pushback isn't exactly on RWX (even though I noted the MTE quirk
above). What I'm poking at here is the general infrastructure for
reflecting faults into userspace, which is aggressively becoming more
relevant.

-- 
Thanks,
Oliver



