Re: [PATCH v7 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO and annotate fault in the stage-2 fault handler

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 4 Mar 2024 12:32:51 -0800

On Mon, Mar 04, 2024, Oliver Upton wrote:
> On Mon, Mar 04, 2024 at 08:00:15PM +0000, Oliver Upton wrote:
> > On Thu, Feb 15, 2024 at 11:53:59PM +0000, Anish Moorthy wrote:
> > 
> > [...]
> > 
> > > +	if (is_error_noslot_pfn(pfn)) {
> > > +		kvm_prepare_memory_fault_exit(vcpu, gfn * PAGE_SIZE, PAGE_SIZE,
> > > +					      write_fault, exec_fault, false);
> > 
> > Hmm... Reinterpreting the fault context into something that wants to be
> > arch-neutral might make this a bit difficult for userspace to
> > understand.
> > 
> > The CPU can take an instruction abort on an S1PTW due to missing write
> > permissions, i.e. hardware cannot write to the stage-1 descriptor for an
> > AF or DBM update. In this case HPFAR points to the IPA of the stage-1
> > descriptor that took the fault, not the target page.
> > 
> > It would seem this gets expressed to userspace as an intent to write and
> > execute on the stage-1 page tables, no?
> 
> Duh, kvm_vcpu_trap_is_exec_fault() (not to be confused with
> kvm_vcpu_trap_is_iabt()) filters for S1PTW, so this *should*
> shake out as a write fault on the stage-1 descriptor.
> 
> With that said, an architecture-neutral UAPI may not be able to capture
> the nuance of a fault. This UAPI will become much more load-bearing in
> the future, and the loss of granularity could become an issue.

What is the possible fallout from loss of granularity/nuance?  E.g. if the worst
case scenario is that KVM may exit to userspace multiple times in order to resolve
the problem, IMO that's an acceptable cost for having "dumb", common uAPI.

The intent/contract of the exit to userspace isn't for userspace to be able to
completely understand what fault occurred, but rather for KVM to communicate what
action userspace needs to take in order for KVM to make forward progress.
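
To illustrate, a rough sketch of the kind of userspace handling I have in mind.
Everything here is made up for the example: struct mem_fault_info is a
hypothetical stand-in, not the real uAPI layout, and the action names and
handle_mem_fault() are not from any existing VMM.  The point is only that
userspace acts on the reported range and never tries to reverse engineer why
the fault happened.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory fault exit payload, NOT the real uAPI. */
struct mem_fault_info {
	uint64_t gpa;	/* start of the faulting guest physical range */
	uint64_t size;	/* length of the range in bytes */
	bool write;	/* access hints, deliberately unused below */
	bool exec;
};

/* Hypothetical actions a VMM could take in response to the exit. */
enum vmm_action {
	VMM_POPULATE_AND_RETRY,	/* make the range available, re-enter the vCPU */
	VMM_FAIL,		/* range isn't guest RAM, nothing sane to do */
};

/*
 * Decide what to do based purely on the reported range.  Whether the access
 * was a "real" write or, e.g., a stage-1 walk updating a descriptor doesn't
 * change the action: the memory needs to be there for KVM to make progress.
 */
static enum vmm_action handle_mem_fault(const struct mem_fault_info *f,
					uint64_t ram_base, uint64_t ram_size)
{
	/* Overflow-safe check that [gpa, gpa + size) lies within guest RAM. */
	if (!f->size || f->gpa < ram_base || f->size > ram_size ||
	    f->gpa - ram_base > ram_size - f->size)
		return VMM_FAIL;

	return VMM_POPULATE_AND_RETRY;
}
```

If KVM's description is imprecise, the worst case in this model is that
userspace populates the range, re-enters the guest, and gets another exit.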

> Marc had some ideas about forwarding the register state to userspace
> directly, which should be the right level of information for _any_ fault
> taken to userspace.

I don't know enough about ARM to weigh in on that side of things, but for x86
this definitely doesn't hold true.  E.g. on the x86 side, KVM intentionally sets
reserved bits in SPTEs for "caching" emulated MMIO accesses, and the resulting
fault captures the "reserved bits set" information in register state.  But that's
purely an (optional) implementation detail of KVM that should never be exposed to
userspace.

Ditto for things like access tracking on hardware without A/D bits, and shadow
paging, which again can generate fault state that is inscrutable/misleading
without context that only KVM knows (and shouldn't expose to userspace).