Re: [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 13 Aug 2024 07:26:55 -0700

On Mon, Aug 12, 2024, Aneesh Kumar K.V wrote:
> Anish Moorthy <amoorthy@xxxxxxxxxx> writes:
> 
> > Right now userspace just gets a bare EFAULT when the stage-2 fault
> > handler fails to fault in the relevant page. Set up a
> > KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
> > eases debugging and might also let userspace decide on/take some
> > specific action other than crashing the VM.
> >
> > In some cases, user_mem_abort() EFAULTs before the size of the fault is
> > calculated: return 0 in these cases to indicate that the fault is of
> > unknown size.
> >
> 
> VMMs are now converting private memory to shared or vice-versa on vcpu
> exit due to a memory fault. This change will require the VMM to track each
> page's private/shared state so that it can handle an exit fault on
> shared memory where the fault happened for reasons other than
> conversion.

I don't see how filling kvm_run.memory_fault in more locations changes anything.
The userspace exits are inherently racy, e.g. userspace may have already converted
the page to the appropriate state, thus making KVM's exit spurious.  So either
the VMM already tracks state, or the VMM blindly converts to shared/private.

> Should we make it easier by adding flag bits to indicate that the
> fault was due to an attribute/access-type mismatch?

Like above, describing _why_ an exit occurred is problematic when an exit races
with a "fix" from userspace.  It's also problematic when there are multiple
possible faults, e.g. if the guest attempts to write to private memory, but
userspace has the memory mapped as read-only, shared (contrived, but possible).
Describing only the fault that KVM sees means the vCPU will encounter multiple
faults, and userspace will end up getting multiple exits.

Instead, KVM should describe the access that led to the fault, as planned in the
original series[1][2].  Userspace can then get the page into the correct state
straightaway, or take punitive action if the guest is misbehaving.

	/* Describe the access itself, not why KVM faulted. */
	if (is_write)
		vcpu->run->memory_fault.flags |= KVM_MEMORY_FAULT_FLAG_WRITE;
	else if (is_exec)
		vcpu->run->memory_fault.flags |= KVM_MEMORY_FAULT_FLAG_EXEC;
	else
		vcpu->run->memory_fault.flags |= KVM_MEMORY_FAULT_FLAG_READ;
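
A toy model of the contrived case described earlier (a private write to memory that userspace has mapped shared and read-only) shows the difference in exit counts. It assumes KVM reports only the first problem it finds, and it treats the attribute and permission checks as independent, which is a simplification. None of the names below are KVM uapi.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are KVM uapi. */
struct toy_page {
	bool is_private;  /* current memory attribute */
	bool writable;    /* userspace mapping permits writes */
};

struct toy_access {
	bool is_private;  /* guest accessed the page as private */
	bool is_write;
};

enum toy_reason { TOY_OK, TOY_ATTR_MISMATCH, TOY_NOT_WRITABLE };

/* "KVM" side: the first problem found for this access, in a fixed order. */
static enum toy_reason first_problem(const struct toy_page *pg,
				     const struct toy_access *acc)
{
	if (pg->is_private != acc->is_private)
		return TOY_ATTR_MISMATCH;
	if (acc->is_write && !pg->writable)
		return TOY_NOT_WRITABLE;
	return TOY_OK;
}

/* Exits needed if KVM reports *why* it faulted and userspace fixes only that. */
static int exits_reporting_reason(struct toy_page pg, struct toy_access acc)
{
	enum toy_reason r;
	int exits = 0;

	while ((r = first_problem(&pg, &acc)) != TOY_OK) {
		exits++;
		if (r == TOY_ATTR_MISMATCH)
			pg.is_private = acc.is_private;
		else
			pg.writable = true;
	}
	return exits;
}

/* Exits needed if KVM describes the access and userspace fixes it all at once. */
static int exits_describing_access(struct toy_page pg, struct toy_access acc)
{
	if (first_problem(&pg, &acc) == TOY_OK)
		return 0;
	pg.is_private = acc.is_private;
	if (acc.is_write)
		pg.writable = true;
	assert(first_problem(&pg, &acc) == TOY_OK);
	return 1;
}
```

With the access described, a single exit carries enough information to fix both the attribute and the mapping; with a reason, the second problem only surfaces after the guest retries.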

That said, I'm a little hesitant to capture RWX information without a use case,
mainly because it will require a new capability for userspace to be able to rely
on the information.  In hindsight, it probably would have been better to capture
RWX information in the initial implementation.  Doh.
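
If RWX flags were added, userspace would presumably only trust them once KVM advertises support through a new capability. A sketch of that gating follows. The bit values are placeholders (the RWX flags are part of the planned design, not existing uapi), and have_rwx_cap is a hypothetical stand-in for the result of a capability check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values: the RWX flags are a proposal, not existing uapi. */
#define TOY_FAULT_FLAG_READ  (1ULL << 0)
#define TOY_FAULT_FLAG_WRITE (1ULL << 1)
#define TOY_FAULT_FLAG_EXEC  (1ULL << 2)
#define TOY_FAULT_FLAG_RWX   (TOY_FAULT_FLAG_READ | TOY_FAULT_FLAG_WRITE | \
			      TOY_FAULT_FLAG_EXEC)

enum toy_access { TOY_ACCESS_UNKNOWN, TOY_ACCESS_READ, TOY_ACCESS_WRITE, TOY_ACCESS_EXEC };

/*
 * Interpret the access bits only when KVM has advertised them;
 * have_rwx_cap is a hypothetical stand-in for a capability check.
 */
static enum toy_access decode_access(uint64_t flags, bool have_rwx_cap)
{
	uint64_t rwx = flags & TOY_FAULT_FLAG_RWX;

	if (!have_rwx_cap)
		return TOY_ACCESS_UNKNOWN;

	/* The proposed if/else chain sets at most one of the three bits. */
	assert((rwx & (rwx - 1)) == 0);

	if (rwx & TOY_FAULT_FLAG_WRITE)
		return TOY_ACCESS_WRITE;
	if (rwx & TOY_FAULT_FLAG_EXEC)
		return TOY_ACCESS_EXEC;
	if (rwx & TOY_FAULT_FLAG_READ)
		return TOY_ACCESS_READ;
	return TOY_ACCESS_UNKNOWN;  /* capability present, but no access bit set */
}
```

A VMM that gets TOY_ACCESS_UNKNOWN would fall back to whatever it does today, e.g. converting based on the private flag alone.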

[1] https://lore.kernel.org/all/ZIn6VQSebTRN1jtX@xxxxxxxxxx
[2] https://lore.kernel.org/all/ZR4N8cwzTMDanPUY@xxxxxxxxxx