Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field

Anish Moorthy <amoorthy@xxxxxxxxxx> · Mon, 20 Mar 2023 11:19:35 -0700

On Mon, Mar 20, 2023 at 8:53 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Mar 17, 2023, Anish Moorthy wrote:
> > On Fri, Mar 17, 2023 at 2:50 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > I wonder if we can get away with returning -EFAULT, but still filling vcpu->run
> > > with KVM_EXIT_MEMORY_FAULT and all the other metadata.  That would likely simplify
> > > the implementation greatly, and would let KVM fill vcpu->run unconditonally.  KVM
> > > would still need a capability to advertise support to userspace, but userspace
> > > wouldn't need to opt in.  I think this may have been my very original though, and
> > > I just never actually wrote it down...
> >
> > Oh, good to know that's actually an option. I thought of that too, but
> > assumed that returning a negative error code was a no-go for a proper
> > vCPU exit. But if that's not true then I think it's the obvious
> > solution because it precludes any uncaught behavior-change bugs.
> >
> > A couple of notes
> > 1. Since we'll likely miss some -EFAULT returns, we'll need to make
> > sure that the user can check for / doesn't see a stale
> > kvm_run::memory_fault field when a missed -EFAULT makes it to
> > userspace. It's a small and easy-to-fix detail, but I thought I'd
> > point it out.
>
> Ya, this is the main concern for me as well.  I'm not as confident that it's
> easy-to-fix/avoid though.
>
> > 2. I don't think this would simplify the series that much, since we
> > still need to find the call sites returning -EFAULT to userspace and
> > populate memory_fault only in those spots to avoid populating it for
> > -EFAULTs which don't make it to userspace.
>
> Filling kvm_run::memory_fault even if KVM never exits to userspace is perfectly
> ok.  It's not ideal, but it's ok.

Right- I was just pointing out that doing so could mislead readers of
the code if they assume that kvm_run::memory_fault is populated iff it
was going to be associated w/ an exit to userspace," which I know I
would.

> > We *could* relax that condition and just document that memory_fault should be
> > ignored when KVM_RUN does not return -EFAULT... but I don't think that's a
> > good solution from a coder/maintainer perspective.
>
> You've got things backward.  memory_fault _must_ be ignored if KVM doesn't return
> the associated "magic combo", where the magic value is either "0+KVM_EXIT_MEMORY_FAULT"
> or "-EFAULT+KVM_EXIT_MEMORY_FAULT".

I think we're saying the same thing- I was using "should" to mean "must."

> Filling kvm_run::memory_fault but not exiting to userspace is ok because userspace
> never sees the data, i.e. userspace is completely unaware.  This behavior is not
> ideal from a KVM perspective as allowing KVM to fill the kvm_run union without
> exiting to userspace can lead to other bugs, e.g. effective corruption of the
> kvm_run union

Ooh, I didn't think of the corruption issue here: thanks for pointing it out.

> but at least from a uABI perspective, the behavior is acceptable.

This does complicate things for KVM implementation though, right? In
particular, we'd have to make sure that KVM_RUN never conditionally
modifies its return value/exit reason based on reads from kvm_run:
that seems like a slightly weird thing to do, but I don't want to
assume anything here.

Anyways, unless that's not (and never will be) a problem, allowing
corruption of kvm_run seems very risky.

> The reverse, userspace consuming kvm_run::memory_fault without being explicitly
> told the data is valid, is not ok/safe.  KVM's contract is that fields contained
> in kvm_run's big union are valid if and only if KVM returns '0' and the associated
> exit reason is set in kvm_run::exit_reason.
>
> From an ABI perspective, I don't see anything fundamentally wrong with bending
> that rule slightly by saying that kvm_run::memory_fault is valid if KVM returns
> -EFAULT+KVM_EXIT_MEMORY_FAULT.  It won't break existing userspace that is unaware
> of KVM_EXIT_MEMORY_FAULT, and userspace can precisely check for the combination.
>
> My big concern with piggybacking -EFAULT is that userspace will be fed stale if
> KVM exits with -EFAULT in a patch that _doesn't_ fill kvm_run::memory_fault.
> Returning a negative error code isn't hazardous in and of itself, e.g. KVM has
> had bugs in the past where KVM returns '0' but doesn't fill kvm_run::exit_reason.
> The big danger is that KVM has existing paths that return -EFAULT, i.e. we can
> introduce bugs simply by doing nothing, whereas returning '0' would largely be
> limited to new code.
>
> The counter-argument is that propagating '0' correctly up the stack carries its
> own risk due to plenty of code correctly treating '0' as "success" and not "exit
> to userspace".
>
> And we can mitigate the risk of using -EFAULT.  E.g. fill in kvm_run::memory_fault
> even if we are 99.9999% confident the -EFAULT can't get out to userspace in the
> context of KVM_RUN, and set kvm_run::exit_reason to some arbitrary value at the
> start of KVM_RUN to prevent reusing memory_fault from a previous userspace exit.

Right, this is what I had in mind when I called this "small and
easy-to-fix." Piggybacking -EFAULT seems like the right thing to do to
me, but I'm still uneasy about possibly corrupting kvm_run for masked
-EFAULTS.