On Tue, Mar 28, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > > On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > > positives are ok, false negatives are fatal.
> > >
> > > Don't you have this in reverse?
> >
> > No, I don't think so.
> >
> > > False negatives will just result in userspace not having useful extra
> > > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > > can do what you mentioned all VMMs do today and just terminate the VM.
> >
> > And that is _really_ bad behavior if we have any hope of userspace actually being
> > able to rely on this functionality.  E.g. any false negative when userspace is
> > trying to do postcopy demand paging will be fatal to the VM.
> >
> > > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > > either putting incorrect information in kvm_run.memory_fault or
> >
> > Recording unused information on -EFAULT in kvm_run doesn't make the information
> > incorrect.
>
> Let's say that some function (converted to annotate its EFAULTs) fills
> in kvm_run.memory_fault, but the EFAULT is suppressed from being
> returned from kvm_run.  What if, later within the same kvm_run call,
> some other function (which we've completely overlooked) EFAULTs and
> that return value actually does make it out to kvm_run?  Userspace
> would get stale information, which could be catastrophic.
"catastrophic" is a bit hyperbolic. Yes, it would be bad, but at _worst_ userspace will kill the VM, which is the status quo today. > Actually even performing the annotations only in functions that > currently always bubble EFAULTs to userspace still seems brittle: if > new callers are ever added which don't bubble the EFAULTs, then we end > up in the same situation. Because of KVM's semi-magical '1 == resume, -errno/0 == exit' "design", that's true for literally every exit to userspace in KVM and every VM-Exit handler. E.g. see commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)"), where KVM returned '-1' instead of '1' when rejecting MSR accesses and inadvertantly killed the VM. A similar bug would be if KVM returned EFAULT instead of -EFAULT, in which case vcpu_run() would resume the guest instead of exiting to userspace and likely put the vCPU into an infinite loop. Do I want to harden KVM to make things like this less brittle? Absolutely. Do I think we should hold up this functionality just because it doesn't solve all of pre-existing flaws in the related KVM code? No.