Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field

On Wed, Mar 22, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > > positives are ok, false negatives are fatal.
> > >
> > > Don't you have this in reverse?
> >
> > No, I don't think so.
> >
> > > False negatives will just result in userspace not having useful extra
> > > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > > can do what you mentioned all VMMs do today and just terminate the VM.
> >
> > And that is _really_ bad behavior if we have any hope of userspace actually being
> > able to rely on this functionality.  E.g. any false negative when userspace is
> > trying to do postcopy demand paging will be fatal to the VM.
> 
> But since -EFAULTs from KVM_RUN today are already fatal, there's no
> new failure introduced by an -EFAULT w/o a populated memory_fault
> field, right?

Yes, but it's a bit of a moot point since the goal of the feature is to avoid
killing the VM.
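
To make that concrete, the rough idea on the userspace side is something like the
below (the memory_fault field names and handle_missing_page() are placeholders for
illustration, not something this series dictates):

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Placeholder for the VMM's postcopy/demand-paging handler. */
  extern void handle_missing_page(__u64 gpa, __u64 len);

  static int run_vcpu(int vcpu_fd, struct kvm_run *run)
  {
          for (;;) {
                  int ret = ioctl(vcpu_fd, KVM_RUN, 0);

                  if (ret < 0 && errno == EFAULT &&
                      run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
                          /*
                           * Resolve the fault, e.g. demand-fetch the missing
                           * page, and re-enter the guest instead of killing
                           * the VM.
                           */
                          handle_missing_page(run->memory_fault.gpa,
                                              run->memory_fault.len);
                          continue;
                  }
                  return ret;
          }
  }

i.e. a false negative (no annotation) leaves userspace with nothing to act on
except terminating the VM.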

> Obviously that's of no real use to userspace, but that seems like part of the
> point of starting with a partial conversion: to allow for filling holes in
> the implementation in the future.

Yes, but I want a forcing function to reveal any holes we missed sooner than
later, otherwise the feature will languish since it won't be useful beyond the
fast-gup-only use case.

> It seems like what you're really concerned about here is the interaction with
> the memslot fast-gup-only flag. Obviously, failing to populate
> kvm_run.memory_fault for new userspace-visible -EFAULTs caused by that flag
> would cause new fatal failures for the guest, which would make the feature
> actually harmful. But as far as I know (and please lmk if I'm wrong), the
> memslot flag only needs to be used by the kvm_handle_error_pfn (x86) and
> user_mem_abort (arm64) functions, meaning that those are the only places
> where we need to check/populate kvm_run.memory_fault for new
> userspace-visible -EFAULTs.

No.  As you point out, the fast-gup-only case should be pretty easy to get correct,
i.e. this should all work just fine for _GCE's current_ use case.  I'm more concerned
with setting KVM up for success when future use cases come along that might not be ok
with unhandled faults in random guest accesses killing the VM.

To be clear, I do not expect us to get this 100% correct on the first attempt,
but I do want to have mechanisms in place that will detect any bugs/misses so
that we can fix the issues _before_ a use case comes along that needs 100%
accuracy.
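
As a strawman for those mechanisms, I'd expect every annotated -EFAULT site to
funnel through a tiny helper so the "fill kvm_run.memory_fault" step can't be
fat-fingered, e.g. something along these lines (the helper name and exact fields
are purely illustrative, not a requirement of the series):

  static inline void kvm_populate_memory_fault_exit(struct kvm_vcpu *vcpu,
                                                    gpa_t gpa, u64 len)
  {
          vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
          vcpu->run->memory_fault.gpa = gpa;
          vcpu->run->memory_fault.len = len;
  }

with kvm_handle_error_pfn() and user_mem_abort() being the first callers, and more
getting added as other -EFAULT paths are converted.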

> > > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > > either putting incorrect information in kvm_run.memory_fault or
> >
> > Recording unused information on -EFAULT in kvm_run doesn't make the information
> > incorrect.
> >
> > > corrupting another member of the union.
> >
> > Only if KVM accesses guest memory after initiating an exit to userspace, which
> > would be a KVM bug irrespective of kvm_run.memory_fault.
> 
> Ah good: I was concerned that this was a valid set of code paths in
> KVM. Although I'm assuming that "initiating an exit to userspace"
> includes the "returning -EFAULT from KVM_RUN" cases, because we
> wouldn't want EFAULTs to stomp on each other as well (the
> kvm_mmu_do_page_fault usages were supposed to be one such example,
> though I'm glad to know that they're not a problem).

This one gets into a bit of a grey area.  The "rule" is really about the intent,
i.e. once KVM intends to exit to userspace, it's a bug if KVM encounters something
else and runs into the weeds.

In no small part because of the myriad paths where KVM ignores what would be fatal
errors in most flows, e.g. record_steal_time(), simply returning -EFAULT from some
low level helper doesn't necessarily signal an intent to exit all the way to
userspace.

To be honest, I don't have a clear idea of how difficult it will be to detect bugs.
In most cases, failure to exit to userspace leads to a fatal error fairly quickly.
With userspace faults, it's entirely possible that an exit could be missed and
nothing bad would happen.

Hmm, one idea would be to have the initial -EFAULT detection fill kvm_run.memory_fault,
but set kvm_run.exit_reason to some magic number, e.g. zero it out.  Then KVM could
WARN if something tries to overwrite kvm_run.exit_reason.  The WARN would need to
be buried by a Kconfig or something since kvm_run can be modified by userspace,
but other than that I think it would work.
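
Roughly (the sentinel and Kconfig names below are made up just to illustrate the
idea):

  /* 0 == KVM_EXIT_UNKNOWN, i.e. "zeroed out", used here as the sentinel. */
  #define KVM_MEMFAULT_EXIT_PENDING       0

  static inline void kvm_note_memory_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 len)
  {
          vcpu->run->exit_reason = KVM_MEMFAULT_EXIT_PENDING;
          vcpu->run->memory_fault.gpa = gpa;
          vcpu->run->memory_fault.len = len;
  }

  static inline void kvm_set_exit_reason(struct kvm_vcpu *vcpu, u32 reason)
  {
          /*
           * kvm_run is writable by userspace, so the WARN can't be
           * unconditional, hence the (hypothetical) debug Kconfig.
           */
          if (IS_ENABLED(CONFIG_KVM_DEBUG_MEMORY_FAULT_EXIT))
                  WARN_ON_ONCE(vcpu->run->exit_reason == KVM_MEMFAULT_EXIT_PENDING &&
                               reason != KVM_EXIT_MEMORY_FAULT);
          vcpu->run->exit_reason = reason;
  }

i.e. any exit path that forgets about the pending memory_fault and stuffs in a
different exit_reason gets flagged, while the legitimate conversion to
KVM_EXIT_MEMORY_FAULT stays silent.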



