On Tue, Mar 28, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > > On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > > positives are ok, false negatives are fatal.
> > >
> > > Don't you have this in reverse?
> >
> > No, I don't think so.
> >
> > > False negatives will just result in userspace not having useful extra
> > > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > > can do what you mentioned all VMMs do today and just terminate the VM.
> >
> > And that is _really_ bad behavior if we have any hope of userspace actually being
> > able to rely on this functionality.  E.g. any false negative when userspace is
> > trying to do postcopy demand paging will be fatal to the VM.
> >
> > > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > > either putting incorrect information in kvm_run.memory_fault or
> >
> > Recording unused information on -EFAULT in kvm_run doesn't make the information
> > incorrect.
>
> Let's say that some function (converted to annotate its EFAULTs) fills
> in kvm_run.memory_fault, but the EFAULT is suppressed from being
> returned from kvm_run.  What if, later within the same kvm_run call,
> some other function (which we've completely overlooked) EFAULTs and
> that return value actually does make it out to kvm_run?  Userspace
> would get stale information, which could be catastrophic.
"catastrophic" is a bit hyperbolic. Yes, it would be bad, but at _worst_ userspace will kill the VM, which is the status quo today. > Actually even performing the annotations only in functions that > currently always bubble EFAULTs to userspace still seems brittle: if > new callers are ever added which don't bubble the EFAULTs, then we end > up in the same situation. Because of KVM's semi-magical '1 == resume, -errno/0 == exit' "design", that's true for literally every exit to userspace in KVM and every VM-Exit handler. E.g. see commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)"), where KVM returned '-1' instead of '1' when rejecting MSR accesses and inadvertantly killed the VM. A similar bug would be if KVM returned EFAULT instead of -EFAULT, in which case vcpu_run() would resume the guest instead of exiting to userspace and likely put the vCPU into an infinite loop. Do I want to harden KVM to make things like this less brittle? Absolutely. Do I think we should hold up this functionality just because it doesn't solve all of pre-existing flaws in the related KVM code? No.