On Tue, Apr 4, 2023 at 12:35 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > Let's say that some function (converted to annotate its EFAULTs) fills > > in kvm_run.memory_fault, but the EFAULT is suppressed from being > > returned from kvm_run. What if, later within the same kvm_run call, > > some other function (which we've completely overlooked) EFAULTs and > > that return value actually does make it out to kvm_run? Userspace > > would get stale information, which could be catastrophic. > > "catastrophic" is a bit hyperbolic. Yes, it would be bad, but at _worst_ userspace > will kill the VM, which is the status quo today. Well what I'm saying is that in these cases userspace *wouldn't know* that kvm_run.memory_fault contains incorrect information for the -EFAULT it actually got (do you disagree?), which could presumably cause it to do bad things like "resolve" faults on incorrect pages and/or infinite-loop on KVM_RUN, etc. Annotating the efault information as valid only from the call sites which return directly to userspace prevents this class of problem, at the cost of allowing un-annotated EFAULTs to make it to userspace. But to me, paying that cost to make sure the EFAULT information is always correct seems by far preferable to not paying it and allowing userspace to get silently incorrect information. > > Actually even performing the annotations only in functions that > > currently always bubble EFAULTs to userspace still seems brittle: if > > new callers are ever added which don't bubble the EFAULTs, then we end > > up in the same situation. > > Because of KVM's semi-magical '1 == resume, -errno/0 == exit' "design", that's > true for literally every exit to userspace in KVM and every VM-Exit handler. > E.g. see commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad > WRMSR(MCi_CTL/STATUS)"), where KVM returned '-1' instead of '1' when rejecting > MSR accesses and inadvertantly killed the VM. A similar bug would be if KVM > returned EFAULT instead of -EFAULT, in which case vcpu_run() would resume the > guest instead of exiting to userspace and likely put the vCPU into an infinite > loop. Right, good point.