On Tue, Apr 04, 2023, Anish Moorthy wrote: > On Tue, Apr 4, 2023 at 12:35 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > Let's say that some function (converted to annotate its EFAULTs) fills > > > in kvm_run.memory_fault, but the EFAULT is suppressed from being > > > returned from kvm_run. What if, later within the same kvm_run call, > > > some other function (which we've completely overlooked) EFAULTs and > > > that return value actually does make it out to kvm_run? Userspace > > > would get stale information, which could be catastrophic. > > > > "catastrophic" is a bit hyperbolic. Yes, it would be bad, but at _worst_ userspace > > will kill the VM, which is the status quo today. > > Well what I'm saying is that in these cases userspace *wouldn't know* > that kvm_run.memory_fault contains incorrect information for the > -EFAULT it actually got (do you disagree?), I disagree in the sense that if the stale information causes a problem, then by definition userspace has to know. It's the whole "if a tree falls in a forest" thing. If KVM reports stale information and literally nothing bad happens, ever, then is the superfluous exit really a problem? Not saying it wouldn't be treated as a bug, just that it might not even warrant a stable backport if the worst case scenario is a spurious exit to userspace (for example). > which could presumably cause it to do bad things like "resolve" faults on > incorrect pages and/or infinite-loop on KVM_RUN, etc. Putting the vCPU into an infinite loop is _very_ visible, e.g. see the entire mess surrounding commit 31c25585695a ("Revert "KVM: SVM: avoid infinite loop on NPF from bad address""). As above, fixing pages that don't need to be fixed isn't itself a major problem. If the extra exits lead to a performance issue, then _that_ is a problem, but again _something_ has to detect the problem and thus it becomes a known thing. > Annotating the efault information as valid only from the call sites > which return directly to userspace prevents this class of problem, at > the cost of allowing un-annotated EFAULTs to make it to userspace. But > to me, paying that cost to make sure the EFAULT information is always > correct seems by far preferable to not paying it and allowing > userspace to get silently incorrect information. I don't think that's a maintainable approach. Filling kvm_run if and only if the -EFAULT has a direct path to userspace is (a) going to require a signficant amount of code churn and (b) falls apart the instant code further up the stack changes. E.g. the relatively straightforward page fault case requires bouncing through 7+ functions to get from kvm_handle_error_pfn() to kvm_arch_vcpu_ioctl_run(), and not all of those are obviously "direct" if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp) r = kvm_tdp_page_fault(vcpu, &fault); else r = vcpu->arch.mmu->page_fault(vcpu, &fault); if (fault.write_fault_to_shadow_pgtable && emulation_type) *emulation_type |= EMULTYPE_WRITE_PF_TO_SP; /* * Similar to above, prefetch faults aren't truly spurious, and the * async #PF path doesn't do emulation. Do count faults that are fixed * by the async #PF handler though, otherwise they'll never be counted. */ if (r == RET_PF_FIXED) vcpu->stat.pf_fixed++; else if (prefetch) ; else if (r == RET_PF_EMULATE) vcpu->stat.pf_emulate++; else if (r == RET_PF_SPURIOUS) vcpu->stat.pf_spurious++; return r; ... if (r == RET_PF_INVALID) { r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, lower_32_bits(error_code), false, &emulation_type); if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm)) return -EIO; } if (r < 0) return r; if (r != RET_PF_EMULATE) return 1; In other words, the "only if it's direct" rule requires visually auditing changes, i.e. catching "violations" via code review, not only to code that adds a new -EFAULT return, but to all code throughout rather large swaths of KVM. The odds of us (or whoever the future maintainers/reviewers are) remembering to enforce the "rule", let alone actually having 100% accuracy, are basically nil. On the flip side, if we add a helper to fill kvm_run and return -EFAULT, then we can add rule that only time KVM is allowed to return a bare -EFAULT is immediately after a uaccess, i.e. after copy_to/from_user() and the many variants. And _that_ can be enforced through static checkers, e.g. someone with more (read: any) awk/sed skills than me could bang something out in a matter of minutes. Such a static checker won't catch everything, but there would be very, very few bare non-uaccess -EFAULTS left, and those could be filtered out with an allowlist, e.g. similar to how the folks that run smatch and whatnot deal with false positives.