By the way, I'd like to solicit opinions on the checks that kvm_populate_efault_info is performing: specifically the the exit-reason-unset check On Fri, Jun 2, 2023 at 9:19 AM Anish Moorthy <amoorthy@xxxxxxxxxx> wrote: > > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > + uint64_t gpa, uint64_t len, uint64_t flags) > +{ > ... > + else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN)) > + goto out; What I intended this check to guard against was the first problematic case (A) I called out in the cover letter > The implementation strategy for KVM_CAP_MEMORY_FAULT_INFO has risks: for > example, if there are any existing paths in KVM_RUN which cause a vCPU > to (1) populate the kvm_run struct then (2) fail a vCPU guest memory > access but ignore the failure and then (3) complete the exit to > userspace set up in (1), then the contents of the kvm_run struct written > in (1) will be corrupted. What I wrote was actually incorrect, as you may see: if in (1) the exit reason != KVM_EXIT_UNKNOWN then the exit-reason-unset check will prevent writing to the run struct. Now, if for some reason this flow involved populating most of the run struct in (1) but only setting the exit reason in (3) then we'd still have a problem: but it's not feasible to anticipate everything after all :) I also mentioned a different error case (B) > Another example: if KVM_RUN fails a guest memory access for which the > EFAULT is annotated but does not return the EFAULT to userspace, then > later returns an *un*annotated EFAULT to userspace, then userspace will > receive incorrect information. When the second EFAULT is un-annotated the presence/absence of the exit-reason-unset check is irrelevant: userspace will observe an annotated EFAULT in place of an un-annotated one either way. There's also a third interesting case (C) which I didn't mention: an annotated EFAULT which is ignored/suppressed followed by one which is propagated to userspace. Here the exit-reason-unset check will prevent the second annotation from being written, so userspace sees an annotation with bad contents, If we believe that case (A) is a weird sequence of events that shouldn't be happening in the first place, then case (C) seems more important to ensure correctness in. But I don't know anything about how often (A) happens in KVM, which is why I want others' opinions. So, should we drop the exit-reason-unset check (and the accompanying patch 4) and treat existing occurrences of case (A) as bugs, or should we maintain the check at the cost of incorrect behavior in case (C)? Or is there another option here entirely? Sean, I remember you calling out that some of the emulated mmio code follows the pattern in (A), but it's been a while and my memory is fuzzy. What's your opinion here?