Re: [PATCH v4 03/16] KVM: Add KVM_CAP_MEMORY_FAULT_INFO

Anish Moorthy <amoorthy@xxxxxxxxxx> · Mon, 5 Jun 2023 10:46:03 -0700

By the way, I'd like to solicit opinions on the checks that
kvm_populate_efault_info is performing: specifically the the
exit-reason-unset check

On Fri, Jun 2, 2023 at 9:19 AM Anish Moorthy <amoorthy@xxxxxxxxxx> wrote:
>
> +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu,
> +                                    uint64_t gpa, uint64_t len, uint64_t flags)
> +{
>  ...
> +       else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN))
> +               goto out;

What I intended this check to guard against was the first problematic
case (A) I called out in the cover letter

> The implementation strategy for KVM_CAP_MEMORY_FAULT_INFO has risks: for
> example, if there are any existing paths in KVM_RUN which cause a vCPU
> to (1) populate the kvm_run struct then (2) fail a vCPU guest memory
> access but ignore the failure and then (3) complete the exit to
> userspace set up in (1), then the contents of the kvm_run struct written
> in (1) will be corrupted.

What I wrote was actually incorrect, as you may see: if in (1) the
exit reason != KVM_EXIT_UNKNOWN then the exit-reason-unset check will
prevent writing to the run struct. Now, if for some reason this flow
involved populating most of the run struct in (1) but only setting the
exit reason in (3) then we'd still have a problem: but it's not
feasible to anticipate everything after all :)

I also mentioned a different error case (B)

> Another example: if KVM_RUN fails a guest memory access for which the
> EFAULT is annotated but does not return the EFAULT to userspace, then
> later returns an *un*annotated EFAULT to userspace, then userspace will
> receive incorrect information.

When the second EFAULT is un-annotated the presence/absence of the
exit-reason-unset check is irrelevant: userspace will observe an
annotated EFAULT in place of an un-annotated one either way.

There's also a third interesting case (C) which I didn't mention: an
annotated EFAULT which is ignored/suppressed followed by one which is
propagated to userspace. Here the exit-reason-unset check will prevent
the second annotation from being written, so userspace sees an
annotation with bad contents, If we believe that case (A) is a weird
sequence of events that shouldn't be happening in the first place,
then case (C) seems more important to ensure correctness in. But I
don't know anything about how often (A) happens in KVM, which is why I
want others' opinions.

So, should we drop the exit-reason-unset check (and the accompanying
patch 4) and treat existing occurrences of case (A) as bugs, or should
we maintain the check at the cost of incorrect behavior in case (C)?
Or is there another option here entirely?

Sean, I remember you calling out that some of the emulated mmio code
follows the pattern in (A), but it's been a while and my memory is
fuzzy. What's your opinion here?