Re: [PATCH] KVM: x86/mmu: Add RET_PF_CONTINUE to eliminate bool+int* "returns"

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 15 Apr 2022 19:28:35 +0000

On Fri, Apr 15, 2022, David Matlack wrote:
> On Thu, Apr 14, 2022 at 5:51 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index 1bff453f7cbe..c0e502b17ef7 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -143,6 +143,7 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
> >  /*
> >   * Return values of handle_mmio_page_fault, mmu.page_fault, and fast_page_fault().
> >   *
> > + * RET_PF_CONTINUE: So far, so good, keep handling the page fault.
> >   * RET_PF_RETRY: let CPU fault again on the address.
> >   * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
> >   * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
> 
> RET_PF_CONTINUE and RET_PF_INVALID are pretty similar, they both
> indicate to the PF handler that it should keep going. What do you
> think about taking this a step further and removing RET_PF_INVALID and
> RET_PF_CONTINUE and using 0 instead?

RET_PF_INVALID is useful because it allows kvm_mmu_page_fault() to sanity check
that the page fault handler didn't barf:

		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
					  lower_32_bits(error_code), false);
		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
			return -EIO;

It's a bit odd that fast_page_fault() returns RET_PF_INVALID instead of RET_PF_CONTINUE,
but at the same time the fault _is_ indeed invalid for fast handling.

> That way we can replace:
> 
>   if (ret != RET_PF_CONTINUE)
>           return r;
> 
> and
> 
>   if (ret != RET_PF_INVALID)
>           return r;
> 
> with
> 
>   if (r)
>           return r;
> 
> ?

We could actually do that now, at least for RET_PF_CONTINUE, but I deliberately
avoided doing that so as to not take a hard dependency on the underlying value of
RET_PF_CONTINUE.  KVM's magic "0 == exit to userspace, 1 == resume guest" logic
is dangerous, e.g. it's caused real bugs in the past.  But in that case, the
return value is multiplexed with -errno, i.e. there's a good reason for using
magic values.

For this code, there's no such requirement because the caller must convert the
RET_PF_* return to the aforementioned magic returns.

And the even bigger motivation was to discourage "if (r)" for this case because
that suggests that this code _does_ utilize KVM's magic return value.  I actually
wrote it that way at first and then got completely turned around and forgot what
value was being returned :-)

Heh, and thinking about it, I'd be tempted to assign RET_PF_CONTINUE a non-zero
value if some debug Kconfig is enabled to really enforce that KVM explicitly checks
for RET_PF_CONTINUE instead of assuming a non-zero value means "bail".

> > @@ -151,9 +152,15 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
> >   *
> >   * Any names added to this enum should be exported to userspace for use in
> >   * tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h
> 
> Please add RET_PF_CONTINUE to mmutrace.h, e.g.
> 
>   TRACE_DEFINE_ENUM(RET_PF_CONTINUE);
> 
> It doesn't matter in practice (yet) since the trace enums are only
> used for trace_fast_page_fault, which does not return RET_PF_CONTINUE.
> But it would be good to keep it up to date.

Drat, missed that.  Thanks!