Re: [PATCH v4] KVM: nVMX: Don't leak L1 MMIO regions to L2

On Mon, Oct 14, 2019 at 6:07 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Mon, Oct 14, 2019 at 05:13:04PM -0700, Jim Mattson wrote:
> > If the "virtualize APIC accesses" VM-execution control is set in the
> > VMCS, the APIC virtualization hardware is triggered when a page walk
> > in VMX non-root mode terminates at a PTE wherein the address of the 4k
> > page frame matches the APIC-access address specified in the VMCS. On
> > hardware, the APIC-access address may be any valid 4k-aligned physical
> > address.
> >
> > KVM's nVMX implementation enforces the additional constraint that the
> > APIC-access address specified in the vmcs12 must be backed by
> > a "struct page" in L1. If not, L0 will simply clear the "virtualize
> > APIC accesses" VM-execution control in the vmcs02.
> >
> > The problem with this approach is that the L1 guest has arranged the
> > vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
> > VM-execution control is clear in the vmcs12--so that the L2 guest
> > physical address(es)--or L2 guest linear address(es)--that reference
> > the L2 APIC map to the APIC-access address specified in the
> > vmcs12. Without the "virtualize APIC accesses" VM-execution control in
> > the vmcs02, the APIC accesses in the L2 guest will directly access the
> > APIC-access page in L1.
> >
> > When there is no mapping whatsoever for the APIC-access address in L1,
> > the L2 VM just loses the intended APIC virtualization. However, when
> > the APIC-access address is mapped to an MMIO region in L1, the L2
> > guest gets direct access to the L1 MMIO device. For example, if the
> > APIC-access address specified in the vmcs12 is 0xfee00000, then L2
> > gets direct access to L1's APIC.
> >
> > Since this vmcs12 configuration is something that KVM cannot
> > faithfully emulate, the appropriate response is to exit to userspace
> > with KVM_INTERNAL_ERROR_EMULATION.
> >
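(For readers following along: the failure is surfaced through KVM's
standard internal-error exit. Roughly, the reporting side looks like
this; this is a sketch only, using the existing kvm_run ABI fields,
so see the full patch for the exact placement:

	/* Sketch: report an unemulatable vmcs12 config to userspace. */
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
	vcpu->run->internal.ndata = 0;

nested_vmx_enter_non_root_mode() then returns
NVMX_VMENTRY_KVM_INTERNAL_ERROR, which nested_vmx_run() maps to a 0
return so the vcpu run loop exits to userspace.)
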
> > Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
> > Reported-by: Dan Cross <dcross@xxxxxxxxxx>
> > Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> > Reviewed-by: Peter Shier <pshier@xxxxxxxxxx>
> > ---
>
> With two nits below:
>
> Reviewed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>
> > @@ -3244,13 +3247,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> >        * the nested entry.
> >        */
> >       vmx->nested.nested_run_pending = 1;
> > -     ret = nested_vmx_enter_non_root_mode(vcpu, true);
> > -     vmx->nested.nested_run_pending = !ret;
> > -     if (ret > 0)
> > -             return 1;
> > -     else if (ret)
> > -             return nested_vmx_failValid(vcpu,
> > -                     VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> > +     status = nested_vmx_enter_non_root_mode(vcpu, true);
> > +     if (unlikely(status != NVMX_VMENTRY_SUCCESS))
>
> KVM doesn't usually add (un)likely annotations for things that are under
> L1's control.  The "unlikely(vmx->fail)" in nested_vmx_exit_reflected() is
> there because it's true iff KVM missed a VM-Fail condition that was caught
> by hardware.

I would argue that it makes sense to optimize for the success path in
this case. If L1 is taking the failure path more frequently than the
success path, something is wrong. Moreover, you have already indicated
that the success path should be the statically predicted one by asking
me to move the failure path out-of-line. (Forward conditional branches
are statically predicted not taken, per section 3.4.1.3 of the Intel
64 and IA-32 Architectures Optimization Reference Manual.) I'm just
asking the compiler not to undo that hint.
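
For context, unlikely() is just the usual __builtin_expect() wrapper,
roughly (from include/linux/compiler.h):

	/* Branch-prediction hints; no functional change, code layout only. */
	# define likely(x)	__builtin_expect(!!(x), 1)
	# define unlikely(x)	__builtin_expect(!!(x), 0)

i.e. it only tells the compiler which arm to keep on the fall-through
path; it doesn't change behavior.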

> > +             goto vmentry_failed;
> >
> >       /* Hide L1D cache contents from the nested guest.  */
> >       vmx->vcpu.arch.l1tf_flush_l1d = true;
> > @@ -3281,6 +3280,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> >               return kvm_vcpu_halt(vcpu);
> >       }
> >       return 1;
> > +
> > +vmentry_failed:
> > +     vmx->nested.nested_run_pending = 0;
> > +     if (status == NVMX_VMENTRY_KVM_INTERNAL_ERROR)
> > +             return 0;
> > +     if (status == NVMX_VMENTRY_VMEXIT)
> > +             return 1;
> > +     WARN_ON_ONCE(status != NVMX_VMENTRY_VMFAIL);
> > +     return nested_vmx_failValid(vcpu,
> > +                                 VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>
> This can fit on a single line.
>
> >  }
> >
> >  /*

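For reference, the status values used above come from the new
nvmx_vmentry_status enum added earlier in the patch, roughly:

	/* Sketch of the enum introduced by this patch (see nested.h). */
	enum nvmx_vmentry_status {
		NVMX_VMENTRY_SUCCESS,		/* entered VMX non-root mode */
		NVMX_VMENTRY_VMFAIL,		/* consistency check VMFail */
		NVMX_VMENTRY_VMEXIT,		/* consistency check VMExit */
		NVMX_VMENTRY_KVM_INTERNAL_ERROR,/* KVM internal error */
	};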

