On Mon, Oct 14, 2019 at 05:13:04PM -0700, Jim Mattson wrote:
> If the "virtualize APIC accesses" VM-execution control is set in the
> VMCS, the APIC virtualization hardware is triggered when a page walk
> in VMX non-root mode terminates at a PTE wherein the address of the 4k
> page frame matches the APIC-access address specified in the VMCS. On
> hardware, the APIC-access address may be any valid 4k-aligned physical
> address.
>
> KVM's nVMX implementation enforces the additional constraint that the
> APIC-access address specified in the vmcs12 must be backed by
> a "struct page" in L1. If not, L0 will simply clear the "virtualize
> APIC accesses" VM-execution control in the vmcs02.
>
> The problem with this approach is that the L1 guest has arranged the
> vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
> VM-execution control is clear in the vmcs12--so that the L2 guest
> physical address(es)--or L2 guest linear address(es)--that reference
> the L2 APIC map to the APIC-access address specified in the
> vmcs12. Without the "virtualize APIC accesses" VM-execution control in
> the vmcs02, the APIC accesses in the L2 guest will directly access the
> APIC-access page in L1.
>
> When there is no mapping whatsoever for the APIC-access address in L1,
> the L2 VM just loses the intended APIC virtualization. However, when
> the APIC-access address is mapped to an MMIO region in L1, the L2
> guest gets direct access to the L1 MMIO device. For example, if the
> APIC-access address specified in the vmcs12 is 0xfee00000, then L2
> gets direct access to L1's APIC.
>
> Since this vmcs12 configuration is something that KVM cannot
> faithfully emulate, the appropriate response is to exit to userspace
> with KVM_INTERNAL_ERROR_EMULATION.
>
> Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
> Reported-by: Dan Cross <dcross@xxxxxxxxxx>
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Reviewed-by: Peter Shier <pshier@xxxxxxxxxx>
> ---

With two nits below:

Reviewed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>

> @@ -3244,13 +3247,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	 * the nested entry.
>  	 */
>  	vmx->nested.nested_run_pending = 1;
> -	ret = nested_vmx_enter_non_root_mode(vcpu, true);
> -	vmx->nested.nested_run_pending = !ret;
> -	if (ret > 0)
> -		return 1;
> -	else if (ret)
> -		return nested_vmx_failValid(vcpu,
> -			VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> +	status = nested_vmx_enter_non_root_mode(vcpu, true);
> +	if (unlikely(status != NVMX_VMENTRY_SUCCESS))

KVM doesn't usually add (un)likely annotations for things that are under
L1's control.  The "unlikely(vmx->fail)" in nested_vmx_exit_reflected()
is there because it's true iff KVM missed a VM-Fail condition that was
caught by hardware.

> +		goto vmentry_failed;
>
>  	/* Hide L1D cache contents from the nested guest.  */
>  	vmx->vcpu.arch.l1tf_flush_l1d = true;
> @@ -3281,6 +3280,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  		return kvm_vcpu_halt(vcpu);
>  	}
>  	return 1;
> +
> +vmentry_failed:
> +	vmx->nested.nested_run_pending = 0;
> +	if (status == NVMX_VMENTRY_KVM_INTERNAL_ERROR)
> +		return 0;
> +	if (status == NVMX_VMENTRY_VMEXIT)
> +		return 1;
> +	WARN_ON_ONCE(status != NVMX_VMENTRY_VMFAIL);
> +	return nested_vmx_failValid(vcpu,
> +		VMXERR_ENTRY_INVALID_CONTROL_FIELD);

This can fit on a single line.

>  }
>
>  /*
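
Tangentially, for anyone skimming the thread without the full patch in
front of them, below is a minimal standalone sketch of the error-handling
shape the hunk above switches to: the entry helper reports *why* it failed
via an enum, and the caller maps each status to its return value at a
single exit label.  This is plain C, not KVM code; all names here are
made-up stand-ins for the real nVMX symbols.

#include <stdio.h>

enum vmentry_status {
	VMENTRY_SUCCESS,
	VMENTRY_KVM_INTERNAL_ERROR,	/* exit to userspace */
	VMENTRY_VMEXIT,			/* synthesized VM-exit to L1 */
	VMENTRY_VMFAIL,			/* VMfail(invalid control field) */
};

static enum vmentry_status enter_non_root_mode(int controls_ok,
					       int unsupported_config)
{
	if (unsupported_config)
		return VMENTRY_KVM_INTERNAL_ERROR;
	if (!controls_ok)
		return VMENTRY_VMFAIL;
	return VMENTRY_SUCCESS;
}

/*
 * Mirrors the caller's shape: the success path stays linear, all failures
 * funnel through one label.  Returns 0 for "exit to userspace", 1 to keep
 * running the vCPU.
 */
static int run_nested(int controls_ok, int unsupported_config)
{
	enum vmentry_status status;

	status = enter_non_root_mode(controls_ok, unsupported_config);
	if (status != VMENTRY_SUCCESS)
		goto vmentry_failed;

	/* ... the rest of the normal VM-entry path would go here ... */
	return 1;

vmentry_failed:
	if (status == VMENTRY_KVM_INTERNAL_ERROR)
		return 0;
	if (status == VMENTRY_VMEXIT)
		return 1;
	return 1;	/* VMFAIL: report the error to L1 and resume */
}

int main(void)
{
	printf("ok config:    %d\n", run_nested(1, 0));
	printf("bad controls: %d\n", run_nested(0, 0));
	printf("unsupported:  %d\n", run_nested(1, 1));
	return 0;
}

The point of the shape is that "exit to userspace", "reflect a VM-exit to
L1" and "VMfail" no longer share one overloaded int return value, which is
what lets the new KVM_INTERNAL_ERROR_EMULATION case slot in cleanly.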