On Mon, Oct 14, 2019 at 05:13:04PM -0700, Jim Mattson wrote:
> If the "virtualize APIC accesses" VM-execution control is set in the
> VMCS, the APIC virtualization hardware is triggered when a page walk
> in VMX non-root mode terminates at a PTE wherein the address of the 4k
> page frame matches the APIC-access address specified in the VMCS. On
> hardware, the APIC-access address may be any valid 4k-aligned physical
> address.
>
> KVM's nVMX implementation enforces the additional constraint that the
> APIC-access address specified in the vmcs12 must be backed by
> a "struct page" in L1. If not, L0 will simply clear the "virtualize
> APIC accesses" VM-execution control in the vmcs02.
>
> The problem with this approach is that the L1 guest has arranged the
> vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
> VM-execution control is clear in the vmcs12--so that the L2 guest
> physical address(es)--or L2 guest linear address(es)--that reference
> the L2 APIC map to the APIC-access address specified in the
> vmcs12. Without the "virtualize APIC accesses" VM-execution control in
> the vmcs02, the APIC accesses in the L2 guest will directly access the
> APIC-access page in L1.
>
> When there is no mapping whatsoever for the APIC-access address in L1,
> the L2 VM just loses the intended APIC virtualization. However, when
> the APIC-access address is mapped to an MMIO region in L1, the L2
> guest gets direct access to the L1 MMIO device. For example, if the
> APIC-access address specified in the vmcs12 is 0xfee00000, then L2
> gets direct access to L1's APIC.
>
> Since this vmcs12 configuration is something that KVM cannot
> faithfully emulate, the appropriate response is to exit to userspace
> with KVM_INTERNAL_ERROR_EMULATION.
>
> Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
> Reported-by: Dan Cross <dcross@xxxxxxxxxx>
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Reviewed-by: Peter Shier <pshier@xxxxxxxxxx>
> ---

With two nits below:

Reviewed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>

> @@ -3244,13 +3247,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	 * the nested entry.
>  	 */
>  	vmx->nested.nested_run_pending = 1;
> -	ret = nested_vmx_enter_non_root_mode(vcpu, true);
> -	vmx->nested.nested_run_pending = !ret;
> -	if (ret > 0)
> -		return 1;
> -	else if (ret)
> -		return nested_vmx_failValid(vcpu,
> -			VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> +	status = nested_vmx_enter_non_root_mode(vcpu, true);
> +	if (unlikely(status != NVMX_VMENTRY_SUCCESS))

KVM doesn't usually add (un)likely annotations for things that are under
L1's control.  The "unlikely(vmx->fail)" in nested_vmx_exit_reflected()
is there because it's true iff KVM missed a VM-Fail condition that was
caught by hardware.

> +		goto vmentry_failed;
>
>  	/* Hide L1D cache contents from the nested guest.  */
>  	vmx->vcpu.arch.l1tf_flush_l1d = true;
> @@ -3281,6 +3280,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  		return kvm_vcpu_halt(vcpu);
>  	}
>  	return 1;
> +
> +vmentry_failed:
> +	vmx->nested.nested_run_pending = 0;
> +	if (status == NVMX_VMENTRY_KVM_INTERNAL_ERROR)
> +		return 0;
> +	if (status == NVMX_VMENTRY_VMEXIT)
> +		return 1;
> +	WARN_ON_ONCE(status != NVMX_VMENTRY_VMFAIL);
> +	return nested_vmx_failValid(vcpu,
> +		VMXERR_ENTRY_INVALID_CONTROL_FIELD);

This can fit on a single line.

>  }
>
>  /*
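
Tangentially, for anyone skimming the thread without the full patch in
front of them, below is a minimal standalone sketch of the error-handling
shape the hunk above switches to: the entry helper reports *why* it failed
via an enum, and the caller maps each status to its return value at a
single exit label.  This is plain C, not KVM code; all names here are
made-up stand-ins for the real nVMX symbols.

#include <stdio.h>

enum vmentry_status {
	VMENTRY_SUCCESS,
	VMENTRY_KVM_INTERNAL_ERROR,	/* exit to userspace */
	VMENTRY_VMEXIT,			/* synthesized VM-exit to L1 */
	VMENTRY_VMFAIL,			/* VMfail(invalid control field) */
};

static enum vmentry_status enter_non_root_mode(int controls_ok,
					       int unsupported_config)
{
	if (unsupported_config)
		return VMENTRY_KVM_INTERNAL_ERROR;
	if (!controls_ok)
		return VMENTRY_VMFAIL;
	return VMENTRY_SUCCESS;
}

/*
 * Mirrors the caller's shape: the success path stays linear, all failures
 * funnel through one label.  Returns 0 for "exit to userspace", 1 to keep
 * running the vCPU.
 */
static int run_nested(int controls_ok, int unsupported_config)
{
	enum vmentry_status status;

	status = enter_non_root_mode(controls_ok, unsupported_config);
	if (status != VMENTRY_SUCCESS)
		goto vmentry_failed;

	/* ... the rest of the normal VM-entry path would go here ... */
	return 1;

vmentry_failed:
	if (status == VMENTRY_KVM_INTERNAL_ERROR)
		return 0;
	if (status == VMENTRY_VMEXIT)
		return 1;
	return 1;	/* VMFAIL: report the error to L1 and resume */
}

int main(void)
{
	printf("ok config:    %d\n", run_nested(1, 0));
	printf("bad controls: %d\n", run_nested(0, 0));
	printf("unsupported:  %d\n", run_nested(1, 1));
	return 0;
}

The point of the shape is that "exit to userspace", "reflect a VM-exit to
L1" and "VMfail" no longer share one overloaded int return value, which is
what lets the new KVM_INTERNAL_ERROR_EMULATION case slot in cleanly.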