Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Mon, 11 Mar 2024 09:16:41 +0800

On Fri, Mar 08, 2024 at 05:09:29PM -0800, Sean Christopherson wrote:
> Unconditionally honor guest PAT on CPUs that support self-snoop, as
> Intel has confirmed that CPUs that support self-snoop always snoop caches
> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
> even in the presence of aliased memtypes, thus there is no need to trust
> the guest behaves and only honor PAT as a last resort, as KVM does today.
> 
> Honoring guest PAT is desirable for use cases where the guest has access
> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
> (mediated, for all intents and purposes) GPU is exposed to the guest, along
> with buffers that are consumed directly by the physical GPU, i.e. which
> can't be proxied by the host to ensure writes from the guest are performed
> with the correct memory type for the GPU.
> 
> Cc: Yiwei Zhang <zzyiwei@xxxxxxxxxx>
> Suggested-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Suggested-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c |  8 +++++---
>  arch/x86/kvm/vmx/vmx.c | 10 ++++++----
>  2 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 403cd8f914cd..7fa514830628 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4622,14 +4622,16 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>  bool kvm_mmu_may_ignore_guest_pat(void)
>  {
>  	/*
> -	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
> +	 * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does
> +	 * not support self-snoop (or is affected by an erratum), and the VM
>  	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
>  	 * honor the memtype from the guest's PAT so that guest accesses to
>  	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
>  	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> -	 * KVM _always_ ignores guest PAT (when EPT is enabled).
> +	 * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE
> +	 * bits in response to non-coherent device (un)registration.
>  	 */
> -	return shadow_memtype_mask;
> +	return !static_cpu_has(X86_FEATURE_SELFSNOOP) && shadow_memtype_mask;
>  }
>  
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 17a8e4fdf9c4..5dc4c24ae203 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7605,11 +7605,13 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  
>  	/*
>  	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
> -	 * device attached.  Letting the guest control memory types on Intel
> -	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
> -	 * the guest to behave only as a last resort.
> +	 * device attached and the CPU doesn't support self-snoop.  Letting the
> +	 * guest control memory types on Intel CPUs without self-snoop may
> +	 * result in unexpected behavior, and so KVM's (historical) ABI is to
> +	 * trust the guest to behave only as a last resort.
>  	 */
> -	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +	if (!static_cpu_has(X86_FEATURE_SELFSNOOP) &&
> +	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
>  		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

For the case of !static_cpu_has(X86_FEATURE_SELFSNOOP) &&
kvm_arch_has_noncoherent_dma(vcpu->kvm), I think we at least should warn
about unsafe before honoring guest memory type.
Though it's a KVM's historical ABI, it's not safe in the security perspective
because page aliasing without proper cache flush handling on CPUs without
self-snoop may open a door for guest to read uninitialized host data.
e.g. when there's a noncoherent DMA device attached, and if there's a memory
region that is not pinned in vfio/iommufd side, (e.g. memory region in vfio's
skipped section), then though the guest memory from this memory region is not
accessible to noncoherent DMAs, vCPUs can still access this part of guest memory.
Then if vCPUs use WC as guest type, it may bypass host's initialization data in
cache and read stale data in host, causing information leak.

My preference is still to force WB
(i.e. (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT) in case of
!static_cpu_has(X86_FEATURE_SELFSNOOP) && kvm_arch_has_noncoherent_dma(vcpu->kvm).
Firstly, it's because there're few CPUs with features VMX without self-snoop;
Secondly, security takes priority over functionality :)

>  
>  	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
> -- 
> 2.44.0.278.ge034bb2e1d-goog
>