On Fri, 2023-03-24 at 18:12 -0700, Isaku Yamahata wrote:
> On Thu, Mar 16, 2023 at 10:38:02AM +0000,
> "Huang, Kai" <kai.huang@xxxxxxxxx> wrote:
> 
> > On Sun, 2023-03-12 at 10:56 -0700, isaku.yamahata@xxxxxxxxx wrote:
> > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > 
> > > Because TDX virtualizes cpuid[0x1].EDX[MTRR: bit 12] to fixed 1, the guest
> > > TD thinks MTRR is supported.  Although TDX supports only WB for private
> > > GPA, it's desirable to support MTRR for shared GPA.  As guest accesses to
> > > MTRR MSRs cause #VE and KVM/x86 tracks the values of the MTRR MSRs, the
> > > remaining part is to implement the get_mt_mask method for TDX for shared
> > > GPA.
> > > 
> > > Pass the shared bit from the kvm fault handler to the get_mt_mask method
> > > so that it can determine whether the gfn is shared or private.
> > 
> > I think we have an Xarray to query whether a given GFN is shared or private?
> > Can we use that?
> > 
> > > Implement get_mt_mask() following the vmx case for shared GPA and return
> > > WB for private GPA.  The existing vmx_get_mt_mask() can't be directly
> > > used as CPU state (CR0.CD) is protected.  The GFN passed to
> > > kvm_mtrr_check_gfn_range_consistency() should include the shared bit.
> > > 
> > > Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
> > 
> > I am not sure what is suggested by me?
> > 
> > I thought what I suggested is we should have a dedicated patch to handle
> > MTRR for TDX, putting all related things together.
> 
> Sure.  After looking at the specs, my conclusion is that the guest TD can't
> update MTRRs reliably, because an MTRR update requires disabling the cache
> (CR0.CD=1), a cache flush, and a TLB flush (SDM 3a 12.11.7: MTRR maintenance
> programming interface).  Linux implements its MTRR update logic accordingly.
> 
> While TDX enforces CR0.CD=0, trying to set CR0.CD=1 results in #GP.  At the
> same time, it reports that MTRR is available via cpuid.  So I came up with
> the following.
> 
> - MTRRCap(RO): report no feature available
>     SMRR=0: SMRR interface unsupported
>     WC=0: write combining unsupported
>     FIX=0: fixed range registers unsupported
>     VCNT=0: number of variable range registers = 0
> 
> - MTRRDefType(R/W): only writeback, even in the reset state.
>     E=1: enable MTRR (E=0 means all memory is UC.)
>     FE=0: disable fixed range MTRRs
>     type: default memory type = writeback
>   Accept writes of this value.  Any other value results in #GP.
> 
> - Fixed range MTRRs
>   #GP, as fixed range MTRRs are reported as unsupported
> 
> - Variable range MTRRs
>   #GP, as the number of variable range MTRRs is reported as zero

Looks sane to me.
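For illustration, a minimal sketch of how that read-only-MTRRCap /
writeback-only-MTRRDefType behavior could look.  This is not from the posted
patch: tdx_mtrr_get_msr()/tdx_mtrr_set_msr() and TDX_MTRR_DEF_TYPE are
hypothetical names, and the sketch only assumes KVM's usual convention that a
non-zero return from an MSR handler leads to #GP injection:

/* The one MTRRDefType value the plan accepts: E=1 (bit 11), FE=0, type=WB. */
#define TDX_MTRR_DEF_TYPE	((1ULL << 11) | MTRR_TYPE_WRBACK)

static int tdx_mtrr_get_msr(u32 index, u64 *data)
{
	switch (index) {
	case MSR_MTRRcap:
		/* RO, no features: SMRR=0, WC=0, FIX=0, VCNT=0. */
		*data = 0;
		return 0;
	case MSR_MTRRdefType:
		*data = TDX_MTRR_DEF_TYPE;
		return 0;
	default:
		/* Fixed/variable range MTRRs are reported as absent: #GP. */
		return 1;
	}
}

static int tdx_mtrr_set_msr(u32 index, u64 data)
{
	/* Only the fixed MTRRDefType value is writable; anything else #GPs. */
	if (index == MSR_MTRRdefType && data == TDX_MTRR_DEF_TYPE)
		return 0;
	return 1;
}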
> > > +u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > > +{
> > > +	/* TDX private GPA is always WB. */
> > > +	if (!(gfn & kvm_gfn_shared_mask(vcpu->kvm))) {
> > > +		/* MMIO is only for shared GPA. */
> > > +		WARN_ON_ONCE(is_mmio);
> > > +		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > +	}
> > > +
> > > +	/* Drop shared bit as MTRR doesn't know about shared bit. */
> > > +	gfn = kvm_gfn_to_private(vcpu->kvm, gfn);
> > > +
> > > +	/* As TDX enforces CR0.CD to 0, pass check_cr0_cd = false. */
> > > +	return __vmx_get_mt_mask(vcpu, gfn, is_mmio, false);
> > > +}
> > 
> > Do you know whether there's any use case of non-coherent device assignment
> > to a TDX guest?
> > 
> > IMHO, we should just disallow the TDX guest to support non-coherent device
> > assignment, so that we can just return WB for both private and shared.
> > 
> > If we support non-coherent device assignment, then if the guest sets private
> > memory to non-WB memory, it believes the memory type is non-WB, but in fact
> > TDX always maps private memory as WB.
> > 
> > Will this be a problem, i.e. if an assigned device can DMA to private memory
> > directly in the future?
> 
> MTRR is a legacy feature and PAT replaced it.  We can rely on the guest to
> use PAT.  Here is the new patch for MTRR.
> 
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -229,6 +229,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
>  }
>  
> +static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> +{
> +	if (is_td_vcpu(vcpu))
> +		return tdx_get_mt_mask(vcpu, gfn, is_mmio);
> +
> +	return vmx_get_mt_mask(vcpu, gfn, is_mmio);
> +}
> +
>  static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  {
>  	if (!is_td(kvm))
> @@ -349,7 +357,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>  
>  	.set_tss_addr = vmx_set_tss_addr,
>  	.set_identity_map_addr = vmx_set_identity_map_addr,
> -	.get_mt_mask = vmx_get_mt_mask,
> +	.get_mt_mask = vt_get_mt_mask,
>  
>  	.get_exit_info = vmx_get_exit_info,
>  
> diff -u b/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> --- b/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -347,6 +347,25 @@
>  	return 0;
>  }
>  
> +u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> +{
> +	/* TDX private GPA is always WB. */
> +	if (!(gfn & kvm_gfn_shared_mask(vcpu->kvm))) {

Are you still passing a "raw" GFN?  Could you explain why you chose this way?

> +		/* MMIO is only for shared GPA. */
> +		WARN_ON_ONCE(is_mmio);
> +		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

I guess it's better to include the VMX_EPT_IPAT_BIT bit.

> +	}
> +
> +	if (is_mmio)
> +		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
> +
> +	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> +
> +	/* TDX enforces CR0.CD = 0 and KVM MTRR emulation enforces writeback. */
> +	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> +}
> +
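To make the suggestion concrete, the function with VMX_EPT_IPAT_BIT folded
into the private-GPA path would read as below.  This is only a sketch of the
suggested change, not a new revision of the patch; everything outside the
first return statement is unchanged from the diff above:

u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
	/* TDX private GPA is always WB; IPAT tells EPT to ignore guest PAT. */
	if (!(gfn & kvm_gfn_shared_mask(vcpu->kvm))) {
		/* MMIO is only for shared GPA. */
		WARN_ON_ONCE(is_mmio);
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;
	}

	/* Shared MMIO (e.g. assigned device BARs) must stay uncached. */
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	/* No non-coherent DMA: force WB and ignore guest PAT. */
	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;

	/* TDX enforces CR0.CD = 0 and KVM MTRR emulation enforces writeback. */
	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}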