On Fri, 2023-03-24 at 18:12 -0700, Isaku Yamahata wrote:
> On Thu, Mar 16, 2023 at 10:38:02AM +0000,
> "Huang, Kai" <kai.huang@xxxxxxxxx> wrote:
> 
> > On Sun, 2023-03-12 at 10:56 -0700, isaku.yamahata@xxxxxxxxx wrote:
> > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > 
> > > Because TDX virtualizes cpuid[0x1].EDX[MTRR: bit 12] to fixed 1, the guest
> > > TD thinks MTRR is supported.  Although TDX supports only WB for private
> > > GPA, it's desirable to support MTRR for shared GPA.  As guest accesses to
> > > MTRR MSRs cause #VE and KVM/x86 tracks the values of the MTRR MSRs, the
> > > remaining part is to implement the get_mt_mask method for TDX for shared
> > > GPA.
> > > 
> > > Pass the shared bit from the kvm fault handler to the get_mt_mask method
> > > so that it can determine whether the gfn is shared or private.
> > 
> > I think we have an Xarray to query whether a given GFN is shared or private?
> > Can we use that?
> > 
> > > Implement get_mt_mask() following the vmx case for shared GPA and return
> > > WB for private GPA.  The existing vmx_get_mt_mask() can't be directly
> > > used as CPU state (CR0.CD) is protected.  The GFN passed to
> > > kvm_mtrr_check_gfn_range_consistency() should include the shared bit.
> > > 
> > > Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
> > 
> > I am not sure what is suggested by me?
> > 
> > I thought what I suggested is we should have a dedicated patch to handle
> > MTRR for TDX, putting all related things together.
> 
> Sure.  After looking at the specs, my conclusion is that the guest TD can't
> update MTRRs reliably, because an MTRR update requires disabling the cache
> (CR0.CD=1), a cache flush, and a TLB flush (SDM 3a 12.11.7: MTRR maintenance
> programming interface).  Linux implements its MTRR update logic accordingly.
> 
> While TDX enforces CR0.CD=0, trying to set CR0.CD=1 results in #GP.  At the
> same time, it reports that MTRR is available via cpuid.  So I came up with
> the following.
> 
> - MTRRCap(RO): report no feature available
>     SMRR=0: SMRR interface unsupported
>     WC=0: write combining unsupported
>     FIX=0: fixed range registers unsupported
>     VCNT=0: number of variable range registers = 0
> 
> - MTRRDefType(R/W): only writeback, even in the reset state.
>     E=1: enable MTRR (E=0 means all memory is UC.)
>     FE=0: disable fixed range MTRRs
>     type: default memory type = writeback
>   Accept writes of this value.  Any other value results in #GP.
> 
> - Fixed range MTRRs
>   #GP, as fixed range MTRRs are reported as unsupported
> 
> - Variable range MTRRs
>   #GP, as the number of variable range MTRRs is reported as zero

Looks sane to me.
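For illustration, a minimal sketch of how that read-only-MTRRCap /
writeback-only-MTRRDefType behavior could look.  This is not from the posted
patch: tdx_mtrr_get_msr()/tdx_mtrr_set_msr() and TDX_MTRR_DEF_TYPE are
hypothetical names, and the sketch only assumes KVM's usual convention that a
non-zero return from an MSR handler leads to #GP injection:

/* The one MTRRDefType value the plan accepts: E=1 (bit 11), FE=0, type=WB. */
#define TDX_MTRR_DEF_TYPE	((1ULL << 11) | MTRR_TYPE_WRBACK)

static int tdx_mtrr_get_msr(u32 index, u64 *data)
{
	switch (index) {
	case MSR_MTRRcap:
		/* RO, no features: SMRR=0, WC=0, FIX=0, VCNT=0. */
		*data = 0;
		return 0;
	case MSR_MTRRdefType:
		*data = TDX_MTRR_DEF_TYPE;
		return 0;
	default:
		/* Fixed/variable range MTRRs are reported as absent: #GP. */
		return 1;
	}
}

static int tdx_mtrr_set_msr(u32 index, u64 data)
{
	/* Only the fixed MTRRDefType value is writable; anything else #GPs. */
	if (index == MSR_MTRRdefType && data == TDX_MTRR_DEF_TYPE)
		return 0;
	return 1;
}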
> > > +u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > > +{
> > > +	/* TDX private GPA is always WB. */
> > > +	if (!(gfn & kvm_gfn_shared_mask(vcpu->kvm))) {
> > > +		/* MMIO is only for shared GPA. */
> > > +		WARN_ON_ONCE(is_mmio);
> > > +		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > +	}
> > > +
> > > +	/* Drop shared bit as MTRR doesn't know about shared bit. */
> > > +	gfn = kvm_gfn_to_private(vcpu->kvm, gfn);
> > > +
> > > +	/* As TDX enforces CR0.CD to 0, pass check_cr0_cd = false. */
> > > +	return __vmx_get_mt_mask(vcpu, gfn, is_mmio, false);
> > > +}
> > 
> > Do you know whether there's any use case of non-coherent device assignment
> > to a TDX guest?
> > 
> > IMHO, we should just disallow the TDX guest to support non-coherent device
> > assignment, so that we can just return WB for both private and shared.
> > 
> > If we support non-coherent device assignment, then if the guest sets private
> > memory to non-WB memory, it believes the memory type is non-WB, but in fact
> > TDX always maps private memory as WB.
> > 
> > Will this be a problem, i.e. if an assigned device can DMA to private memory
> > directly in the future?
> 
> MTRR is a legacy feature and PAT replaced it.  We can rely on the guest to
> use PAT.  Here is the new patch for MTRR.
> 
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -229,6 +229,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
>  }
>  
> +static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> +{
> +	if (is_td_vcpu(vcpu))
> +		return tdx_get_mt_mask(vcpu, gfn, is_mmio);
> +
> +	return vmx_get_mt_mask(vcpu, gfn, is_mmio);
> +}
> +
>  static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  {
>  	if (!is_td(kvm))
> @@ -349,7 +357,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>  
>  	.set_tss_addr = vmx_set_tss_addr,
>  	.set_identity_map_addr = vmx_set_identity_map_addr,
> -	.get_mt_mask = vmx_get_mt_mask,
> +	.get_mt_mask = vt_get_mt_mask,
>  
>  	.get_exit_info = vmx_get_exit_info,
>  
> diff -u b/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> --- b/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -347,6 +347,25 @@
>  	return 0;
>  }
>  
> +u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> +{
> +	/* TDX private GPA is always WB. */
> +	if (!(gfn & kvm_gfn_shared_mask(vcpu->kvm))) {

Are you still passing a "raw" GFN?  Could you explain why you chose this way?

> +		/* MMIO is only for shared GPA. */
> +		WARN_ON_ONCE(is_mmio);
> +		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

I guess it's better to include the VMX_EPT_IPAT_BIT bit.

> +	}
> +
> +	if (is_mmio)
> +		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
> +
> +	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> +
> +	/* TDX enforces CR0.CD = 0 and KVM MTRR emulation enforces writeback. */
> +	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> +}
> +
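To make the suggestion concrete, the function with VMX_EPT_IPAT_BIT folded
into the private-GPA path would read as below.  This is only a sketch of the
suggested change, not a new revision of the patch; everything outside the
first return statement is unchanged from the diff above:

u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
	/* TDX private GPA is always WB; IPAT tells EPT to ignore guest PAT. */
	if (!(gfn & kvm_gfn_shared_mask(vcpu->kvm))) {
		/* MMIO is only for shared GPA. */
		WARN_ON_ONCE(is_mmio);
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;
	}

	/* Shared MMIO (e.g. assigned device BARs) must stay uncached. */
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	/* No non-coherent DMA: force WB and ignore guest PAT. */
	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;

	/* TDX enforces CR0.CD = 0 and KVM MTRR emulation enforces writeback. */
	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}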