On Mon, May 15, 2023, Kai Huang wrote: > On Fri, 2023-05-12 at 09:35 -0700, Sean Christopherson wrote: > > On Fri, May 12, 2023, Kai Huang wrote: > > > On Thu, 2023-05-11 at 16:33 -0700, Sean Christopherson wrote: > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > index e7f78fe79b32..8b356c9d8a81 100644 > > > > --- a/arch/x86/kvm/x86.c > > > > +++ b/arch/x86/kvm/x86.c > > > > @@ -3700,8 +3700,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > > > return 1; > > > > } > > > > break; > > > > - case 0x200 ... MSR_IA32_MC0_CTL2 - 1: > > > > - case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff: > > > > + case MSR_IA32_CR_PAT: > > > > + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: > > > > + case MSR_MTRRdefType: > > > > return kvm_mtrr_set_msr(vcpu, msr, data); > > > > case MSR_IA32_APICBASE: > > > > return kvm_set_apic_base(vcpu, msr_info); > > > > @@ -4108,9 +4109,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > > > msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset; > > > > break; > > > > } > > > > + case MSR_IA32_CR_PAT: > > > > case MSR_MTRRcap: > > > > > > ... Should we put MSR_IA32_CR_PAT after MSR_MTRRcap so it can be symmetric to > > > kvm_set_msr_common()? > > > > > > Looks there's no reason to put it before MSR_MTRRcap. > > > > No, it's above MTRRcap for two reasons: > > > > 1. When PAT is moved out of mtrr.c, PAT doesn't get bunded with the other MTRRs > > and so would just need to be hoisted back up. The end code looks like: > > > > case MSR_IA32_CR_PAT: > > msr_info->data = vcpu->arch.pat; > > break; > > case MSR_MTRRcap: > > case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: > > case MSR_MTRRdefType: > > return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data); > > Sorry I mistakenly thought MSR_MTRRcap wasn't handled in kvm_mtrr_get_msr(). > Yes looks good to me. > > > > > 2. There is no MSR_MTRRcap case statement in kvm_set_msr_common() because it's > > a read-only MSR, i.e. the two can't be symmetric :-) > > Do you know why it is a read-only MSR, which enables both FIXED ranges and > (fixed number of) dynamic ranges? MTTRcap doesn't "enable" anything, it's a capabilities MSR (MTRR Capability is its given name in the SDM), similar to ARCH_CAPABILITIES, PERF_CAPABILITIES, etc. They're all essentially CPUID leafs, but presumably are MSRs due to being relevant only to CPL0. Or maybe some higher ups at Intel just spin a wheel to determine whether to use a CPUID leaf or an MSR. :-) > I am asking because there's a x86 series to fake a simple synthetic MTRR which > neither has fixed nor dynamic ranges but only has a default MSR_MTRRdefType: > > https://lore.kernel.org/lkml/20230509233641.GGZFrZCTDH7VwUMp5R@fat_crate.local/T/ > > The main use cases are Xen PV guests and SEV-SNP guests running under > Hyper-V, but it appears TDX guest is also desired to have similar handling, > because:� > > 1) TDX module exposes MTRR in CPUID to guest, but handles nothing about MTRR > MSRs but only injects #VE. > > 2) TDX module always maps guest private memory as WB (and ignores guest's PAT > IIUC); > > 3) For shared memory, w/o non-coherent DMA guest's MTRRs are ignored by KVM > anyway. TDX doesn't officially support non-trusted device assignment AFAICT. > Even we want to consider non-coherent DMA, it would only add confusion to honor > guest's MTRRs since they can point to private memory, which is always mapped as > WB. Yeah, I think the best option is for KVM to disallow attaching non-coherent DMA to TDX VMs. AFAIK, there's no use case for such a setup. > So basically looks there's no value to exposing FIXED and dynamic MTRR ranges to > TDX guest. Agreed.