On Wed, Jan 03, 2024 at 10:04:41AM -0800, Sean Christopherson wrote:
>On Tue, Jan 02, 2024, Jim Mattson wrote:
>> On Tue, Jan 2, 2024 at 3:24 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>> >
>> > On Thu, Dec 21, 2023, Xu Yilun wrote:
>> > > On Wed, Dec 20, 2023 at 08:28:06AM -0800, Sean Christopherson wrote:
>> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> > > > > index c57e181bba21..72634d6b61b2 100644
>> > > > > --- a/arch/x86/kvm/mmu/mmu.c
>> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
>> > > > > @@ -5177,6 +5177,13 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
>> > > > >  	reset_guest_paging_metadata(vcpu, mmu);
>> > > > >  }
>> > > > >
>> > > > > +/* guest-physical-address bits limited by TDP */
>> > > > > +unsigned int kvm_mmu_tdp_maxphyaddr(void)
>> > > > > +{
>> > > > > +	return max_tdp_level == 5 ? 57 : 48;
>> > > >
>> > > > Using "57" is kinda sorta wrong, e.g. the SDM says:
>> > > >
>> > > >   Bits 56:52 of each guest-physical address are necessarily zero because
>> > > >   guest-physical addresses are architecturally limited to 52 bits.
>> > > >
>> > > > Rather than split hairs over something that doesn't matter, I think it makes sense
>> > > > for the CPUID code to consume max_tdp_level directly (I forgot that max_tdp_level
>> > > > is still accurate when tdp_root_level is non-zero).
>> > >
>> > > It is still accurate for now. Only AMD SVM sets tdp_root_level the same as
>> > > max_tdp_level:
>> > >
>> > >   kvm_configure_mmu(npt_enabled, get_npt_level(),
>> > >                     get_npt_level(), PG_LEVEL_1G);
>> > >
>> > > But I want to double confirm whether directly using max_tdp_level is fully
>> > > considered. In your last proposal, it is:
>> > >
>> > >   u8 kvm_mmu_get_max_tdp_level(void)
>> > >   {
>> > >           return tdp_root_level ? tdp_root_level : max_tdp_level;
>> > >   }
>> > >
>> > > and I think it makes more sense, because EPT setup follows the same
>> > > rule. If any future architecture sets tdp_root_level smaller than
>> > > max_tdp_level, the issue will happen again.
>> >
>> > Setting tdp_root_level != max_tdp_level would be a blatant bug.  max_tdp_level
>> > really means "max possible TDP level KVM can use".  If an exact TDP level is being
>> > forced by tdp_root_level, then by definition it's also the max TDP level, because
>> > it's the _only_ TDP level KVM supports.
>>
>> This is all just so broken and wrong. The only guest.MAXPHYADDR that
>> can be supported under TDP is the host.MAXPHYADDR. If KVM claims to
>> support a smaller guest.MAXPHYADDR, then KVM is obligated to intercept
>> every #PF,

In this case (i.e., supporting a 48-bit guest.MAXPHYADDR when the CPU
supports only 4-level EPT), KVM has no need to intercept #PF, because
accessing a GPA with any of the reserved bits 51:48 set leads to an EPT
violation.

>> and to emulate the faulting instruction to see if the RSVD
>> bit should be set in the error code. Hardware isn't going to do it.

Note that for EPT violation VM exits, the CPU stores the GPA that caused
the exit in the "guest-physical address" field of the VMCS. So it is not
necessary to emulate the faulting instruction to determine whether any
RSVD bit is set.

>> Since some page faults may occur in CPL3, this means that KVM has to
>> be prepared to emulate any memory-accessing instruction. That's not
>> practical.

As said above, there is no need to intercept #PF for this specific case.
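
Roughly, something like the sketch below (untested and purely illustrative:
the function name is made up, the error-code construction is simplified, and
it ignores whether the reported guest linear address is actually valid):

	/*
	 * Illustrative sketch only, not actual KVM code: on an EPT violation,
	 * check whether the faulting GPA has bits set above guest.MAXPHYADDR
	 * and, if so, reflect a reserved-bit #PF back into the guest instead
	 * of emulating the faulting instruction.
	 */
	static int handle_gpa_beyond_guest_maxphyaddr(struct kvm_vcpu *vcpu)
	{
		gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
		gva_t gva = vmcs_readl(GUEST_LINEAR_ADDRESS);
		unsigned long exit_qual = vmx_get_exit_qual(vcpu);
		u16 error_code = PFERR_PRESENT_MASK | PFERR_RSVD_MASK;

		/* GPA is within guest.MAXPHYADDR, nothing to fix up here. */
		if (!(gpa >> cpuid_maxphyaddr(vcpu)))
			return 0;

		if (exit_qual & EPT_VIOLATION_ACC_WRITE)
			error_code |= PFERR_WRITE_MASK;
		if (exit_qual & EPT_VIOLATION_ACC_INSTR)
			error_code |= PFERR_FETCH_MASK;

		/*
		 * The GPA (and GVA) come straight from the VMCS; no
		 * instruction emulation is needed to decide that the guest's
		 * PTE references bits the guest believes are reserved.
		 */
		kvm_fixup_and_inject_pf_error(vcpu, gva, error_code);
		return 1;
	}

A real implementation would also need to look at the "guest linear address
valid" bit in the exit qualification (the access could be to a paging
structure or a purely physical access) and factor in CPL for the U/S bit,
but the point stands: the information needed is already in the VMCS.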