On Thu, Jan 4, 2024 at 7:08 AM Chao Gao <chao.gao@xxxxxxxxx> wrote:
>
> On Wed, Jan 03, 2024 at 07:40:02PM -0800, Jim Mattson wrote:
> >On Wed, Jan 3, 2024 at 6:45 PM Chao Gao <chao.gao@xxxxxxxxx> wrote:
> >>
> >> On Wed, Jan 03, 2024 at 10:04:41AM -0800, Sean Christopherson wrote:
> >> >On Tue, Jan 02, 2024, Jim Mattson wrote:
> >> >> On Tue, Jan 2, 2024 at 3:24 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >> >> >
> >> >> > On Thu, Dec 21, 2023, Xu Yilun wrote:
> >> >> > > On Wed, Dec 20, 2023 at 08:28:06AM -0800, Sean Christopherson wrote:
> >> >> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >> >> > > > > index c57e181bba21..72634d6b61b2 100644
> >> >> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> >> >> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> >> >> > > > > @@ -5177,6 +5177,13 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> >> >> > > > >         reset_guest_paging_metadata(vcpu, mmu);
> >> >> > > > >  }
> >> >> > > > >
> >> >> > > > > +/* guest-physical-address bits limited by TDP */
> >> >> > > > > +unsigned int kvm_mmu_tdp_maxphyaddr(void)
> >> >> > > > > +{
> >> >> > > > > +       return max_tdp_level == 5 ? 57 : 48;
> >> >> > > >
> >> >> > > > Using "57" is kinda sorta wrong, e.g. the SDM says:
> >> >> > > >
> >> >> > > >   Bits 56:52 of each guest-physical address are necessarily zero because
> >> >> > > >   guest-physical addresses are architecturally limited to 52 bits.
> >> >> > > >
> >> >> > > > Rather than split hairs over something that doesn't matter, I think it makes sense
> >> >> > > > for the CPUID code to consume max_tdp_level directly (I forgot that max_tdp_level
> >> >> > > > is still accurate when tdp_root_level is non-zero).
> >> >> > >
> >> >> > > It is still accurate for now.
> >> >> > > Only AMD SVM sets tdp_root_level the same as
> >> >> > > max_tdp_level:
> >> >> > >
> >> >> > >       kvm_configure_mmu(npt_enabled, get_npt_level(),
> >> >> > >                         get_npt_level(), PG_LEVEL_1G);
> >> >> > >
> >> >> > > But I wanna double confirm if directly using max_tdp_level is fully
> >> >> > > considered. In your last proposal, it is:
> >> >> > >
> >> >> > >   u8 kvm_mmu_get_max_tdp_level(void)
> >> >> > >   {
> >> >> > >       return tdp_root_level ? tdp_root_level : max_tdp_level;
> >> >> > >   }
> >> >> > >
> >> >> > > and I think it makes more sense, because EPT setup follows the same
> >> >> > > rule. If any future architecture sets tdp_root_level smaller than
> >> >> > > max_tdp_level, the issue will happen again.
> >> >> >
> >> >> > Setting tdp_root_level != max_tdp_level would be a blatant bug. max_tdp_level
> >> >> > really means "max possible TDP level KVM can use". If an exact TDP level is being
> >> >> > forced by tdp_root_level, then by definition it's also the max TDP level, because
> >> >> > it's the _only_ TDP level KVM supports.
> >> >>
> >> >> This is all just so broken and wrong. The only guest.MAXPHYADDR that
> >> >> can be supported under TDP is the host.MAXPHYADDR. If KVM claims to
> >> >> support a smaller guest.MAXPHYADDR, then KVM is obligated to intercept
> >> >> every #PF,
> >>
> >> in this case (i.e., to support 48-bit guest.MAXPHYADDR when CPU supports only
> >> 4-level EPT), KVM has no need to intercept #PF because accessing a GPA with
> >> RSVD bits 51-48 set leads to EPT violation.
> >
> >At the completion of the page table walk, if there is a permission
> >fault, the data address should not be accessed, so there should not be
> >an EPT violation. Remember Meltdown?
>
> You are right. I missed this case. KVM needs to intercept #PF to set RSVD bit
> in PFEC.

I have no problem with a user deliberately choosing an unsupported
configuration, but I do have a problem with KVM_GET_SUPPORTED_CPUID
returning an unsupported configuration.

guest MAXPHYADDR < host MAXPHYADDR has the following issues:

1. In PAE mode, MOV to CR3 will not raise #GP for guest-reserved bits
   in PDPTRs that are not host-reserved.
2. #PF for permission violations will not set the RSVD bit in the
   error code for guest-reserved bits in the final data address PFN
   that are not host-reserved.
3. #PF for other PFNs with guest-reserved bits that are not
   host-reserved may not accurately set the non-RSVD bits (e.g. U/S,
   R/W) in the error code.

Fix these three issues, and I will happily withdraw my objection.
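
To make "guest-reserved bits that are not host-reserved" concrete, here
is a minimal standalone sketch. It is not KVM code: pa_bits_mask(),
guest_only_rsvd_mask() and addr_has_guest_only_rsvd_bits() are
hypothetical helper names used only for illustration, and the two
MAXPHYADDR values are plain parameters. For the 48-bit guest on a 52-bit
host discussed above, the mask covers bits 51:48, which hardware treats
as ordinary address bits even though the guest expects them to be
reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask of address bits [hi:lo], inclusive (hypothetical helper). */
static uint64_t pa_bits_mask(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 63);
	return ((UINT64_C(1) << (hi - lo + 1)) - 1) << lo;
}

/*
 * Bits that are reserved for the guest (at or above guest MAXPHYADDR)
 * but legal for the host (below host MAXPHYADDR). Hardware will not
 * flag these bits on the guest's behalf.
 */
static uint64_t guest_only_rsvd_mask(unsigned int guest_maxpa,
				     unsigned int host_maxpa)
{
	if (guest_maxpa >= host_maxpa)
		return 0;	/* nothing is reserved only for the guest */
	return pa_bits_mask(guest_maxpa, host_maxpa - 1);
}

/* True if addr sets a bit the guest expects to be reserved but the host does not. */
static bool addr_has_guest_only_rsvd_bits(uint64_t addr,
					  unsigned int guest_maxpa,
					  unsigned int host_maxpa)
{
	return (addr & guest_only_rsvd_mask(guest_maxpa, host_maxpa)) != 0;
}
```

With guest_maxpa == host_maxpa the mask is empty, which is the only
case in which none of the three issues above can arise.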