On Tue, Jan 02, 2024, Jim Mattson wrote:
> On Tue, Jan 2, 2024 at 3:24 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Dec 21, 2023, Xu Yilun wrote:
> > > On Wed, Dec 20, 2023 at 08:28:06AM -0800, Sean Christopherson wrote:
> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > index c57e181bba21..72634d6b61b2 100644
> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > @@ -5177,6 +5177,13 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> > > > >  	reset_guest_paging_metadata(vcpu, mmu);
> > > > >  }
> > > > >
> > > > > +/* guest-physical-address bits limited by TDP */
> > > > > +unsigned int kvm_mmu_tdp_maxphyaddr(void)
> > > > > +{
> > > > > +	return max_tdp_level == 5 ? 57 : 48;
> > > >
> > > > Using "57" is kinda sorta wrong, e.g. the SDM says:
> > > >
> > > >   Bits 56:52 of each guest-physical address are necessarily zero because
> > > >   guest-physical addresses are architecturally limited to 52 bits.
> > > >
> > > > Rather than split hairs over something that doesn't matter, I think it
> > > > makes sense for the CPUID code to consume max_tdp_level directly (I
> > > > forgot that max_tdp_level is still accurate when tdp_root_level is
> > > > non-zero).
> > >
> > > It is still accurate for now.  Only AMD SVM sets tdp_root_level the same
> > > as max_tdp_level:
> > >
> > > 	kvm_configure_mmu(npt_enabled, get_npt_level(),
> > > 			  get_npt_level(), PG_LEVEL_1G);
> > >
> > > But I want to double confirm that directly using max_tdp_level has been
> > > fully considered.  In your last proposal, it is:
> > >
> > > 	u8 kvm_mmu_get_max_tdp_level(void)
> > > 	{
> > > 		return tdp_root_level ? tdp_root_level : max_tdp_level;
> > > 	}
> > >
> > > and I think it makes more sense, because EPT setup follows the same
> > > rule.  If any future architecture sets tdp_root_level smaller than
> > > max_tdp_level, the issue will happen again.
> >
> > Setting tdp_root_level != max_tdp_level would be a blatant bug.
> > max_tdp_level really means "max possible TDP level KVM can use".  If an
> > exact TDP level is being forced by tdp_root_level, then by definition
> > it's also the max TDP level, because it's the _only_ TDP level KVM
> > supports.
>
> This is all just so broken and wrong. The only guest.MAXPHYADDR that
> can be supported under TDP is the host.MAXPHYADDR. If KVM claims to
> support a smaller guest.MAXPHYADDR, then KVM is obligated to intercept
> every #PF, and to emulate the faulting instruction to see if the RSVD
> bit should be set in the error code. Hardware isn't going to do it.
> Since some page faults may occur at CPL3, this means that KVM has to
> be prepared to emulate any memory-accessing instruction. That's not
> practical.
>
> Basically, a CPU with more than 48 bits of physical address that
> doesn't support 5-level EPT really doesn't support EPT at all, except
> perhaps in the context of some new paravirtual pinky-swear from the
> guest that it doesn't care about the RSVD bit in #PF error codes.

Doh, I managed to forget about the RSVD #PF mess.

That said, this patch will "work" if userspace enables allow_smaller_maxphyaddr.
In quotes because I'm still skeptical that allow_smaller_maxphyaddr actually
works in all scenarios.

And we'd need a way to communicate all of that to userspace.  Blech.
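
For the record, if we were to keep a helper at all, the least-wrong version
would look something like the below.  Completely untested sketch, not what's
in the patch; per the SDM quote above, 5-level TDP buys the full architectural
52 bits of guest-physical address, not 57:

	/*
	 * Guest-physical address bits that TDP can actually map.  4-level
	 * paging covers 48 bits; 5-level covers the architectural maximum
	 * of 52 bits (bits 56:52 of a GPA are reserved, per the SDM).
	 * Consuming max_tdp_level directly is safe: forcing an exact level
	 * via tdp_root_level implies that level is also the max.
	 */
	unsigned int kvm_mmu_tdp_maxphyaddr(void)
	{
		return max_tdp_level == 5 ? 52 : 48;
	}

But per Jim's point, advertising anything smaller than host.MAXPHYADDR is only
remotely viable with allow_smaller_maxphyaddr, so this would be a band-aid at
best.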