On Tue, Dec 19, 2023, Chao Gao wrote:
> On Mon, Dec 18, 2023 at 07:40:11PM -0800, Jim Mattson wrote:
> >Honestly, I think KVM should just disable EPT if the EPT tables can't
> >support the CPU's physical address width.
> 
> Yes, it is an option.
> But I prefer to allow admin to override this (i.e., admin still can enable EPT
> via module parameter) because those issues are not new and disabling EPT
> doesn't prevent QEMU from launching guests w/ smaller MAXPHYADDR.
> 
> >> Here nothing visible to selftests or QEMU indicates that guest.MAXPHYADDR = 52
> >> is invalid/incorrect. how can we say selftests are at fault and we should fix
> >> them?
> >
> >In this case, the CPU is at fault, and you should complain to the CPU vendor.
> 
> Yeah, I agree with you and will check with related team inside Intel.

I agree that the CPU is being weird, but this is technically an architecturally
legal configuration, and KVM has largely committed to supporting weird setups.
At some point we have to draw a line when things get too ridiculous, but I
don't think this particular oddity crosses into absurd territory.

> My point was just this isn't a selftest issue because not all information is
> disclosed to the tests.

Ah, right, EPT capabilities are in MSRs that userspace can't read.
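(For reference, root *can* read the raw capability on the host side through
the msr driver.  A quick, untested sketch, assuming the "msr" module is
loaded; IA32_VMX_EPT_VPID_CAP is architectural MSR 0x48c, and bit 7 reports
support for a 5-level EPT page walk:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	uint64_t cap;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	/* The msr driver uses the MSR index as the file offset. */
	if (fd < 0 || pread(fd, &cap, sizeof(cap), 0x48c) != sizeof(cap)) {
		perror("IA32_VMX_EPT_VPID_CAP");
		return 1;
	}
	printf("5-level EPT walk: %s\n", (cap & (1ULL << 7)) ? "yes" : "no");
	return 0;
}

But none of that is plumbed into anything the selftests can query.)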
> And I am afraid KVM as L1 VMM may run into this situation, i.e., only 4-level
> EPT is supported but MAXPHYADDR is 52. So, KVM needs a fix anyway.

Yes, but forcing emulation for a funky setup is not a good fix. KVM can simply
constrain the advertised MAXPHYADDR, no?

---
 arch/x86/kvm/cpuid.c   | 17 +++++++++++++----
 arch/x86/kvm/mmu.h     |  2 ++
 arch/x86/kvm/mmu/mmu.c |  5 +++++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 294e5bd5f8a0..5c346e1a10bd 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1233,12 +1233,21 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 *
 		 * If TDP is enabled but an explicit guest MAXPHYADDR is not
 		 * provided, use the raw bare metal MAXPHYADDR as reductions to
-		 * the HPAs do not affect GPAs.
+		 * the HPAs do not affect GPAs.  Finally, if TDP is enabled and
+		 * doesn't support 5-level paging, cap guest MAXPHYADDR at 48
+		 * bits as KVM can't install SPTEs for larger GPAs.
 		 */
-		if (!tdp_enabled)
+		if (!tdp_enabled) {
 			g_phys_as = boot_cpu_data.x86_phys_bits;
-		else if (!g_phys_as)
-			g_phys_as = phys_as;
+		} else {
+			u8 max_tdp_level = kvm_mmu_get_max_tdp_level();
+
+			if (!g_phys_as)
+				g_phys_as = phys_as;
+
+			if (max_tdp_level < 5)
+				g_phys_as = min(g_phys_as, 48U);
+		}
 
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60f21bb4c27b..b410a227c601 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -100,6 +100,8 @@ static inline u8 kvm_get_shadow_phys_bits(void)
 	return boot_cpu_data.x86_phys_bits;
 }
 
+u8 kvm_mmu_get_max_tdp_level(void);
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c844e428684..b2845f5520b3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5267,6 +5267,11 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 	return max_tdp_level;
 }
 
+u8 kvm_mmu_get_max_tdp_level(void)
+{
+	return tdp_root_level ? tdp_root_level : max_tdp_level;
+}
+
 static union kvm_mmu_page_role
 kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 				union kvm_cpu_role cpu_role)

base-commit: f2a3fb7234e52f72ff4a38364dbf639cf4c7d6c6
--