KVM: x86:

On Tue, Mar 05, 2024, Gerd Hoffmann wrote:
> Set CPUID.0x80000008:EAX[23:16] to guest phys bits, i.e. the bits which
> are actually addressable. In most cases this is identical to the host
> phys bits, but tdp restrictions (no 5-level paging) can limit this to
> 48.
>
> Quoting AMD APM (revision 3.35):
>
>   23:16 GuestPhysAddrSize Maximum guest physical address size in bits.
>                           This number applies only to guests using nested
>                           paging. When this field is zero, refer to the
>                           PhysAddrSize field for the maximum guest
>                           physical address size. See “Secure Virtual
>                           Machine” in APM Volume 2.
>
> Tom Lendacky confirmed the purpose of this field is software use,
> hardware always returns zero here.
>
> Signed-off-by: Gerd Hoffmann <kraxel@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu.h     |  2 ++
>  arch/x86/kvm/cpuid.c   |  3 ++-
>  arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
>  3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 60f21bb4c27b..42b5212561c8 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -100,6 +100,8 @@ static inline u8 kvm_get_shadow_phys_bits(void)
>  	return boot_cpu_data.x86_phys_bits;
>  }
>
> +int kvm_mmu_get_guest_phys_bits(void);
> +
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
>  void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
>  void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index adba49afb5fe..12037f1b017e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1240,7 +1240,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  		else if (!g_phys_as)

Based on the new information that GuestPhysAddrSize is software-defined, and
the fact that KVM and QEMU are planning on using GuestPhysAddrSize to
communicate the maximum *addressable* GPA, deriving PhysAddrSize from
GuestPhysAddrSize is wrong. E.g. if KVM is running as L1 on top of a new KVM,
on a CPU with MAXPHYADDR=52 but without 5-level TDP, then KVM (as L1) will see:

  PhysAddrSize      = 52
  GuestPhysAddrSize = 48

Propagating GuestPhysAddrSize to PhysAddrSize (which, confusingly, KVM computes
in the g_phys_as variable) will yield an L2 with

  PhysAddrSize      = 48
  GuestPhysAddrSize = 48

which is broken, because GPAs with bits 51:48 != 0 are *legal*, but not
addressable.

>  			g_phys_as = phys_as;
>
> -		entry->eax = g_phys_as | (virt_as << 8);
> +		entry->eax = g_phys_as | (virt_as << 8)
> +			| kvm_mmu_get_guest_phys_bits() << 16;

The APM explicitly states that GuestPhysAddrSize only applies to NPT. KVM
should follow suit to avoid creating unnecessary ABI, and because KVM can
address any legal GPA when using shadow paging.

>  		entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
>  		entry->edx = 0;
>  		cpuid_entry_override(entry, CPUID_8000_0008_EBX);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2d6cdeab1f8a..8bebb3e96c8a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5267,6 +5267,21 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
>  	return max_tdp_level;
>  }
>
> +/*
> + * return the actually addressable guest phys bits, which might be
> + * less than host phys bits due to tdp restrictions.
> + */
> +int kvm_mmu_get_guest_phys_bits(void)
> +{
> +	if (tdp_enabled && shadow_phys_bits > 48) {
> +		if (tdp_root_level && tdp_root_level != PT64_ROOT_5LEVEL)
> +			return 48;
> +		if (max_tdp_level != PT64_ROOT_5LEVEL)
> +			return 48;

I would prefer not to use shadow_phys_bits to cap the value reported in
CPUID.0x8000_0008, so that the logic isn't spread across the CPUID code and
the MMU. I don't love that the two have duplicate logic, but there's no great
way to handle that since the MMU needs to be able to determine the effective
host MAXPHYADDR even if CPUID.0x8000_0008 is unsupported.
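A minimal userspace sketch of the nested example, assuming only the EAX field positions that __do_cpuid_func() already uses (PhysAddrSize in bits 7:0, LinAddrSize in 15:8, GuestPhysAddrSize in 23:16). It is not KVM code: pack_eax(), eax_phys_as(), eax_g_phys_as(), old_guest_phys_as() and gpa_is_legal() are hypothetical helpers, and old_guest_phys_as() models only the PhysAddrSize that the existing TDP-enabled path reports, where a nonzero GuestPhysAddrSize wins over PhysAddrSize.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack CPUID.0x80000008:EAX using the same shifts as
 * __do_cpuid_func(): PhysAddrSize in 7:0, LinAddrSize in 15:8,
 * GuestPhysAddrSize in 23:16.
 */
static uint32_t pack_eax(uint32_t phys_as, uint32_t virt_as, uint32_t g_phys_as)
{
	assert(phys_as <= 0xff && virt_as <= 0xff && g_phys_as <= 0xff);
	return phys_as | (virt_as << 8) | (g_phys_as << 16);
}

static uint32_t eax_phys_as(uint32_t eax)   { return eax & 0xff; }
static uint32_t eax_g_phys_as(uint32_t eax) { return (eax >> 16) & 0xff; }

/*
 * Hypothetical model of the existing TDP-enabled path: a nonzero
 * GuestPhysAddrSize seen by KVM is reported to its guest as PhysAddrSize;
 * only a zero GuestPhysAddrSize falls back to PhysAddrSize.
 */
static uint32_t old_guest_phys_as(uint32_t host_eax)
{
	uint32_t g_phys_as = eax_g_phys_as(host_eax);

	return g_phys_as ? g_phys_as : eax_phys_as(host_eax);
}

/* A GPA is legal iff no bit at or above MAXPHYADDR is set. */
static int gpa_is_legal(uint64_t gpa, uint32_t maxphyaddr)
{
	return maxphyaddr >= 64 || (gpa >> maxphyaddr) == 0;
}
```

With the values from the example, pack_eax(52, 48, 48) describes what L1 sees; old_guest_phys_as() returns 48 for it, and a GPA with bit 50 set passes gpa_is_legal() against 52 bits but fails against the 48 bits that L2 would be given.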
I'm thinking this, maybe spread across two patches: one to undo KVM's usage of
GuestPhysAddrSize, and a second to then set GuestPhysAddrSize for userspace?

---
 arch/x86/kvm/cpuid.c   | 38 ++++++++++++++++++++++++++++----------
 arch/x86/kvm/mmu.h     |  2 ++
 arch/x86/kvm/mmu/mmu.c |  5 +++++
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index adba49afb5fe..ae03e69d7fb9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1221,9 +1221,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0x80000008: {
-		unsigned g_phys_as = (entry->eax >> 16) & 0xff;
-		unsigned virt_as = max((entry->eax >> 8) & 0xff, 48U);
-		unsigned phys_as = entry->eax & 0xff;
+		unsigned int virt_as = max((entry->eax >> 8) & 0xff, 48U);
+
+		/*
+		 * KVM's ABI is to report the effective MAXPHYADDR for the guest
+		 * in PhysAddrSize (phys_as), and the maximum *addressable* GPA
+		 * in GuestPhysAddrSize (g_phys_as). GuestPhysAddrSize is valid
+		 * if and only if TDP is enabled, in which case the max GPA that
+		 * can be addressed by KVM may be less than the max GPA that can
+		 * be legally generated by the guest, e.g. if MAXPHYADDR>48 but
+		 * the CPU doesn't support 5-level TDP.
+		 */
+		unsigned int phys_as, g_phys_as;
 
 		/*
 		 * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
@@ -1231,16 +1240,25 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 * reductions in MAXPHYADDR for memory encryption affect shadow
 		 * paging, too.
 		 *
-		 * If TDP is enabled but an explicit guest MAXPHYADDR is not
-		 * provided, use the raw bare metal MAXPHYADDR as reductions to
-		 * the HPAs do not affect GPAs.
+		 * If TDP is enabled, the effective guest MAXPHYADDR is the same
+		 * as the raw bare metal MAXPHYADDR, as reductions to HPAs don't
+		 * affect GPAs. The max addressable GPA is the same as the max
+		 * effective GPA, except that it's capped at 48 bits if 5-level
+		 * TDP isn't supported (hardware processes bits 51:48 only when
+		 * walking the fifth level page table).
 		 */
-		if (!tdp_enabled)
-			g_phys_as = boot_cpu_data.x86_phys_bits;
-		else if (!g_phys_as)
+		if (!tdp_enabled) {
+			phys_as = boot_cpu_data.x86_phys_bits;
+			g_phys_as = 0;
+		} else {
+			phys_as = entry->eax & 0xff;
 			g_phys_as = phys_as;
 
-		entry->eax = g_phys_as | (virt_as << 8);
+			if (kvm_mmu_get_max_tdp_level() < 5)
+				g_phys_as = min(g_phys_as, 48);
+		}
+
+		entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16);
 		entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
 		entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0008_EBX);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60f21bb4c27b..b410a227c601 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -100,6 +100,8 @@ static inline u8 kvm_get_shadow_phys_bits(void)
 	return boot_cpu_data.x86_phys_bits;
 }
 
+u8 kvm_mmu_get_max_tdp_level(void);
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d6cdeab1f8a..ffd32400fd8c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5267,6 +5267,11 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 	return max_tdp_level;
 }
 
+u8 kvm_mmu_get_max_tdp_level(void)
+{
+	return tdp_root_level ? tdp_root_level : max_tdp_level;
+}
+
 static union kvm_mmu_page_role
 kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 				union kvm_cpu_role cpu_role)

base-commit: c0372e747726ce18a5fba8cdc71891bd795148f6
--
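The proposed derivation in the diff above can likewise be modeled as a pure function of a few inputs. This is a hypothetical userspace sketch, not the proposed kernel code: model_max_tdp_level() mirrors kvm_mmu_get_max_tdp_level(), compute_phys_bits() mirrors the new branch in __do_cpuid_func(), and the parameter names and struct model_phys_bits are assumptions made for illustration. TDP levels are plain integers here (5 meaning 5-level TDP) rather than PT64_ROOT_5LEVEL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of kvm_mmu_get_max_tdp_level(): a forced root level wins. */
static uint8_t model_max_tdp_level(uint8_t tdp_root_level, uint8_t max_tdp_level)
{
	return tdp_root_level ? tdp_root_level : max_tdp_level;
}

struct model_phys_bits {
	uint32_t phys_as;	/* PhysAddrSize: effective guest MAXPHYADDR */
	uint32_t g_phys_as;	/* GuestPhysAddrSize: max addressable GPA, 0 if N/A */
};

/*
 * Hypothetical model of the proposed CPUID.0x80000008 logic.
 * raw_phys_bits:  PhysAddrSize as reported by the CPU (EAX[7:0]).
 * host_phys_bits: adjusted host MAXPHYADDR (boot_cpu_data.x86_phys_bits),
 *                 i.e. after reductions such as memory encryption.
 */
static struct model_phys_bits compute_phys_bits(bool tdp_enabled,
						uint32_t raw_phys_bits,
						uint32_t host_phys_bits,
						uint8_t max_tdp_level)
{
	struct model_phys_bits r;

	if (!tdp_enabled) {
		/* Shadow paging: HPA reductions apply, GuestPhysAddrSize unused. */
		r.phys_as = host_phys_bits;
		r.g_phys_as = 0;
	} else {
		/* TDP: HPA reductions don't affect GPAs... */
		r.phys_as = raw_phys_bits;
		r.g_phys_as = raw_phys_bits;
		/* ...but without 5-level TDP, bits 51:48 can't be walked. */
		if (max_tdp_level < 5 && r.g_phys_as > 48)
			r.g_phys_as = 48;
	}
	return r;
}
```

For the nested example, compute_phys_bits(true, 52, 52, 4) yields PhysAddrSize 52 and GuestPhysAddrSize 48, while disabling TDP reports the adjusted host MAXPHYADDR and leaves GuestPhysAddrSize at 0. Keying the cap off the max TDP level, rather than shadow_phys_bits, keeps the decision in the CPUID path.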