On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> On 4/29/22 01:34, Sean Christopherson wrote:
>
> > +static inline gfn_t kvm_mmu_max_gfn_host(void)
> > +{
> > +	/*
> > +	 * Disallow SPTEs (via memslots or cached MMIO) whose gfn would exceed
> > +	 * host.MAXPHYADDR.  Assuming KVM is running on bare metal, guest
> > +	 * accesses beyond host.MAXPHYADDR will hit a #PF(RSVD) and never hit
> > +	 * an EPT Violation/Misconfig / #NPF, and so KVM will never install a
> > +	 * SPTE for such addresses.  That doesn't hold true if KVM is running
> > +	 * as a VM itself, e.g. if the MAXPHYADDR KVM sees is less than
> > +	 * hardware's real MAXPHYADDR, but since KVM can't honor such behavior
> > +	 * on bare metal, disallow it entirely to simplify e.g. the TDP MMU.
> > +	 */
> > +	return (1ULL << (shadow_phys_bits - PAGE_SHIFT)) - 1;
>
> The host.MAXPHYADDR however does not matter if EPT/NPT is not in use, because
> the shadow paging fault path can accept any gfn.

...

> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e6cae6f22683..dba275d323a7 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -65,6 +65,30 @@ static __always_inline u64 rsvd_bits(int s, int e)
>  	return ((2ULL << (e - s)) - 1) << s;
>  }
> +/*
> + * The number of non-reserved physical address bits irrespective of features
> + * that repurpose legal bits, e.g. MKTME.
> + */
> +extern u8 __read_mostly shadow_phys_bits;
> +
> +static inline gfn_t kvm_mmu_max_gfn(void)
> +{
> +	/*
> +	 * Note that this uses the host MAXPHYADDR, not the guest's.
> +	 * EPT/NPT cannot support GPAs that would exceed host.MAXPHYADDR;
> +	 * assuming KVM is running on bare metal, guest accesses beyond
> +	 * host.MAXPHYADDR will hit a #PF(RSVD) and never cause a vmexit
> +	 * (either EPT Violation/Misconfig or #NPF), and so KVM will never
> +	 * install a SPTE for such addresses.  If KVM is running as a VM
> +	 * itself, on the other hand, it might see a MAXPHYADDR that is less
> +	 * than hardware's real MAXPHYADDR.  Using the host MAXPHYADDR
> +	 * disallows such SPTEs entirely and simplifies the TDP MMU.
> +	 */
> +	int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;

I don't love the divergent memslot behavior, but it's technically correct, so I
can't really argue.  Do we want to "officially" document the memslot behavior?

> +
> +	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
> +}
> +
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
>  void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
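
To make the divergence concrete, here is a rough userspace sketch of the arithmetic in the quoted kvm_mmu_max_gfn(). It is not kernel code: sketch_max_gfn() and its host_phys_bits parameter are hypothetical stand-ins for the helper and for shadow_phys_bits, and PAGE_SHIFT is assumed to be 12 (4 KiB pages). Only the 52-bit fallback and the shift/mask expression are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86. */
#define PAGE_SHIFT 12

typedef uint64_t gfn_t;

/*
 * Hypothetical userspace mirror of the quoted helper.  host_phys_bits
 * stands in for shadow_phys_bits, and tdp_enabled is passed in rather
 * than read from a global.
 */
static gfn_t sketch_max_gfn(bool tdp_enabled, unsigned int host_phys_bits)
{
	/* With shadow paging, any architecturally legal GPA (52 bits) is allowed. */
	unsigned int max_gpa_bits = tdp_enabled ? host_phys_bits : 52;

	/* Keep the shift below well defined for this sketch. */
	assert(max_gpa_bits > PAGE_SHIFT && max_gpa_bits <= 52);

	/* Highest gfn whose GPA still fits in max_gpa_bits. */
	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
}
```

With a host that reports 46 physical address bits, this gives a max gfn of 0x3ffffffff with TDP enabled and 0xffffffffff with shadow paging, i.e. the same memslot can be accepted or rejected depending on tdp_enabled.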