Re: [PATCH] KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 29 Apr 2022 14:24:35 +0000

On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> On 4/29/22 01:34, Sean Christopherson wrote:
> 
> > +static inline gfn_t kvm_mmu_max_gfn_host(void)
> > +{
> > +	/*
> > +	 * Disallow SPTEs (via memslots or cached MMIO) whose gfn would exceed
> > +	 * host.MAXPHYADDR.  Assuming KVM is running on bare metal, guest
> > +	 * accesses beyond host.MAXPHYADDR will hit a #PF(RSVD) and never hit
> > +	 * an EPT Violation/Misconfig / #NPF, and so KVM will never install a
> > +	 * SPTE for such addresses.  That doesn't hold true if KVM is running
> > +	 * as a VM itself, e.g. if the MAXPHYADDR KVM sees is less than
> > +	 * hardware's real MAXPHYADDR, but since KVM can't honor such behavior
> > +	 * on bare metal, disallow it entirely to simplify e.g. the TDP MMU.
> > +	 */
> > +	return (1ULL << (shadow_phys_bits - PAGE_SHIFT)) - 1;
> 
> The host.MAXPHYADDR however does not matter if EPT/NPT is not in use, because
> the shadow paging fault path can accept any gfn.

... 

> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e6cae6f22683..dba275d323a7 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -65,6 +65,30 @@ static __always_inline u64 rsvd_bits(int s, int e)
>  	return ((2ULL << (e - s)) - 1) << s;
>  }
> +/*
> + * The number of non-reserved physical address bits irrespective of features
> + * that repurpose legal bits, e.g. MKTME.
> + */
> +extern u8 __read_mostly shadow_phys_bits;
> +
> +static inline gfn_t kvm_mmu_max_gfn(void)
> +{
> +	/*
> +	 * Note that this uses the host MAXPHYADDR, not the guest's.
> +	 * EPT/NPT cannot support GPAs that would exceed host.MAXPHYADDR;
> +	 * assuming KVM is running on bare metal, guest accesses beyond
> +	 * host.MAXPHYADDR will hit a #PF(RSVD) and never cause a vmexit
> +	 * (either EPT Violation/Misconfig or #NPF), and so KVM will never
> +	 * install a SPTE for such addresses.  If KVM is running as a VM
> +	 * itself, on the other hand, it might see a MAXPHYADDR that is less
> +	 * than hardware's real MAXPHYADDR.  Using the host MAXPHYADDR
> +	 * disallows such SPTEs entirely and simplifies the TDP MMU.
> +	 */
> +	int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;

I don't love the divergent memslot behavior, but it's technically correct, so I
can't really argue.  Do we want to "officially" document the memslot behavior?

> +
> +	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
> +}
> +
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
>  void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
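
For anyone following along, the arithmetic in the proposed kvm_mmu_max_gfn() can be reproduced in a small standalone sketch. This is only an illustration, not kernel code: max_gfn_for() and SKETCH_PAGE_SHIFT are hypothetical names, the sketch assumes 4 KiB pages (PAGE_SHIFT == 12), and it takes tdp_enabled and the host's physical address width as parameters instead of reading the kernel globals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86 base page size of 4 KiB, i.e. PAGE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12

typedef uint64_t gfn_t;

/*
 * Hypothetical userspace mirror of the quoted kvm_mmu_max_gfn().
 * With TDP (EPT/NPT) the limit follows the host's physical address
 * width; with shadow paging it is the architectural maximum of 52 bits.
 */
static gfn_t max_gfn_for(bool tdp_enabled, unsigned int host_phys_bits)
{
	unsigned int max_gpa_bits = tdp_enabled ? host_phys_bits : 52;

	/* Guard the shift: the width must exceed the page shift and fit in 52 bits. */
	assert(max_gpa_bits > SKETCH_PAGE_SHIFT && max_gpa_bits <= 52);

	return (1ULL << (max_gpa_bits - SKETCH_PAGE_SHIFT)) - 1;
}
```

With a 46-bit host MAXPHYADDR, max_gfn_for(true, 46) evaluates to 0x3ffffffff, whereas max_gfn_for(false, 46) evaluates to 0xffffffffff, which is the divergence in memslot limits discussed above.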