On Tue, Jun 06, 2023, Binbin Wu wrote:
> Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.

These are not the type of changes to do opportunistically.  Opportunistic
changes are things like fixing comment typos, dropping unnecessary
semi-colons, fixing coding style violations, etc.

> Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested code,
> to provide a clear distinction b/t CR3 and GPA checks.

This *shouldn't* be an opportunistic thing.  That you felt compelled to call
it out is a symptom of this patch doing too much.

In short, split this into three patches:

  1. Do the __PT_BASE_ADDR_MASK() changes
  2. Add and use kvm_vcpu_is_legal_cr3()
  3. Add support for CR3.LAM bits

> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
> Co-developed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Signed-off-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Tested-by: Xuelian Guo <xuelian.guo@xxxxxxxxx>
> Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>
> Reviewed-by: Chao Gao <chao.gao@xxxxxxxxx>
> ---
>  arch/x86/include/asm/kvm_host.h | 5 +++++
>  arch/x86/kvm/cpuid.h            | 5 +++++
>  arch/x86/kvm/mmu.h              | 5 +++++
>  arch/x86/kvm/mmu/mmu.c          | 8 +++++++-
>  arch/x86/kvm/mmu/mmu_internal.h | 1 +
>  arch/x86/kvm/mmu/paging_tmpl.h  | 3 ++-
>  arch/x86/kvm/mmu/spte.h         | 2 +-
>  arch/x86/kvm/svm/nested.c       | 4 ++--
>  arch/x86/kvm/vmx/nested.c       | 4 ++--
>  arch/x86/kvm/vmx/vmx.c          | 8 +++++++-
>  arch/x86/kvm/x86.c              | 4 ++--
>  11 files changed, 39 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c6f03d151c31..46471dd9cc1b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -727,6 +727,11 @@ struct kvm_vcpu_arch {
>  	unsigned long cr0_guest_owned_bits;
>  	unsigned long cr2;
>  	unsigned long cr3;
> +	/*
> +	 * CR3 non-address feature control bits.
> +	 * Guest CR3 may contain any of those bits at runtime.
> +	 */
> +	u64 cr3_ctrl_bits;

This should be an "unsigned long".

Hmm, "ctrl_bits" is unnecessarily generic at this point.  It's also arguably
wrong, because X86_CR3_PCID_NOFLUSH is also a control bit, it's just allowed
in CR3 itself.

I think I'd prefer to drop this field and avoid bikeshedding the name entirely.
The only reason to effectively cache "X86_CR3_LAM_U48 | X86_CR3_LAM_U57" is
because guest_cpuid_has() is slow, and I'd rather solve that problem with the
"governed feature" framework.  More below.

>  	unsigned long cr4;
>  	unsigned long cr4_guest_owned_bits;
>  	unsigned long cr4_guest_rsvd_bits;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index b1658c0de847..ef8e1b912d7d 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.maxphyaddr;
>  }
>
> +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)

Heh, I think it makes sense to wrap this one.  I'll probably tell you
differently tomorrow, but today, let's wrap.

> +{
> +	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);

Don't open code something for which there is a perfect helper, i.e. use
kvm_vcpu_is_legal_gpa().
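E.g. even if the cr3_ctrl_bits field were kept, the check could simply be
(untested sketch):

static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
	/* Strip the CR3 control bits, then let the GPA helper check the rest. */
	return kvm_vcpu_is_legal_gpa(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
}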
If we go the governed feature route, this becomes:

static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
	if (guest_can_use(vcpu, X86_FEATURE_LAM))
		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
}

> +}
> +
>  static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  {
>  	return !(gpa & vcpu->arch.reserved_gpa_bits);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 92d5a1924fc1..81d8a433dae1 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -144,6 +144,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>  	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
>  }
>
> +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)

And then this becomes:

static inline u64 kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
{
	if (!guest_can_use(vcpu, X86_FEATURE_LAM))
		return 0;

	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
}

> +{
> +	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
> +}
> +
>  static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>  {
>  	u64 root_hpa = vcpu->arch.mmu->root.hpa;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..deea9a9f0c75 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	hpa_t root;
>
>  	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> -	root_gfn = root_pgd >> PAGE_SHIFT;
> +	/*
> +	 * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
> +	 * additional control bits (e.g. LAM control bits). To be generic,
> +	 * unconditionally strip non-address bits when computing the GFN since
> +	 * the guest PGD has already been checked for validity.
> +	 */

Drop this comment, the code is self-explanatory, and the comment is incomplete,
e.g. it can also be nCR3.

> +	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>
>  	if (mmu_check_root(vcpu, root_gfn))
>  		return 1;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index d39af5639ce9..7d2105432d66 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -21,6 +21,7 @@ extern bool dbg;
>  #endif
>
>  /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
> +#define __PT_BASE_ADDR_MASK GENMASK_ULL(51, 12)
>  #define __PT_LEVEL_SHIFT(level, bits_per_level)	\
>  	(PAGE_SHIFT + ((level) - 1) * (bits_per_level))
>  #define __PT_INDEX(address, level, bits_per_level) \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 0662e0278e70..394733ac9088 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -62,7 +62,7 @@
>  #endif
>
>  /* Common logic, but per-type values. These also need to be undefined. */
> -#define PT_BASE_ADDR_MASK	((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
> +#define PT_BASE_ADDR_MASK	((pt_element_t)__PT_BASE_ADDR_MASK)
>  #define PT_LVL_ADDR_MASK(lvl)	__PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>  #define PT_LVL_OFFSET_MASK(lvl)	__PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>  #define PT_INDEX(addr, lvl)	__PT_INDEX(addr, lvl, PT_LEVEL_BITS)
> @@ -324,6 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	trace_kvm_mmu_pagetable_walk(addr, access);
>  retry_walk:
>  	walker->level = mmu->cpu_role.base.level;
> +	/* gpte_to_gfn() will strip non-address bits. */
Drop this comment too, it's not relevant to the immediate code, i.e. it'd be
better suited above this code:

	table_gfn = gpte_to_gfn(pte);

but IMO that code is quite self-explanatory too.

> @@ -7740,6 +7741,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		vmx->msr_ia32_feature_control_valid_bits &=
>  			~FEAT_CTL_SGX_LC_ENABLED;
>
> +	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))

This is wrong, KVM needs to check that the host supports LAM too, otherwise KVM
will allow userspace to shove garbage into guest CR3 and induce VM-Entry
failures and whatnot.  If we go the guest_can_use() route, this problem solves
itself.
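For reference, a rough sketch of what that might look like in
vmx_vcpu_after_set_cpuid(), assuming the proposed governed-feature helpers land
with names along the lines of kvm_governed_feature_check_and_set() (which would
only mark the feature usable when both KVM/host and guest CPUID support it):

	/* Sketch only; helper name assumes the proposed governed-feature API. */
	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LAM);

With that, kvm_vcpu_is_legal_cr3() and kvm_get_active_cr3_lam_bits() from above
key off guest_can_use(vcpu, X86_FEATURE_LAM), which stays false unless the host
actually supports LAM.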