On Mon, Feb 27, 2023 at 04:45:45PM +0800, Robert Hoo wrote:
>The LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit 61
>for LAM_U57) to enable/configure LAM for user-mode addresses. The LAM
>masking is done before the legacy canonical checks.
>
>To virtualize the LAM CR3 bits, this patch:
>1. Doesn't reserve these 2 bits when LAM is enabled on the vCPU.
>Previously, KVM validated CR3 with kvm_vcpu_is_legal_gpa(); now define
>kvm_vcpu_is_valid_cr3(), which is kvm_vcpu_is_legal_gpa() plus CR3.LAM
>bits validation, and substitute it for kvm_vcpu_is_legal/illegal_gpa()
>in the call sites that validate CR3 rather than a pure GPA.
>2. mmu::get_guest_pgd() is implemented by get_cr3(), which returns the
>whole guest CR3 value. Strip the LAM bits in the call sites that need a
>pure PGD value, e.g. mmu_alloc_shadow_roots() and
>FNAME(walk_addr_generic)().
>3. When forming a new guest CR3 (vmx_load_mmu_pgd()), fold in the
>active LAM bits (kvm_get_active_lam()).
>4. When the guest sets CR3, identify the cases where ONLY the LAM bits
>toggle; there it is unnecessary to make a new pgd, only to request a
>pgd load, and the new CR3.LAM bits configuration will be folded in
>(point 3 above). To be conservative, this case still does a TLB flush.
>5. For nested VM entry, allow the 2 CR3 bits to be set in the
>corresponding VMCS host/guest fields.

Isn't this already covered by item #1 above?

>
>Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
>---
> arch/x86/kvm/mmu.h             |  5 +++++
> arch/x86/kvm/mmu/mmu.c         |  9 ++++++++-
> arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
> arch/x86/kvm/vmx/nested.c      |  4 ++--
> arch/x86/kvm/vmx/vmx.c         |  3 ++-
> arch/x86/kvm/x86.c             | 33 ++++++++++++++++++++++++++++-----
> arch/x86/kvm/x86.h             |  1 +
> 7 files changed, 47 insertions(+), 10 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>index 6bdaacb6faa0..866f2b7cb509 100644
>--- a/arch/x86/kvm/mmu.h
>+++ b/arch/x86/kvm/mmu.h
>@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
> 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
> }
>
>+static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
>+{
>+	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>+}

I think it is better to define a mask (like reserved_gpa_bits):

struct kvm_vcpu_arch {
	...
	/*
	 * Bits in CR3 used to enable certain features. These bits don't
	 * participate in page table walking. They should be masked off to
	 * get the base address of the page table. When shadow paging is
	 * used, these bits should be kept as-is in the shadow CR3.
	 */
	u64 cr3_control_bits;

and initialize the mask in kvm_vcpu_after_set_cpuid():

	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
		vcpu->arch.cr3_control_bits = X86_CR3_LAM_U48 | X86_CR3_LAM_U57;

then add helpers to extract/mask the control bits; a rough sketch follows
at the end of this mail.

This is cleaner and avoids looking up guest CPUID at runtime. And if AMD
has a similar feature (i.e., some CR3 bits used as control bits), it
becomes easy to support.
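
For concreteness, a minimal sketch of what such helpers might look like,
assuming the cr3_control_bits field above; the helper names are
hypothetical, not from the posted patch:

	/* Extract the CR3 control bits (e.g. LAM) currently set by the guest. */
	static inline u64 kvm_get_cr3_control_bits(struct kvm_vcpu *vcpu)
	{
		return kvm_read_cr3(vcpu) & vcpu->arch.cr3_control_bits;
	}

	/* Mask off the control bits to get the page-table base address. */
	static inline u64 kvm_strip_cr3_control_bits(struct kvm_vcpu *vcpu, u64 cr3)
	{
		return cr3 & ~vcpu->arch.cr3_control_bits;
	}

Call sites that need a pure PGD, e.g. mmu_alloc_shadow_roots(), could
then apply kvm_strip_cr3_control_bits() to the value returned by
mmu->get_guest_pgd(), and vmx_load_mmu_pgd() could OR in
kvm_get_cr3_control_bits() when composing the new guest CR3, instead of
the CPUID-dependent kvm_get_active_lam().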