On Mon, Feb 27, 2023 at 04:45:45PM +0800, Robert Hoo wrote:
>The LAM feature uses 2 high bits in CR3 (bit 62 for LAM_U48 and bit 61
>for LAM_U57) to enable/configure LAM for user-mode addresses. The LAM
>masking is done before the legacy canonical checks.
>
>To virtualize the LAM CR3 bits, this patch:
>1. Doesn't reserve these 2 bits when LAM is enabled on the vCPU.
>Previously, KVM validated CR3 with kvm_vcpu_is_legal_gpa(); now define
>kvm_vcpu_is_valid_cr3(), which is kvm_vcpu_is_legal_gpa() plus CR3.LAM
>bits validation, and substitute it for kvm_vcpu_is_legal/illegal_gpa()
>in the call sites that validate CR3 rather than a pure GPA.
>2. mmu::get_guest_pgd() is implemented by get_cr3(), which returns the
>whole guest CR3 value. Strip the LAM bits in the call sites that need a
>pure PGD value, e.g. mmu_alloc_shadow_roots() and
>FNAME(walk_addr_generic)().
>3. When forming a new guest CR3 (vmx_load_mmu_pgd()), fold in the
>active LAM bits (kvm_get_active_lam()).
>4. When the guest sets CR3, identify the cases where ONLY the LAM bits
>toggle; there it is unnecessary to make a new pgd, only to request a
>pgd load, and the new CR3.LAM bits configuration will be folded in
>(point 3 above). To be conservative, this case still does a TLB flush.
>5. For nested VM entry, allow the 2 CR3 bits to be set in the
>corresponding VMCS host/guest fields.

Isn't this already covered by item #1 above?

>
>Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
>---
> arch/x86/kvm/mmu.h             |  5 +++++
> arch/x86/kvm/mmu/mmu.c         |  9 ++++++++-
> arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
> arch/x86/kvm/vmx/nested.c      |  4 ++--
> arch/x86/kvm/vmx/vmx.c         |  3 ++-
> arch/x86/kvm/x86.c             | 33 ++++++++++++++++++++++++++++-----
> arch/x86/kvm/x86.h             |  1 +
> 7 files changed, 47 insertions(+), 10 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>index 6bdaacb6faa0..866f2b7cb509 100644
>--- a/arch/x86/kvm/mmu.h
>+++ b/arch/x86/kvm/mmu.h
>@@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
> 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
> }
>
>+static inline u64 kvm_get_active_lam(struct kvm_vcpu *vcpu)
>+{
>+	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>+}

I think it is better to define a mask (like reserved_gpa_bits):

struct kvm_vcpu_arch {
	...
	/*
	 * Bits in CR3 used to enable certain features. These bits don't
	 * participate in page table walking. They should be masked off to
	 * get the base address of the page table. When shadow paging is
	 * used, these bits should be kept as-is in the shadow CR3.
	 */
	u64 cr3_control_bits;

and initialize the mask in kvm_vcpu_after_set_cpuid():

	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
		vcpu->arch.cr3_control_bits = X86_CR3_LAM_U48 | X86_CR3_LAM_U57;

then add helpers to extract/mask the control bits; a rough sketch follows
at the end of this mail.

This is cleaner and avoids looking up guest CPUID at runtime. And if AMD
has a similar feature (i.e., some CR3 bits used as control bits), it
becomes easy to support.
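
For concreteness, a minimal sketch of what such helpers might look like,
assuming the cr3_control_bits field above; the helper names are
hypothetical, not from the posted patch:

	/* Extract the CR3 control bits (e.g. LAM) currently set by the guest. */
	static inline u64 kvm_get_cr3_control_bits(struct kvm_vcpu *vcpu)
	{
		return kvm_read_cr3(vcpu) & vcpu->arch.cr3_control_bits;
	}

	/* Mask off the control bits to get the page-table base address. */
	static inline u64 kvm_strip_cr3_control_bits(struct kvm_vcpu *vcpu, u64 cr3)
	{
		return cr3 & ~vcpu->arch.cr3_control_bits;
	}

Call sites that need a pure PGD, e.g. mmu_alloc_shadow_roots(), could
then apply kvm_strip_cr3_control_bits() to the value returned by
mmu->get_guest_pgd(), and vmx_load_mmu_pgd() could OR in
kvm_get_cr3_control_bits() when composing the new guest CR3, instead of
the CPUID-dependent kvm_get_active_lam().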