On Wed, 2023-05-10 at 14:06 +0800, Binbin Wu wrote:
> From: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
>
> Add support to allow guests to set two new CR3 non-address control bits
> for guests to enable the new Intel CPU feature Linear Address Masking
> (LAM) on user pointers.
>
> LAM modifies the checking that is applied to 64-bit linear addresses,
> allowing software to use the untranslated address bits for metadata, and
> masks the metadata bits before using them as linear addresses to access
> memory. LAM uses two new CR3 non-address bits, LAM_U48 (bit 62) and
> LAM_U57 (bit 61), to configure LAM for user pointers. LAM also changes
> VMENTER to allow both bits to be set in VMCS's HOST_CR3 and GUEST_CR3
> for virtualization.
>
> When EPT is on, CR3 is not trapped by KVM and it's up to the guest to
> set either of the two LAM control bits. However, when EPT is off, the
> actual CR3 used by the guest is generated from the shadow MMU root,
> which is different from the CR3 that is *set* by the guest, and KVM
> needs to manually apply any active control bits to VMCS's GUEST_CR3
> based on the cached CR3 *seen* by the guest.
>
> KVM manually checks the guest's CR3 to make sure it points to a valid
> guest physical address (i.e. to support a smaller MAXPHYSADDR in the
> guest). Extend this check to allow the two LAM control bits to be set.
> To make the check generic, introduce a new field 'cr3_ctrl_bits' in the
> vcpu to record all feature control bits that are allowed to be set by
> the guest. After the check, the non-address bits of the guest CR3 are
> stripped off to extract the guest physical address.
>
> In the nested case, for a guest which supports LAM, both VMCS12's
> HOST_CR3 and GUEST_CR3 are allowed to have the new LAM control bits set,
> i.e. when L0 enters L1 to emulate a VMEXIT from L2 to L1 or when L0
> enters L2 directly. KVM also manually checks that VMCS12's HOST_CR3 and
> GUEST_CR3 are valid physical addresses. Extend that check to allow the
> new LAM control bits too.
>
> Note, LAM doesn't have a global control bit to turn it on/off
> completely, but purely depends on hardware's CPUID to determine whether
> it can be enabled or not. That means, when EPT is on, even when KVM
> doesn't expose LAM to the guest, the guest can still set the LAM control
> bits in CR3 without causing a problem. This is an unfortunate
> virtualization hole. KVM could choose to intercept CR3 in this case and
> inject a fault, but this would hurt performance when running a normal VM
> w/o LAM support, which is undesirable. Just choose to let the guest do
> such an illegal thing, as the worst case is the guest being killed when
> KVM eventually finds out about the illegal behaviour, and the guest has
> only itself to blame.
>
> Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.
> Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested
> code, to provide a clear distinction b/t CR3 and GPA checks.
>
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
> Co-developed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Signed-off-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Tested-by: Xuelian Guo <xuelian.guo@xxxxxxxxx>
>
> [...]
>
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>          hpa_t root;
>
>          root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> -        root_gfn = root_pgd >> PAGE_SHIFT;
> +        /*
> +         * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
> +         * additional control bits (e.g. LAM control bits). To be generic,
> +         * unconditionally strip non-address bits when computing the GFN since
> +         * the guest PGD has already been checked for validity.
> +         */
> +        root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;

It looks like it would be better to mask off the non-address bits based on
the guest's invalid physical address bits, for instance based on
vcpu->arch.reserved_gpa_bits (rough, untested sketch at the end of this
mail).

But looking at the old discussion it appears Sean suggested doing it this
way:

https://lore.kernel.org/all/ZAuPPv8PUDX2RBQa@xxxxxxxxxx/

So anyway:

Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>
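
FWIW, to double check my understanding of the stripping logic: IIUC the
idea is that CR3 is "legal" as long as it is a legal GPA once the control
bits the guest is allowed to set are ignored, and the address bits are then
extracted with a fixed mask. Roughly like below -- this is only my
paraphrase based on the names mentioned in the changelog (cr3_ctrl_bits,
kvm_vcpu_is_legal_cr3() and the GENMASK_ULL()-based __PT_BASE_ADDR_MASK),
so the actual patch may differ in detail:

        /* Address bits of a paging-structure entry / CR3: bits 51:12. */
        #define __PT_BASE_ADDR_MASK    GENMASK_ULL(51, 12)

        /*
         * CR3 is legal if it is a legal GPA after ignoring the control
         * bits the guest is allowed to set (e.g. LAM_U48/LAM_U57 when LAM
         * is exposed to the guest).
         */
        static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu,
                                                 unsigned long cr3)
        {
                return kvm_vcpu_is_legal_gpa(vcpu, cr3 & ~vcpu->arch.cr3_ctrl_bits);
        }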
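
And the alternative I had in mind, i.e. masking based on the guest's
reserved physical address bits, would be something like below (untested,
just to illustrate -- the LAM control bits sit above the guest's
MAXPHYADDR so they are covered by reserved_gpa_bits, and the low CR3/EPTP
control bits are dropped by the PAGE_SHIFT anyway):

        root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
        /*
         * Strip everything the guest cannot have in a physical address:
         * bits 63:MAXPHYADDR (which include the LAM control bits) via
         * reserved_gpa_bits, and bits 11:0 via the shift.
         */
        root_gfn = (root_pgd & ~vcpu->arch.reserved_gpa_bits) >> PAGE_SHIFT;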