Re: [PATCH v7 2/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}

On Sat, 2023-04-22 at 12:43 +0800, Gao, Chao wrote:
> On Sat, Apr 22, 2023 at 11:32:26AM +0800, Binbin Wu wrote:
> > Kai,
> > 
> > Thanks for your inputs.
> > 
> > I rephrased the changelog as follows:
> > 
> > 
> > LAM uses CR3.LAM_U48 (bit 62) and CR3.LAM_U57 (bit 61) to configure LAM
> > masking for user mode pointers.
> > 
> > To support LAM in KVM, CR3 validity checks and shadow paging handling need
> > to be modified accordingly.
> > 
> > == CR3 validity Check ==
> > When LAM is supported, CR3 LAM bits are allowed to be set and the check of
> > CR3 needs to be modified.
> 
> it is better to describe the hardware change here:
> 
> On processors that enumerate support for LAM, CR3 register allows
> LAM_U48/U57 to be set and VM entry allows LAM_U48/U57 to be set in both
> GUEST_CR3 and HOST_CR3 fields.
> 
> To emulate LAM hardware behavior, KVM needs to
> 1. allow LAM_U48/U57 to be set in the CR3 register by the guest or userspace
> 2. allow LAM_U48/U57 to be set in the GUEST_CR3/HOST_CR3 fields in vmcs12

Agreed.  This is clearer.

> 
> > Add a helper kvm_vcpu_is_legal_cr3() and use it instead of
> > kvm_vcpu_is_legal_gpa() to do the new CR3 checks in all existing CR3 checks
> > as follows:
> > When userspace sets sregs, CR3 is checked in kvm_is_valid_sregs().
> > Non-nested case
> > - When EPT on, CR3 is fully under control of guest.
> > - When EPT off, CR3 is intercepted and CR3 is checked in kvm_set_cr3() during
> >   CR3 VMExit handling.
> > Nested case, from L0's perspective, we care about:
> > - L1's CR3 register (VMCS01's GUEST_CR3), it's the same as non-nested case.
> > - L1's VMCS to run L2 guest (i.e. VMCS12's HOST_CR3 and VMCS12's GUEST_CR3)
> >   Two paths related:
> >   1. L0 emulates a VMExit from L2 to L1 using VMCS01 to reflect VMCS12
> >          nested_vm_exit()
> >          -> load_vmcs12_host_state()
> >                -> nested_vmx_load_cr3()     //check VMCS12's HOST_CR3
> 
> This is just a byproduct of using a unified function, i.e.,
> nested_vmx_load_cr3() to load CR3 for both nested VM entry and VM exit.
> 
> LAM spec says:
> 
> VM entry checks the values of the CR3 and CR4 fields in the guest-area
> and host-state area of the VMCS. In particular, the bits in these fields
> that correspond to bits reserved in the corresponding register are
> checked and must be 0.
> 
> It doesn't mention any check on VM exit. So, it looks to me that VM exit
> doesn't do consistency checks. Then, I think there is no need to call
> out this path.

But this isn't a true VMEXIT -- it is indeed a VMENTER from L0 to L1 using
VMCS01 but with an environment that allows L1 to run its VMEXIT handler just
like it received a VMEXIT from L2.

However I fully agree those code paths are details and shouldn't be changelog
material.

How about the changelog below?

Add support for the guest to set two new CR3 non-address control bits, so that
the guest can enable the new Intel CPU feature Linear Address Masking (LAM).

LAM modifies the checking that is applied to 64-bit linear addresses, allowing
software to use the untranslated address bits for metadata.  For user-mode
linear addresses, LAM uses two new CR3 non-address bits, LAM_U48 (bit 62) and
LAM_U57 (bit 61), to configure the metadata bits for 4-level paging and 5-level
paging respectively.  LAM also changes VMENTER to allow both bits to be set in
the VMCS's HOST_CR3 and GUEST_CR3 fields to support virtualization.
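For illustration only, a minimal C sketch of what the two bits do to a user-mode pointer. It assumes 64-bit mode and a data access, and models LAM_U57 taking precedence over LAM_U48; sign_extend_from() and lam_untag_user() are hypothetical names, not KVM functions. The metadata bits are overwritten with copies of bit 56 (LAM_U57) or bit 47 (LAM_U48), bit 63 stays 0 so a user pointer remains a user pointer, and the canonicality check that follows masking is not modelled.

```c
#include <stdint.h>

/* CR3 bit positions as stated in the changelog. */
#define CR3_LAM_U57 (1ULL << 61)
#define CR3_LAM_U48 (1ULL << 62)

/* Sign-extend v from bit 'bit' upward, using only unsigned arithmetic. */
static uint64_t sign_extend_from(uint64_t v, unsigned int bit)
{
	uint64_t sign = 1ULL << bit;
	uint64_t low = v & ((sign << 1) - 1);	/* keep bits bit..0 */

	return (low ^ sign) - sign;
}

/*
 * Hypothetical helper: strip LAM metadata from a user-mode pointer.
 * Supervisor pointers (bit 63 set) are returned unchanged because
 * supervisor LAM is not controlled by these CR3 bits.
 */
static uint64_t lam_untag_user(uint64_t cr3, uint64_t addr)
{
	unsigned int lam_bit;

	if (addr & (1ULL << 63))
		return addr;
	if (cr3 & CR3_LAM_U57)		/* LAM_U48 is ignored if LAM_U57 is set */
		lam_bit = 56;		/* metadata in bits 62:57 */
	else if (cr3 & CR3_LAM_U48)
		lam_bit = 47;		/* metadata in bits 62:48 */
	else
		return addr;		/* LAM off for user pointers */

	/* Replace the metadata bits with copies of lam_bit; keep bit 63 clear. */
	return sign_extend_from(addr, lam_bit) & ~(1ULL << 63);
}
```

With LAM_U48 set, a tagged pointer such as 0x7E12345600001000 becomes 0x0000345600001000; with LAM_U57 set, only bits 62:57 are stripped, giving 0x0012345600001000.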

When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set
either of the two LAM control bits.  However, when EPT is off, the actual CR3
used by the guest is generated from the shadow MMU root, which is different
from the CR3 that is *set* by the guest, so KVM needs to manually apply any
active control bits to the VMCS's GUEST_CR3 based on the cached CR3 *seen* by
the guest.
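To make the EPT-off case concrete, a simplified sketch of composing the value loaded into GUEST_CR3: the shadow root supplies the address, and only the control bits the vCPU permits are copied from the guest-visible CR3. struct vcpu_sketch and shadow_guest_cr3() are hypothetical; real KVM also carries other non-address bits (e.g. the PCID), which this sketch deliberately ignores.

```c
#include <stdint.h>

#define CR3_LAM_U57 (1ULL << 61)
#define CR3_LAM_U48 (1ULL << 62)

/* Hypothetical, simplified per-vCPU state; names are illustrative only. */
struct vcpu_sketch {
	uint64_t guest_cr3;	/* cached CR3 value the guest wrote and sees */
	uint64_t cr3_ctrl_bits;	/* control bits the guest is allowed to set */
};

/*
 * With EPT off, hardware CR3 must point at the shadow MMU root, so the
 * guest's active LAM bits are copied over from the cached guest CR3.
 */
static uint64_t shadow_guest_cr3(const struct vcpu_sketch *v, uint64_t root_hpa)
{
	uint64_t active = v->guest_cr3 & v->cr3_ctrl_bits &
			  (CR3_LAM_U48 | CR3_LAM_U57);

	return root_hpa | active;
}
```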

KVM manually checks the guest's CR3 to make sure it points to a valid guest
physical address (i.e. to support a smaller MAXPHYADDR in the guest).  Extend
this check to allow the two LAM control bits to be set.  To make the check
generic, introduce a new vcpu field 'cr3_ctrl_bits' that records all feature
control bits the guest is allowed to set.
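A sketch of the generalised check, modelled loosely on the kvm_vcpu_is_legal_cr3() helper named earlier in the thread: strip whatever is in cr3_ctrl_bits, then apply the usual MAXPHYADDR test to what remains. The struct layout, update_cr3_ctrl_bits(), and the other helper names are assumptions for illustration, not the patch's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_LAM_U57 (1ULL << 61)
#define CR3_LAM_U48 (1ULL << 62)

/* Simplified stand-in for the vCPU; field names are illustrative. */
struct vcpu_sketch {
	unsigned int maxphyaddr;	/* guest MAXPHYADDR (always < 64) */
	uint64_t cr3_ctrl_bits;		/* CR3 control bits the guest may set */
};

/* Allow the LAM bits only when LAM is exposed to the guest via CPUID. */
static void update_cr3_ctrl_bits(struct vcpu_sketch *v, bool guest_has_lam)
{
	v->cr3_ctrl_bits = guest_has_lam ? (CR3_LAM_U48 | CR3_LAM_U57) : 0;
}

/* A GPA is legal if no bit at or above MAXPHYADDR is set. */
static bool sketch_is_legal_gpa(const struct vcpu_sketch *v, uint64_t gpa)
{
	return (gpa >> v->maxphyaddr) == 0;
}

/* Strip the permitted control bits, then check the rest as a GPA. */
static bool sketch_is_legal_cr3(const struct vcpu_sketch *v, uint64_t cr3)
{
	return sketch_is_legal_gpa(v, cr3 & ~v->cr3_ctrl_bits);
}
```

Keeping the allowed bits in one per-vCPU mask means a future CR3 control bit would only need to be added in update_cr3_ctrl_bits(), not at every call site.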

In the nested case, for a guest that supports LAM, both VMCS12's HOST_CR3 and
GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when L0 enters
L1 to emulate a VMEXIT from L2 to L1 or when L0 enters L2 directly.  KVM also
manually checks that VMCS12's HOST_CR3 and GUEST_CR3 are valid physical
addresses.  Extend this check to allow the new LAM control bits too.
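The nested case applies the same test to both VMCS12 fields. This hypothetical sketch folds the two checks into one function that returns -EINVAL on failure; the real checks are spread over KVM's nested VM-entry path, and all names here are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_LAM_U57 (1ULL << 61)
#define CR3_LAM_U48 (1ULL << 62)

/* Hypothetical subset of vmcs12 and vCPU state, for illustration only. */
struct vmcs12_sketch {
	uint64_t host_cr3;
	uint64_t guest_cr3;
};

struct vcpu_sketch {
	unsigned int maxphyaddr;	/* guest MAXPHYADDR (always < 64) */
	uint64_t cr3_ctrl_bits;		/* CR3 control bits the guest may set */
};

/* Same rule as for L1's own CR3: permitted control bits, then MAXPHYADDR. */
static bool legal_cr3(const struct vcpu_sketch *v, uint64_t cr3)
{
	return ((cr3 & ~v->cr3_ctrl_bits) >> v->maxphyaddr) == 0;
}

/* Both CR3 fields may carry LAM bits only if the vCPU allows them. */
static int check_vmcs12_cr3s(const struct vcpu_sketch *v,
			     const struct vmcs12_sketch *vmcs12)
{
	if (!legal_cr3(v, vmcs12->host_cr3) || !legal_cr3(v, vmcs12->guest_cr3))
		return -EINVAL;
	return 0;
}
```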

Note, LAM doesn't have a global enable bit in any control register to turn
LAM on or off completely; whether the LAM check is performed depends purely on
the hardware's CPUID.  That means, when EPT is on, even when KVM doesn't expose
LAM to the guest, the guest can still set the LAM control bits in CR3 without
causing problems.  This is an unfortunate virtualization hole.  KVM could choose
to intercept CR3 in this case and inject a fault, but this would hurt
performance when running a normal VM without LAM support, which is
undesirable.  Instead, just let the guest do such an illegal thing: the worst
case is the guest being killed when KVM eventually finds out about the illegal
behaviour, and then the guest is to blame.
