У пт, 2024-07-26 у 14:50 +0800, Chao Gao пише: > On Thu, Jul 25, 2024 at 11:01:09AM -0400, Maxim Levitsky wrote: > > Several architectural msrs (e.g MSR_KERNEL_GS_BASE) must contain > > a canonical address, and according to Intel PRM, this is enforced > > by #GP on a MSR write. > > > > However with the introduction of the LA57 the definition of > > what is a canonical address became blurred. > > > > Few tests done on Sapphire Rapids CPU and on Zen4 CPU, > > reveal: > > > > 1. These CPUs do allow full 57-bit wide non canonical values > > to be written to MSR_GS_BASE, MSR_FS_BASE, MSR_KERNEL_GS_BASE, > > regardless of the state of CR4.LA57. > > Zen4 in addition to that even allows such writes to > > MSR_CSTAR and MSR_LSTAR. > > This actually is documented/implied at least in ISE [1]. In Chapter 6.4 > "CANONICALITY CHECKING FOR DATA ADDRESSES WRITTEN TO CONTROL REGISTERS AND > MSRS" > > In Processors that support LAM continue to require the addresses written to > control registers or MSRs to be 57-bit canonical if the processor _supports_ > 5-level paging or 48-bit canonical if it supports only 4-level paging > > [1]: https://cdrdv2.intel.com/v1/dl/getContent/671368 I haven't found this in the actual PRM, but mine is relatively old, (from September 2023, I didn't bother to update it because 5 level paging is quite an old feature) > > > > > 2. These CPUs don't prevent the user from switching back to 4 level > > paging with values that will be non canonical in 4 level paging, > > and instead just allow the msrs to contain these values. > > > > Since these MSRS are all passed through to the guest, and microcode > > allows the non canonical values to get into these msrs, > > KVM has to tolerate such values and avoid crashing the guest. > > > > To do so, always allow the host initiated values regardless of > > the state of CR4.LA57, instead only gate this by the actual hardware > > support for 5 level paging. > > > > To be on the safe side leave the check for guest writes as is. > > > > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> > > --- > > arch/x86/kvm/x86.c | 31 ++++++++++++++++++++++++++++++- > > 1 file changed, 30 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index a6968eadd418..c599deff916e 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -1844,7 +1844,36 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data, > > case MSR_KERNEL_GS_BASE: > > case MSR_CSTAR: > > case MSR_LSTAR: > > - if (is_noncanonical_address(data, vcpu)) > > + > > + /* > > + * Both AMD and Intel cpus tend to allow values which > > + * are canonical in the 5 level paging mode but are not > > + * canonical in the 4 level paging mode to be written > > + * to the above msrs, regardless of the state of the CR4.LA57. > > + * > > + * Intel CPUs do honour CR4.LA57 for the MSR_CSTAR/MSR_LSTAR, > > + * AMD cpus don't even do that. > > + * > > + * Both CPUs also allow non canonical values to remain in > > + * these MSRs if the CPU was in 5 level paging mode and was > > + * switched back to 4 level paging, and tolerate these values > > + * both in native MSRs and in vmcs/vmcb fields. > > + * > > + * To avoid crashing a guest, which manages using one of the above > > + * tricks to get non canonical value to one of > > + * these MSRs, and later migrates, allow the host initiated > > + * writes regardless of the state of CR4.LA57. > > + * > > + * To be on the safe side, don't allow the guest initiated > > + * writes to bypass the canonical check (e.g be more strict > > + * than what the actual ucode usually does). > > I may think guest-initiated writes should be allowed as well because this is > the architectural behavior. Note though that for MSR_CSTAR/MSR_LSTAR I did set #GP, depending on CR.LA57. Ah, I see it, KVM intercepts these msrs on VMX (but not on SVM) and I was under the impression that it doesn't, that is why I get #GP depending on CR4.LA57.... I do wonder why we intercept these msrs on VMX and not on SVM. It all makes sense now, thanks a lot for the explanation! > > > + */ > > + > > + if (!host_initiated && is_noncanonical_address(data, vcpu)) > > + return 1; > > + > > + if (!__is_canonical_address(data, > > + boot_cpu_has(X86_FEATURE_LA57) ? 57 : 48)) > > boot_cpu_has(X86_FEATURE_LA57)=1 means LA57 is enabled. Right? > > With this change, host-initiated writes must be 48-bit canonical if LA57 isn't > enabled on the host, even if it is enabled in the guest. (note that KVM can > expose LA57 to guests even if LA57 is disabled on the host, see > kvm_set_cpu_caps()). Sorry about this - we indeed need to use kvm_cpu_cap_has(X86_FEATURE_LA57) because it is forced based on host raw CPUID. I remember I wanted to do exactly this but forgot somehow. Also I need to update this for nested VMX - these msrs are also checked there based on CR4.LA57. Thanks for the clarification, and it all makes sense. I'll send v2 with all of this when I get back from a vacation (next Wednesday). Best regards, Maxim Levitsky > > > return 1; >