Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs

mlevitsk@xxxxxxxxxx · Mon, 05 Aug 2024 21:07:10 +0300

У пн, 2024-08-05 у 09:39 -0700, Sean Christopherson пише:
> On Mon, Aug 05, 2024, mlevitsk@xxxxxxxxxx wrote:
> > У пт, 2024-08-02 у 08:53 -0700, Sean Christopherson пише:
> > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > > > > index a6968eadd418..3582f0bb7644 100644
> > > > > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > > > > @@ -1844,7 +1844,16 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> > > > > > > > > >         case MSR_KERNEL_GS_BASE:
> > > > > > > > > >         case MSR_CSTAR:
> > > > > > > > > >         case MSR_LSTAR:
> > > > > > > > > > -               if (is_noncanonical_address(data, vcpu))
> > > > > > > > > > +
> > > > > > > > > > +               /*
> > > > > > > > > > +                * Both AMD and Intel cpus allow values which
> > > > > > > > > > +                * are canonical in the 5 level paging mode but are not
> > > > > > > > > > +                * canonical in the 4 level paging mode to be written
> > > > > > > > > > +                * to the above MSRs, as long as the host CPU supports
> > > > > > > > > > +                * 5 level paging, regardless of the state of the CR4.LA57.
> > > > > > > > > > +                */
> > > > > > > > > > +               if (!__is_canonical_address(data,
> > > > > > > > > > +                       kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48))
> > > > > > 
> > > > > > Please align indentation.
> > > > > > 
> > > > > > Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
> > > > > > what matters is what the guest CPU supports, i.e. this should check guest CPUID.
> > > > > > Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
> > > > > > on a bad load into hardware.  Which means adding a "governed" feature until my
> > > > > > CPUID rework lands.
> > 
> > Well the problem is that we passthrough these MSRs, and that means that the guest
> > can modify them at will, and only ucode can prevent it from doing so.
> > 
> > So even if the 5 level paging is disabled in the guest's CPUID, but host supports it,
> > nothing will prevent the guest to write non canonical value to one of those MSRs, 
> > and later KVM during migration or just KVM_SET_SREGS will fail.
>  
> Ahh, and now I recall the discussions around the virtualization holes with LA57.
> 
> > Thus I used kvm_cpu_cap_has on purpose to make KVM follow the actual ucode
> > behavior.
> 
> I'm leaning towards having KVM do the right thing when emulation happens to be
> triggered.  If KVM checks kvm_cpu_cap_has() instead of guest_cpu_cap_has() (looking
> at the future), then KVM will extend the virtualization hole to MSRs that are
> never passed through, and also to the nested VMX checks.  Or I suppose we could
> add separate helpers for passthrough MSRs vs. non-passthrough, but that seems
> like it'd add very little value and a lot of maintenance burden.
> 
> Practically speaking, outside of tests, I can't imagine the guest will ever care
> if there is inconsistent behavior with respect to loading non-canonical values
> into MSRs.
> 

Hi,

If we weren't allowing the guest (and even nested guest assuming that L1 hypervisor allows it) to write
these MSRs directly, I would have agreed with you, but we do allow this.

This means that for example a L2, even a malicious L2, can on purpose write non canonical value to one of these
MSRs, and later on, KVM could kill the L0 due to canonical check.

Or L1 (not Linux, because it only lets canonical GS_BASE/FS_BASE), allow the untrusted userspace to 
write any value to say GS_BASE, thus allowing
malicious L1 userspace to crash L1 (also a security violation).

IMHO if we really want to do it right, we need to disable pass-though of these MSRs if ucode check is more lax than
our check, that is if L1 is running without 5 level paging enabled but L0 does have it supported.

I don't know if this configuration is common, and thus how much this will affect performance.

Best regards,
	Maxim Levitsky