Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted

On Sat, Jun 06, 2020 at 10:51:06AM +0800, Xiaoyao Li wrote:
> On 6/6/2020 3:26 AM, Sean Christopherson wrote:
> >Choo! Choo!  All aboard the Split Lock Express, with direct service to
> >Wreckage!
> >
> >Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
> >SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
> >exists, and is writable, on many generations of CPUs.  Writing the MSR,
> >even with '0', can result in bizarre, undocumented behavior.
> >
> >This fixes a crash on Haswell when resuming from suspend with a live KVM
> >guest.  Because APs use the standard SMP boot flow for resume, they will
> >go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
> >which runs even when sld_state==sld_off to ensure SLD is disabled.  On
> >Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
> >succeed and _may_ take the SMT _sibling_ out of VMX root mode.
> >
> >When KVM has an active guest, KVM performs VMXON as part of CPU onlining
> >(see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
> >flow is effectively:
> >
> >   on_each_ap_cpu() {
> >      WRMSR(MSR_TEST_CTRL, 0)
> >      VMXON
> >   }
> >
> >As a result, the WRMSR can disable VMX on a different CPU that has
> >already done VMXON.  This ultimately results in a #UD on VMPTRLD when
> >KVM regains control and attempts to run its vCPUs.
> >
> >The above voodoo was confirmed by reworking KVM's VMXON flow to write
> >MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
> >Further verification of the insanity was done by redoing VMXON on all
> >APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
> >which should VM-Fail, occasionally succeeded, and also eliminated the
> >unexpected #UD on VMPTRLD.
> >
> >The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
> >to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
> >almost always hang (even when fudging VMXON), a #UD on a random Jcc was
> >observed, suspend/resume stability is qualitatively poor, and so on and
> >so forth.
> >
> 
> I'm wondering whether all those side effects of MSR_TEST_CTRL also exist on
> CPUs that have the SLD feature. Have you ever tested on an SLD-capable CPU?

No, I'll poke at it on ICX tomorrow.
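
For anyone following the thread, the idea in the changelog (skip the
MSR_TEST_CTRL access entirely unless the CPU model is whitelisted) can be
sketched in user space roughly as below.  This is only an illustration under
stated assumptions, not the patch itself: the enum values, the one-entry
whitelist, fake_wrmsr_test_ctrl() and split_lock_init_sketch() are all
hypothetical names, and the "MSR write" is just a counter so the control flow
can be observed without touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model IDs, for illustration only; not the kernel's table. */
enum cpu_model { MODEL_HASWELL, MODEL_ICX };

/* Assumed whitelist of models that may support split lock detection. */
static const enum cpu_model sld_whitelist[] = { MODEL_ICX };

/* Counts simulated writes to MSR_TEST_CTRL; no real MSR is accessed. */
static unsigned int test_ctrl_writes;

/* Stand-in for WRMSR(MSR_TEST_CTRL, val). */
static void fake_wrmsr_test_ctrl(uint64_t val)
{
	(void)val;
	test_ctrl_writes++;
}

/* Returns true only if the model appears in the whitelist. */
static bool cpu_model_supports_sld(enum cpu_model model)
{
	size_t i;

	for (i = 0; i < sizeof(sld_whitelist) / sizeof(sld_whitelist[0]); i++) {
		if (sld_whitelist[i] == model)
			return true;
	}
	return false;
}

/*
 * Per-CPU init path: bail out before any MSR access on non-whitelisted
 * models, even when the intent is only to write '0' to disable SLD.
 */
static void split_lock_init_sketch(enum cpu_model model)
{
	if (!cpu_model_supports_sld(model))
		return;

	fake_wrmsr_test_ctrl(0);
}
```

Calling split_lock_init_sketch(MODEL_HASWELL) leaves test_ctrl_writes
untouched, while split_lock_init_sketch(MODEL_ICX) bumps it by one.  The
point of checking the model first, rather than probing the MSR, is that on
the Haswell described above the write itself succeeds and is what does the
damage.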


