On 23/06/20 02:51, Sean Christopherson wrote:
> Remove support for context switching between the guest's and host's
> desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't
> required for correct functionality, e.g. KVM intercepts reads and writes
> to the MSR, and the latency effects of the settings controlled by the
> MSR are not architecturally visible.
> 
> As a general rule, KVM should not allow the guest to control power
> management settings unless explicitly enabled by userspace, e.g. see
> KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2
> can improve the performance of SMT siblings. A devious guest could
> disable C0.2 so as to improve the performance of their workloads to the
> detriment of workloads running in the host or on other VMs.
> 
> Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
> condition where updates from the host may cause KVM to enter the guest
> with the incorrect value. Because updates are propagated to all
> CPUs via IPI (SMP function callback), the value in hardware may be
> stale with respect to the cached value and KVM could enter the guest
> with the wrong value in hardware. As above, the guest can't observe the
> bad value, but it's a weird and confusing wart in the implementation.
> 
> Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
> lists. Using the lists is only necessary for MSRs that are required for
> correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
> old hardware, or for MSRs that need to-the-uop precision, e.g. perf
> related MSRs. For UMWAIT_CONTROL, the effects are only visible in the
> kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
> vcpu_vmx_run(). Using the atomic lists is undesirable as they are more
> expensive than direct RDMSR/WRMSR.
> 
> Furthermore, even if giving the guest control of the MSR is legitimate,
> e.g. in pass-through scenarios, it's not clear that the benefits would
> outweigh the overhead. E.g. saving and restoring an MSR across a VMX
> roundtrip costs ~250 cycles, and if the guest diverged from the host
> that cost would be paid on every run of the guest. In other words, if
> there is a legitimate use case then it should be enabled by a new
> per-VM capability.
> 
> Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
> correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
> UMWAIT and UMONITOR.
> 
> Fixes: 6e3ba4abcea56 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Jingqi Liu <jingqi.liu@xxxxxxxxx>
> Cc: Tao Xu <tao3.xu@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
>  arch/x86/include/asm/mwait.h |  2 --
>  arch/x86/kernel/cpu/umwait.c |  6 ------
>  arch/x86/kvm/vmx/vmx.c       | 18 ------------------
>  3 files changed, 26 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 73d997aa2966..e039a933aca3 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -25,8 +25,6 @@
>  #define TPAUSE_C01_STATE 1
>  #define TPAUSE_C02_STATE 0
>  
> -u32 get_umwait_control_msr(void);
> -
>  static inline void __monitor(const void *eax, unsigned long ecx,
>  			     unsigned long edx)
>  {
> diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
> index 300e3fd5ade3..ec8064c0ae03 100644
> --- a/arch/x86/kernel/cpu/umwait.c
> +++ b/arch/x86/kernel/cpu/umwait.c
> @@ -18,12 +18,6 @@
>   */
>  static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
>  
> -u32 get_umwait_control_msr(void)
> -{
> -	return umwait_control_cached;
> -}
> -EXPORT_SYMBOL_GPL(get_umwait_control_msr);
> -
>  /*
>   * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
>   * hardware or BIOS before kernel boot.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 08e26a9518c2..b2447c1ee362 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6606,23 +6606,6 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>  					msrs[i].host, false);
>  }
>  
> -static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
> -{
> -	u32 host_umwait_control;
> -
> -	if (!vmx_has_waitpkg(vmx))
> -		return;
> -
> -	host_umwait_control = get_umwait_control_msr();
> -
> -	if (vmx->msr_ia32_umwait_control != host_umwait_control)
> -		add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
> -			vmx->msr_ia32_umwait_control,
> -			host_umwait_control, false);
> -	else
> -		clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
> -}
> -
>  static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -6730,7 +6713,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	if (vcpu_to_pmu(vcpu)->version)
>  		atomic_switch_perf_msrs(vmx);
> -	atomic_switch_umwait_control_msr(vmx);
>  
>  	if (enable_preemption_timer)
>  		vmx_update_hv_timer(vcpu);
> 

Queued, thanks.

Paolo
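
For reference, the only handling of MSR_IA32_UMWAIT_CONTROL left in KVM after this
patch is the read/write emulation in vmx_get_msr()/vmx_set_msr(). A minimal sketch of
its shape is below; it is paraphrased rather than copied from the tree, so the exact
guard and reserved-bit checks may differ:

	/* vmx_get_msr(): report the vCPU's cached value (sketch). */
	case MSR_IA32_UMWAIT_CONTROL:
		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
			return 1;
		msr_info->data = vmx->msr_ia32_umwait_control;
		break;

	/* vmx_set_msr(): validate and cache the guest's value (sketch). */
	case MSR_IA32_UMWAIT_CONTROL:
		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
			return 1;

		/* Bit 1 is reserved and bits 63:32 must be zero. */
		if (data & (BIT_ULL(1) | GENMASK_ULL(63, 32)))
			return 1;

		vmx->msr_ia32_umwait_control = data;
		break;

The cached value only feeds this emulation; it is never written to hardware, which is
what makes dropping the context switching safe for correctness.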
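
On the race described in the changelog: the host updates IA32_UMWAIT_CONTROL from
sysfs by first updating a cached copy and then broadcasting a WRMSR to all CPUs via
IPI. A simplified sketch of that path in arch/x86/kernel/cpu/umwait.c (names
paraphrased, details approximate) looks like:

	/* Runs on each CPU via IPI; writes the cached value to the MSR. */
	static void umwait_update_control_msr(void *unused)
	{
		lockdep_assert_irqs_disabled();
		wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
	}

	static void umwait_update_control(u32 maxtime, bool c02_enable)
	{
		u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;

		if (!c02_enable)
			ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;

		WRITE_ONCE(umwait_control_cached, ctrl);

		/* Propagate the new value to every CPU. */
		on_each_cpu(umwait_update_control_msr, NULL, 1);
	}

Between the WRITE_ONCE() and the IPI landing on a given CPU, that CPU's MSR lags the
cached value, so a context switch keyed off get_umwait_control_msr() could enter the
guest while hardware still holds the old value; removing the context switching closes
that window by construction.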