2018-04-10 20:15 GMT+08:00 KarimAllah Ahmed <karahmed@xxxxxxxxx>:
> The VMX-preemption timer is used by KVM as a way to set deadlines for the
> guest (i.e. timer emulation). That was safe till very recently when
> capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
> introduced. According to Intel SDM 25.5.1:
>
> """
> The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
> operates in the shutdown and wait-for-SIPI states. If the timer counts down
> to zero in any state other than the wait-for SIPI state, the logical
> processor transitions to the C0 C-state and causes a VM exit; the timer
> does not cause a VM exit if it counts down to zero in the wait-for-SIPI
> state. The timer is not decremented in C-states deeper than C2.
> """

Thanks for the patch. In addition, does it also mean we should prevent
host from entering deeper C-states than C2 even if w/o disable
intercept stuffs?

Regards,
Wanpeng Li

>
> Now once the guest issues the MWAIT with a c-state deeper than
> C2 the preemption timer will never wake it up again since it stopped
> ticking! Usually this is compensated by other activities in the system that
> would wake the core from the deep C-state (and cause a VMExit). For
> example, if the host itself is ticking or it received interrupts, etc!
>
> So disable the VMX-preemption timer if MWAIT is exposed to the guest!
>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: kvm@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: KarimAllah Ahmed <karahmed@xxxxxxxxx>
> ---
> v2 -> v3:
> - return -EOPNOTSUPP before any other operation in vmx_set_hv_timer
>
> v1 -> v2:
> - Drop everything .. just return -EOPNOTSUPP (pbonzini@) :D
> ---
>  arch/x86/kvm/vmx.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d2e54e7..31a4204 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11903,10 +11903,16 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
>
>  static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
>  {
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	u64 tscl = rdtsc();
> -	u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
> -	u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
> +	struct vcpu_vmx *vmx;
> +	u64 tscl, guest_tscl, delta_tsc;
> +
> +	if (kvm_pause_in_guest(vcpu->kvm))
> +		return -EOPNOTSUPP;
> +
> +	vmx = to_vmx(vcpu);
> +	tscl = rdtsc();
> +	guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
> +	delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
>
>  	/* Convert to host delta tsc if tsc scaling is enabled */
>  	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
> --
> 2.7.4
>
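
For readers following the v3 change outside the kernel tree, here is a
minimal user-space sketch of the control flow in the quoted hunk: refuse
with -EOPNOTSUPP before doing any other work, otherwise compute the
remaining delta as max(deadline, now) - now so that a deadline already in
the past yields zero rather than wrapping around. The function name
set_hv_timer_sketch and the flag mwait_exposed_to_guest are hypothetical
stand-ins for illustration only; this is not the kernel code and it leaves
out the TSC scaling step that follows in vmx.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the shape of vmx_set_hv_timer() after the
 * patch. The boolean replaces the per-VM check done in the quoted diff,
 * and the two TSC values are passed in instead of being read from
 * hardware.
 */
static int set_hv_timer_sketch(bool mwait_exposed_to_guest,
                               uint64_t guest_deadline_tsc,
                               uint64_t guest_now_tsc,
                               uint64_t *delta_tsc_out)
{
    uint64_t later;

    assert(delta_tsc_out != NULL);

    /* Bail out first: the caller is expected to fall back to another timer. */
    if (mwait_exposed_to_guest)
        return -EOPNOTSUPP;

    /* max(deadline, now) - now: clamps an expired deadline to a zero delta. */
    later = guest_deadline_tsc > guest_now_tsc ? guest_deadline_tsc
                                               : guest_now_tsc;
    *delta_tsc_out = later - guest_now_tsc;
    return 0;
}
```

With the flag set, the sketch returns -EOPNOTSUPP and leaves the output
untouched; with it clear, a deadline of 1000 at time 400 gives a delta of
600, and a deadline of 300 at time 400 gives 0.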