Treat monitor and mwait instructions as nop, which is architecturally correct (but inefficient) behavior. We do this to prevent misbehaving guests (e.g. OS X <= 10.7) from crashing after they fail to check for monitor/mwait availability via cpuid. Since mwait-based idle loops relying on these nop-emulated instructions would keep the host CPU pegged at 100%, do NOT advertise their presence via cpuid, to prevent compliant guests from using them inadvertently. Signed-off-by: Gabriel L. Somlo <somlo@xxxxxxx> --- New in v2: remove invalid_op handler functions which were only used to handle exits caused by monitor and mwait On Wed, May 07, 2014 at 08:31:27PM +0200, Alexander Graf wrote: > On 05/07/2014 08:15 PM, Michael S. Tsirkin wrote: > >If we really want to be paranoid and worry about guests > >that use this strange way to trigger invalid opcode, > >we can make it possible for userspace to enable/disable > >this hack, and teach qemu to set it. > > > >That would make it even safer than it was. > > > >Not sure it's worth it, just a thought. > > Since we don't trap on non-exposed other instructions (new SSE and > whatdoiknow) I don't think it's really bad to just expose > MONITOR/MWAIT as nops. So AFAICT, linux prefers to use mwait for idling if cpuid tells it that it's available. If we keep telling everyone that we do NOT have monitor and mwait available, compliant guests will never end up using them, and this hack would remain completely invisible to them, which is good (better to use hlt-based idle loops when you're a vm guest, that would actually allow the host to relax while you're halted :) So the only time anyone would be able to tell we have this hack would be when they're about to receive an invalid opcode for using monitor/mwait in violation of what CPUID (would have) told them. That's what happens to OS X prior to 10.8, which is when I'm hypothesizing the Apple devs begain to seriously think about their OS running as a vm guest (on fusion and parallels)... Instead of killing the misbehaving guest with an invalid opcode, we'd allow them to peg the host CPU with their monitor == mwait == nop idle loop instead, which, at least on OS X, should be tolerable long enough to run 'rm -rf System/Library/Extensions/AppleIntelCPUPowerManagement.kext' and reboot the guest, after which things would settle down by reverting the guest to a hlt-based idle loop. The only reason I can think of to add functionality for enabling/disabling this hack would be to protect against a malicious guest which would use mwait *on purpose* to peg the host CPU. But a malicious guest could just run "for(;;);" in ring 0 and accomplish the same goal, so we wouldn't really gain anything in exchange for the added complexity... Thanks, Gabriel arch/x86/kvm/cpuid.c | 2 ++ arch/x86/kvm/svm.c | 28 ++++++++++++++++++++-------- arch/x86/kvm/vmx.c | 20 ++++++++++++++++---- 3 files changed, 38 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index f47a104..d094fc6 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -283,6 +283,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); /* cpuid 1.ecx */ const u32 kvm_supported_word4_x86_features = + /* NOTE: MONITOR (and MWAIT) are emulated as NOP, + * but *not* advertised to guests via CPUID ! */ F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | 0 /* DS-CPL, VMX, SMX, EST */ | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 7f4f9c2..0b7d58d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2770,12 +2770,6 @@ static int xsetbv_interception(struct vcpu_svm *svm) return 1; } -static int invalid_op_interception(struct vcpu_svm *svm) -{ - kvm_queue_exception(&svm->vcpu, UD_VECTOR); - return 1; -} - static int task_switch_interception(struct vcpu_svm *svm) { u16 tss_selector; @@ -3287,6 +3281,24 @@ static int pause_interception(struct vcpu_svm *svm) return 1; } +static int nop_interception(struct vcpu_svm *svm) +{ + skip_emulated_instruction(&(svm->vcpu)); + return 1; +} + +static int monitor_interception(struct vcpu_svm *svm) +{ + printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n"); + return nop_interception(svm); +} + +static int mwait_interception(struct vcpu_svm *svm) +{ + printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n"); + return nop_interception(svm); +} + static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_READ_CR0] = cr_interception, [SVM_EXIT_READ_CR3] = cr_interception, @@ -3344,8 +3356,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { [SVM_EXIT_CLGI] = clgi_interception, [SVM_EXIT_SKINIT] = skinit_interception, [SVM_EXIT_WBINVD] = emulate_on_interception, - [SVM_EXIT_MONITOR] = invalid_op_interception, - [SVM_EXIT_MWAIT] = invalid_op_interception, + [SVM_EXIT_MONITOR] = monitor_interception, + [SVM_EXIT_MWAIT] = mwait_interception, [SVM_EXIT_XSETBV] = xsetbv_interception, [SVM_EXIT_NPF] = pf_interception, }; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 33e8c02..3ccbcb1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5669,12 +5669,24 @@ static int handle_pause(struct kvm_vcpu *vcpu) return 1; } -static int handle_invalid_op(struct kvm_vcpu *vcpu) +static int handle_nop(struct kvm_vcpu *vcpu) { - kvm_queue_exception(vcpu, UD_VECTOR); + skip_emulated_instruction(vcpu); return 1; } +static int handle_mwait(struct kvm_vcpu *vcpu) +{ + printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n"); + return handle_nop(vcpu); +} + +static int handle_monitor(struct kvm_vcpu *vcpu) +{ + printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n"); + return handle_nop(vcpu); +} + /* * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. * We could reuse a single VMCS for all the L2 guests, but we also want the @@ -6571,8 +6583,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig, [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause, - [EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op, - [EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op, + [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait, + [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor, [EXIT_REASON_INVEPT] = handle_invept, }; -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html