On Jun 4, 2014, at 10:43 PM, Gabriel L. Somlo <gsomlo@xxxxxxxxx> wrote:

> On Wed, Jun 04, 2014 at 10:12:39PM +0300, Nadav Amit wrote:
>
> I'd be curious how you're dealing with the "hidden" CPU state which
> tells MWAIT to sleep until someone or something writes to the
> monitored memory area set up by a corresponding MONITOR instruction.

>> Regardless to the whole discussion of what the guest is informed
>> about, I think it might be better to implement mwait and monitor
>> correctly according to the spec and let the instructions to be
>> fully emulated.
>> Both mwait and monitor may encounter exceptions (#GP, #PF,
>> regardless of #UD), so this behaviour should be correct.
>> If you want me, I'll send my version which looks something like:
>>
>> static int em_monitor(struct x86_emulate_ctxt *ctxt)
>> {
>> 	int rc;
>> 	struct segmented_address addr;
>> 	u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>> 	u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
>> 	u8 byte;
>>
>> 	rc = check_mwait_supported(ctxt);
>> 	if (rc != X86EMUL_CONTINUE)
>> 		return rc;
>>
>> 	if (ctxt->mode != X86EMUL_MODE_PROT64)
>> 		rcx = (u32)rcx;
>>
>> 	if (rcx != 0)
>> 		return emulate_gp(ctxt, 0);
>>
>> 	addr.seg = seg_override(ctxt);
>> 	addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
>>
>> 	rc = segmented_read(ctxt, addr, &byte, 1);
>> 	if (rc != X86EMUL_CONTINUE)
>> 		return rc;
>>
>> 	return X86EMUL_CONTINUE;
>> }
>>
>> static int em_mwait(struct x86_emulate_ctxt *ctxt)
>> {
>> 	u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>> 	int rc = check_mwait_supported(ctxt);
>> 	if (rc != X86EMUL_CONTINUE)
>> 		return rc;
>> 	if (ctxt->mode != X86EMUL_MODE_PROT64)
>> 		rcx = (u32)rcx;
>>
>> 	if ((rcx & ~(u64)1) != 0)
>> 		return emulate_gp(ctxt, 0);
>>
>> 	if (rcx & 1) {
>> 		/* Interrupt as break event */
>> 		u32 ebx, ecx, edx, eax;
>> 		eax = 5;
>> 		ecx = 0;
>> 		ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
>> 		if (!(ecx & 1))
>> 			return emulate_gp(ctxt, 0);
>> 	}
>> 	return X86EMUL_CONTINUE;
>> }

My implementation still emulates the instruction as a NOP, but first
checks for an exception.

Anyhow, if you want a real mwait emulation, you can write-protect the
page of the monitored memory area in the EPT of the other VCPUs and
register a callback that fires when a write to the area takes place.
You may want the host to cause a spurious wakeup after you do the
write-protection, so that you do not miss a write by another VCPU to
the monitored area. After the spurious wakeup, the VM is likely to
issue an additional mwait on the same monitored cache line.

Additional care for DMA (emulated and paravirtual) might be needed,
with the assistance of QEMU. The complicated case is DMA from assigned
devices, because there is no support for I/O page faults.

Nadav
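
The write-protect plus spurious-wakeup idea above can be modelled in user space. What follows is only a sketch under stated assumptions, not KVM code: struct vcpu_monitor, model_monitor(), model_write_fault() and model_mwait_blocks() are hypothetical names, the 64-byte monitor granularity is assumed, and the EPT write protection is reduced to a flag on a single cache line (a real implementation would protect a whole page and would have to emulate writes to the unmonitored lines in it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MONITOR_LINE 64u	/* assumed monitor granularity in bytes */

/* Hypothetical per-VCPU monitor state; not a KVM structure. */
struct vcpu_monitor {
	uint64_t wp_line;	/* cache line currently write-protected */
	bool wp;		/* write protection installed for wp_line */
	bool armed;		/* MONITOR executed, no wakeup consumed yet */
	bool wake_pending;	/* break event queued for the next MWAIT */
};

static uint64_t line_of(uint64_t gpa)
{
	return gpa & ~(uint64_t)(MONITOR_LINE - 1);
}

/*
 * MONITOR: install write protection if it is not already in place.
 * A freshly installed protection queues a spurious wakeup, so a write
 * that raced with the installation cannot be missed.
 */
static void model_monitor(struct vcpu_monitor *m, uint64_t gpa)
{
	uint64_t line = line_of(gpa);

	assert(m);
	if (!m->wp || m->wp_line != line) {
		m->wp = true;
		m->wp_line = line;
		m->wake_pending = true;
	}
	m->armed = true;
}

/*
 * Write-fault callback, run when another VCPU writes to guest memory.
 * Returns true if the write hit the monitored line; in that case the
 * protection is dropped and a wakeup is queued.
 */
static bool model_write_fault(struct vcpu_monitor *m, uint64_t gpa)
{
	assert(m);
	if (!m->wp || line_of(gpa) != m->wp_line)
		return false;
	m->wp = false;
	m->wake_pending = true;
	return true;
}

/*
 * MWAIT: returns true if the VCPU would block. Without an armed
 * monitor it falls through, and a pending wakeup is consumed.
 */
static bool model_mwait_blocks(struct vcpu_monitor *m)
{
	assert(m);
	if (!m->armed)
		return false;
	if (m->wake_pending) {
		m->wake_pending = false;
		m->armed = false;
		return false;
	}
	return true;
}
```

In this model the first MONITOR/MWAIT pair on a line returns at once (the spurious wakeup). The guest rechecks memory and reissues the pair, which then blocks until model_write_fault() reports a write to the same line. Writes to other lines leave the VCPU blocked.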