On Fri, Nov 7, 2014 at 9:59 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Thu, Nov 6, 2014 at 11:17 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>
>> On 07/11/2014 07:27, Andy Lutomirski wrote:
>>> Is there an easy benchmark that's sensitive to the time it takes to
>>> round-trip from userspace to guest and back to userspace?  I think I
>>> may have a big speedup.
>>
>> The simplest is vmexit.flat from
>> git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
>>
>> Run it with "x86/run x86/vmexit.flat" and look at the inl_from_qemu
>> benchmark.
>
> Thanks!
>
> That test case is slower than I expected.  I think my change is likely
> to save somewhat under 100ns, which is only a couple percent.  I'll
> look for more impressive improvements.
>
> On a barely related note, in the process of poking around with this
> test, I noticed:
>
>     /* On ept, can't emulate nx, and must switch nx atomically */
>     if (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX)) {
>         guest_efer = vmx->vcpu.arch.efer;
>         if (!(guest_efer & EFER_LMA))
>             guest_efer &= ~EFER_LME;
>         add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, host_efer);
>         return false;
>     }
>
>     return true;
>
> This heuristic seems wrong to me.  wrmsr is serializing and therefore
> extremely slow, whereas I imagine that, on CPUs that support it,
> atomically switching EFER ought to be reasonably fast.
>
> Indeed, changing vmexit.c to disable NX (thereby forcing atomic EFER
> switching, and having no other relevant effect that I've thought of)
> speeds up inl_from_qemu by ~30% on Sandy Bridge.  Would it make sense
> to always use atomic EFER switching, at least when
> cpu_has_load_ia32_efer?
>

Digging in to the history suggests that I might be right.
There's this:

commit 110312c84b5fbd4daf5de2417fa8ab5ec883858d
Author: Avi Kivity <avi@xxxxxxxxxx>
Date:   Tue Dec 21 12:54:20 2010 +0200

    KVM: VMX: Optimize atomic EFER load

    When NX is enabled on the host but not on the guest, we use the
    entry/exit msr load facility, which is slow.  Optimize it to use
    entry/exit efer load, which is ~1200 cycles faster.

    Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>
    Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

The NX and atomic EFER heuristic seems to be considerably older than
that.  It could just be that no one ever noticed entry/exit efer load
becoming faster than wrmsr on modern hardware.

Someone should double-check that I'm not nuts here, though.

--Andy
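
For illustration, the question raised in the message (prefer the atomic
EFER load whenever cpu_has_load_ia32_efer is set, and otherwise keep the
existing EPT/NX rule) could be sketched as the standalone C below.  This
is only a sketch under stated assumptions, not the actual vmx.c code:
plan_efer_switch(), struct efer_plan, and the boolean parameters are
hypothetical stand-ins for the kernel's state, and the EFER_* values are
the architectural bit positions (LME bit 8, LMA bit 10, NX bit 11).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_EFER bits. */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_LMA (1ULL << 10)  /* long mode active */
#define EFER_NX  (1ULL << 11)  /* no-execute enable */

/* Hypothetical result type: which path to take and what value to load. */
struct efer_plan {
    bool atomic;          /* true: switch EFER atomically on entry/exit */
    uint64_t guest_efer;  /* value the guest would run with */
};

/*
 * Hypothetical model of the decision discussed in the mail.
 * has_load_efer stands in for cpu_has_load_ia32_efer, enable_ept for
 * the module parameter of the same name.
 */
struct efer_plan plan_efer_switch(uint64_t guest_efer, uint64_t host_efer,
                                  bool enable_ept, bool has_load_efer)
{
    struct efer_plan plan = { .atomic = false, .guest_efer = guest_efer };
    bool nx_differs = ((guest_efer ^ host_efer) & EFER_NX) != 0;

    /*
     * Existing rule: with EPT, NX cannot be emulated, so it must be
     * switched atomically.  Proposed addition: also go atomic whenever
     * the CPU has the dedicated EFER load controls, on the assumption
     * that they are cheaper than a serializing wrmsr.
     */
    if (has_load_efer || (enable_ept && nx_differs)) {
        /* Same fixup as the quoted code: no LME without LMA. */
        if (!(plan.guest_efer & EFER_LMA))
            plan.guest_efer &= ~EFER_LME;
        plan.atomic = true;
    }

    return plan;
}
```

With has_load_efer false this reduces to the quoted heuristic, so the
two behaviours can be compared by flipping a single argument.  Whether
the atomic path really is faster on a given CPU still has to be
measured, e.g. with the inl_from_qemu benchmark mentioned above.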