Re: Seeking a KVM benchmark

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Fri, 7 Nov 2014 10:11:50 -0800

On Fri, Nov 7, 2014 at 9:59 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Thu, Nov 6, 2014 at 11:17 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>
>>
>> On 07/11/2014 07:27, Andy Lutomirski wrote:
>>> Is there an easy benchmark that's sensitive to the time it takes to
>>> round-trip from userspace to guest and back to userspace?  I think I
>>> may have a big speedup.
>>
>> The simplest is vmexit.flat from
>> git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
>>
>> Run it with "x86/run x86/vmexit.flat" and look at the inl_from_qemu
>> benchmark.
>
> Thanks!
>
> That test case is slower than I expected.  I think my change is likely
> to save somewhat under 100ns, which is only a couple percent.  I'll
> look for more impressive improvements.
>
> On a barely related note, in the process of poking around with this
> test, I noticed:
>
>     /* On ept, can't emulate nx, and must switch nx atomically */
>     if (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX)) {
>         guest_efer = vmx->vcpu.arch.efer;
>         if (!(guest_efer & EFER_LMA))
>             guest_efer &= ~EFER_LME;
>         add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, host_efer);
>         return false;
>     }
>
>     return true;
>
> This heuristic seems wrong to me.  wrmsr is serializing and therefore
> extremely slow, whereas I imagine that, on CPUs that support it,
> atomically switching EFER ought to be reasonably fast.
>
> Indeed, changing vmexit.c to disable NX (thereby forcing atomic EFER
> switching, and having no other relevant effect that I've thought of)
> speeds up inl_from_qemu by ~30% on Sandy Bridge.  Would it make sense
> to always use atomic EFER switching, at least when
> cpu_has_load_ia32_efer?
>

Digging into the history suggests that I might be right.

There's this:

commit 110312c84b5fbd4daf5de2417fa8ab5ec883858d
Author: Avi Kivity <avi@xxxxxxxxxx>
Date:   Tue Dec 21 12:54:20 2010 +0200

    KVM: VMX: Optimize atomic EFER load

    When NX is enabled on the host but not on the guest, we use the entry/exit
    msr load facility, which is slow.  Optimize it to use entry/exit efer load,
    which is ~1200 cycles faster.

    Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>
    Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

The NX and atomic EFER heuristic seems to be considerably older than
that.  It could just be that no one ever noticed entry/exit efer load
becoming faster than wrmsr on modern hardware.  Someone should
double-check that I'm not nuts here, though.
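
For concreteness, the change floated above can be modeled in userspace
roughly as follows.  This is only a sketch, not a patch: struct efer_model,
the enum, and the three functions are made-up names for illustration, and
enable_ept, host_efer and cpu_has_load_ia32_efer are plain fields here
rather than the real kernel symbols.  The first policy mirrors the quoted
condition; the second always picks the atomic switch when the CPU has the
entry/exit EFER load controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bit positions. */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_LMA (1ULL << 10)  /* long mode active */
#define EFER_NX  (1ULL << 11)  /* no-execute enable */

/* Hypothetical labels for the two ways of switching EFER. */
enum efer_switch {
    EFER_SWITCH_WRMSR,   /* non-atomic path, explicit wrmsr */
    EFER_SWITCH_ATOMIC   /* switched by the CPU on VM entry/exit */
};

/* Hypothetical stand-in for the host state the heuristic looks at. */
struct efer_model {
    bool enable_ept;
    bool cpu_has_load_ia32_efer;
    uint64_t host_efer;
};

/* Same fixup as the quoted code: drop LME when LMA is not set. */
static uint64_t guest_efer_to_load(uint64_t guest_efer)
{
    if (!(guest_efer & EFER_LMA))
        guest_efer &= ~EFER_LME;
    return guest_efer;
}

/* Existing heuristic: atomic only if EPT is on and NX differs. */
static enum efer_switch current_policy(const struct efer_model *m,
                                       uint64_t guest_efer)
{
    if (m->enable_ept && ((guest_efer ^ m->host_efer) & EFER_NX))
        return EFER_SWITCH_ATOMIC;
    return EFER_SWITCH_WRMSR;
}

/* Proposed: always atomic when the CPU supports it (assumption: */
/* nothing else depends on taking the wrmsr path in that case).  */
static enum efer_switch proposed_policy(const struct efer_model *m,
                                        uint64_t guest_efer)
{
    if (m->cpu_has_load_ia32_efer)
        return EFER_SWITCH_ATOMIC;
    return current_policy(m, guest_efer);
}
```

With a host that has NX set and a guest that also sets NX, current_policy()
picks the wrmsr path while proposed_policy() picks the atomic one; the two
agree whenever cpu_has_load_ia32_efer is false.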

--Andy