On Thu, 13 Jun 2019 at 03:27, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>
> 2019-06-12 21:22+0200, Radim Krčmář:
> > 2019-06-12 08:14-0700, Sean Christopherson:
> > > On Wed, Jun 12, 2019 at 05:40:18PM +0800, Wanpeng Li wrote:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > @@ -145,6 +145,12 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
> > > >  static int __read_mostly lapic_timer_advance_ns = -1;
> > > >  module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);
> > > >
> > > > +/*
> > > > + * lapic timer vmentry advance (tscdeadline mode only) in nanoseconds.
> > > > + */
> > > > +u32 __read_mostly vmentry_advance_ns = 300;
> > >
> > > Enabling this by default makes me nervous, e.g. nothing guarantees that
> > > future versions of KVM and/or CPUs will continue to have 300ns of overhead
> > > between wait_lapic_expire() and VM-Enter.
> > >
> > > If we want it enabled by default so that it gets tested, the default
> > > value should be extremely conservative, e.g. set the default to a small
> > > percentage (25%?) of the latency of VM-Enter itself on modern CPUs,
> > > VM-Enter latency being the min between VMLAUNCH and VMLOAD+VMRUN+VMSAVE.
> >
> > I share the sentiment. We definitely must not enter the guest before
> > the deadline has expired, and with CPUs approaching 5 GHz (in turbo),
> > 300 ns would be too much even today.
> >
> > I wrote a simple testcase for rough timing and there are 267 cycles
> > (111 ns @ 2.4 GHz) between doing rdtsc() right after
> > kvm_wait_lapic_expire() [1] and doing rdtsc() in the guest as soon as
> > possible (see the attached kvm-unit-test).
>
> I forgot to attach it; pasting it here as a patch for kvm-unit-tests.

Thanks for this, Radim. :)

Regards,
Wanpeng Li

> ---
> diff --git a/x86/Makefile.common b/x86/Makefile.common
> index e612dbe..ceed648 100644
> --- a/x86/Makefile.common
> +++ b/x86/Makefile.common
> @@ -58,7 +58,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
>              $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
>              $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \
>              $(TEST_DIR)/hyperv_connections.flat \
> -            $(TEST_DIR)/umip.flat
> +            $(TEST_DIR)/umip.flat $(TEST_DIR)/vmentry_latency.flat
>
>  ifdef API
>  tests-api = api/api-sample api/dirty-log api/dirty-log-perf
> diff --git a/x86/vmentry_latency.c b/x86/vmentry_latency.c
> new file mode 100644
> index 0000000..3859f09
> --- /dev/null
> +++ b/x86/vmentry_latency.c
> @@ -0,0 +1,45 @@
> +#include "x86/vm.h"
> +
> +static u64 get_last_hypervisor_tsc_delta(void)
> +{
> +        u64 a = 0, b, c, d;
> +        u64 tsc;
> +
> +        /*
> +         * The first vmcall is there to force a VM exit just before measuring.
> +         */
> +        asm volatile ("vmcall" : "+a"(a), "=b"(b), "=c"(c), "=d"(d));
> +
> +        tsc = rdtsc();
> +
> +        /*
> +         * The second hypercall recovers the TSC value that the host stored
> +         * just before the VM entry that led to the rdtsc() above.
> +         */
> +        a = 11;
> +        asm volatile ("vmcall" : "+a"(a), "=b"(b), "=c"(c), "=d"(d));
> +
> +        return tsc - a;
> +}
> +
> +static void vmentry_latency(void)
> +{
> +        unsigned i = 1000000;
> +        u64 min = -1;
> +
> +        while (i--) {
> +                u64 latency = get_last_hypervisor_tsc_delta();
> +                if (latency < min)
> +                        min = latency;
> +        }
> +
> +        printf("vm entry latency is %"PRIu64" TSC cycles\n", min);
> +}
> +
> +int main(void)
> +{
> +        setup_vm();
> +        vmentry_latency();
> +
> +        return 0;
> +}
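
For a rough sense of scale, the arithmetic behind the figures quoted above can be sketched out as below. This is a standalone illustration, not part of either patch: it assumes the 267-cycle minimum from the unit test can stand in for the "VM-Enter latency" in Sean's 25% heuristic, and that the entry path costs roughly the same number of core cycles when the core turbos to ~5 GHz.

#include <stdio.h>

int main(void)
{
        /* Minimum host-to-guest delta reported by the unit test, in cycles. */
        const double cycles = 267.0;
        /* 2.4 GHz matches Radim's measurement; ~5 GHz is the assumed turbo. */
        const double ghz[] = { 2.4, 5.0 };

        for (unsigned int i = 0; i < sizeof(ghz) / sizeof(ghz[0]); i++) {
                double ns = cycles / ghz[i];    /* cycles / (cycles per ns) */

                printf("%.0f cycles @ %.1f GHz = %5.1f ns, 25%% of that = %4.1f ns\n",
                       cycles, ghz[i], ns, ns / 4);
        }
        return 0;
}

This prints roughly 111 ns (25%: ~28 ns) at 2.4 GHz and 53 ns (25%: ~13 ns) at 5 GHz, which is why a 300 ns default advance looks too aggressive even on today's hardware.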