On Thu, 13 Jun 2019 at 03:27, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>
> 2019-06-12 21:22+0200, Radim Krčmář:
> > 2019-06-12 08:14-0700, Sean Christopherson:
> > > On Wed, Jun 12, 2019 at 05:40:18PM +0800, Wanpeng Li wrote:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > @@ -145,6 +145,12 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
> > > >  static int __read_mostly lapic_timer_advance_ns = -1;
> > > >  module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);
> > > >
> > > > +/*
> > > > + * lapic timer vmentry advance (tscdeadline mode only) in nanoseconds.
> > > > + */
> > > > +u32 __read_mostly vmentry_advance_ns = 300;
> > >
> > > Enabling this by default makes me nervous, e.g. nothing guarantees that
> > > future versions of KVM and/or CPUs will continue to have 300ns of overhead
> > > between wait_lapic_expire() and VM-Enter.
> > >
> > > If we want it enabled by default so that it gets tested, the default
> > > value should be extremely conservative, e.g. set the default to a small
> > > percentage (25%?) of the latency of VM-Enter itself on modern CPUs,
> > > VM-Enter latency being the min between VMLAUNCH and VMLOAD+VMRUN+VMSAVE.
> >
> > I share the sentiment. We definitely must not enter the guest before
> > the deadline has expired, and with CPUs approaching 5 GHz (in turbo),
> > 300 ns would be too much even today.
> >
> > I wrote a simple testcase for rough timing and there are 267 cycles
> > (111 ns @ 2.4 GHz) between doing rdtsc() right after
> > kvm_wait_lapic_expire() [1] and doing rdtsc() in the guest as soon as
> > possible (see the attached kvm-unit-test).
>
> I forgot to attach it; pasting it here as a patch for kvm-unit-tests.

Thanks for this, Radim. :)

Regards,
Wanpeng Li

> ---
> diff --git a/x86/Makefile.common b/x86/Makefile.common
> index e612dbe..ceed648 100644
> --- a/x86/Makefile.common
> +++ b/x86/Makefile.common
> @@ -58,7 +58,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
>              $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
>              $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \
>              $(TEST_DIR)/hyperv_connections.flat \
> -            $(TEST_DIR)/umip.flat
> +            $(TEST_DIR)/umip.flat $(TEST_DIR)/vmentry_latency.flat
>
>  ifdef API
>  tests-api = api/api-sample api/dirty-log api/dirty-log-perf
> diff --git a/x86/vmentry_latency.c b/x86/vmentry_latency.c
> new file mode 100644
> index 0000000..3859f09
> --- /dev/null
> +++ b/x86/vmentry_latency.c
> @@ -0,0 +1,45 @@
> +#include "x86/vm.h"
> +
> +static u64 get_last_hypervisor_tsc_delta(void)
> +{
> +        u64 a = 0, b, c, d;
> +        u64 tsc;
> +
> +        /*
> +         * The first vmcall is there to force a VM exit just before measuring.
> +         */
> +        asm volatile ("vmcall" : "+a"(a), "=b"(b), "=c"(c), "=d"(d));
> +
> +        tsc = rdtsc();
> +
> +        /*
> +         * The second hypercall recovers the TSC value that the host stored
> +         * just before the VM entry that led to the rdtsc() above.
> +         */
> +        a = 11;
> +        asm volatile ("vmcall" : "+a"(a), "=b"(b), "=c"(c), "=d"(d));
> +
> +        return tsc - a;
> +}
> +
> +static void vmentry_latency(void)
> +{
> +        unsigned i = 1000000;
> +        u64 min = -1;
> +
> +        while (i--) {
> +                u64 latency = get_last_hypervisor_tsc_delta();
> +                if (latency < min)
> +                        min = latency;
> +        }
> +
> +        printf("vm entry latency is %"PRIu64" TSC cycles\n", min);
> +}
> +
> +int main(void)
> +{
> +        setup_vm();
> +        vmentry_latency();
> +
> +        return 0;
> +}
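
For a rough sense of scale, the arithmetic behind the figures quoted above can be sketched out as below. This is a standalone illustration, not part of either patch: it assumes the 267-cycle minimum from the unit test can stand in for the "VM-Enter latency" in Sean's 25% heuristic, and that the entry path costs roughly the same number of core cycles when the core turbos to ~5 GHz.

#include <stdio.h>

int main(void)
{
        /* Minimum host-to-guest delta reported by the unit test, in cycles. */
        const double cycles = 267.0;
        /* 2.4 GHz matches Radim's measurement; ~5 GHz is the assumed turbo. */
        const double ghz[] = { 2.4, 5.0 };

        for (unsigned int i = 0; i < sizeof(ghz) / sizeof(ghz[0]); i++) {
                double ns = cycles / ghz[i];    /* cycles / (cycles per ns) */

                printf("%.0f cycles @ %.1f GHz = %5.1f ns, 25%% of that = %4.1f ns\n",
                       cycles, ghz[i], ns, ns / 4);
        }
        return 0;
}

This prints roughly 111 ns (25%: ~28 ns) at 2.4 GHz and 53 ns (25%: ~13 ns) at 5 GHz, which is why a 300 ns default advance looks too aggressive even on today's hardware.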