Re: [kvm-unit-tests PATCH] x86: Add RDTSC test

Liran Alon <liran.alon@xxxxxxxxxx> · Tue, 26 Nov 2019 01:22:09 +0200

> On 26 Nov 2019, at 0:44, Aaron Lewis <aaronlewis@xxxxxxxxxx> wrote:
> 
> Verify that the difference between an L2 RDTSC instruction and the
> IA32_TIME_STAMP_COUNTER MSR value stored in the VMCS12's VM-exit
> MSR-store list is less than 750 cycles, 99.9% of the time.
> 
> Signed-off-by: Aaron Lewis <aaronlewis@xxxxxxxxxx>
> Reviewed-by: Jim Mattson <jmattson@xxxxxxxxxx>
> ---
> x86/unittests.cfg |  6 ++++
> x86/vmx_tests.c   | 89 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 95 insertions(+)
> 
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index b4865ac..5291d96 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -284,6 +284,12 @@ extra_params = -cpu host,+vmx -append vmx_vmcs_shadow_test
> arch = x86_64
> groups = vmx
> 
> +[vmx_rdtsc_vmexit_diff_test]
> +file = vmx.flat
> +extra_params = -cpu host,+vmx -append rdtsc_vmexit_diff_test
> +arch = x86_64
> +groups = vmx
> +
> [debug]
> file = debug.flat
> arch = x86_64
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 1d8932f..f42ae2c 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -8790,7 +8790,94 @@ static void vmx_vmcs_shadow_test(void)
> 	enter_guest();
> }
> 
> +/*
> + * This test monitors the difference between an L2 RDTSC instruction
> + * and the IA32_TIME_STAMP_COUNTER MSR value stored in the VMCS12
> + * VM-exit MSR-store list when taking a VM-exit on the instruction
> + * following RDTSC.
> + */
> +#define RDTSC_DIFF_ITERS 100000
> +#define RDTSC_DIFF_FAILS 100
> +#define L1_RDTSC_LIMIT 750

General note: I personally dislike the use of the terms L1 & L2 in kvm-unit-tests.
I prefer host vs. guest, or VMX root mode vs. non-root mode.

Especially considering that kvm-unit-tests has de facto become cpu-unit-tests, as it can run on top of any CPU implementation:
either a vCPU on top of some hypervisor (KVM being one of them) or a bare-metal CPU (like Nadav Amit runs to verify test correctness :P).

> +
> +/*
> + * Set 'use TSC offsetting' and set the L2 offset to the
> + * inverse of L1's current TSC value, so that L2 starts running
> + * with an effective TSC value of 0.
> + */
> +static void reset_l2_tsc_to_zero(void)
> +{
> +	TEST_ASSERT_MSG(ctrl_cpu_rev[0].clr & CPU_USE_TSC_OFFSET,
> +			"Expected support for 'use TSC offsetting'");
> +
> +	vmcs_set_bits(CPU_EXEC_CTRL0, CPU_USE_TSC_OFFSET);
> +	vmcs_write(TSC_OFFSET, -rdtsc());
> +}
> +
> +static void rdtsc_vmexit_diff_test_guest(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < RDTSC_DIFF_ITERS; i++)
> +		asm volatile("rdtsc; vmcall" : : : "eax", "edx");

I would add a comment here on why you use inline asm instead of just { l2_rdtsc = rdtsc(); vmcall(); }.
(Because of the extra cycles wasted on “ORing” RDX:RAX and saving the result to some global before vmcall.)
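
To illustrate the overhead I mean: a C-level rdtsc() wrapper has to merge EDX:EAX into one 64-bit value (and the caller then stores it) before VMCALL executes, roughly as in the sketch below. combine_tsc() is a hypothetical name used only for illustration, not an existing kvm-unit-tests helper.

```c
#include <stdint.h>

/*
 * Hypothetical helper: merge the two halves that RDTSC leaves in
 * EDX:EAX into a single 64-bit timestamp. A C-level rdtsc() wrapper
 * performs this shift/OR (plus a store of the result) before the
 * VMCALL, which adds cycles to the measured delta.
 */
static inline uint64_t combine_tsc(uint32_t eax, uint32_t edx)
{
	return ((uint64_t)edx << 32) | eax;
}
```

With "rdtsc; vmcall" back to back, this merge happens on the host side after the VM-exit, so it does not count against the limit.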

> +}
> +
> +/*
> + * This function only considers the "use TSC offsetting" VM-execution
> + * control.  It does not handle "use TSC scaling" (because the latter
> + * isn't available to L1 today.)

Because the function's correctness assumes that "use TSC scaling" is not enabled, consider adding a runtime assert() on it?
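
Something along these lines is what I have in mind. This is a standalone sketch, not a drop-in: struct tsc_ctrl stands in for the vmcs_read() calls, host_time_to_guest_time() and tsc_scaling_enabled() are hypothetical names, and the bit positions are my reading of the SDM, so please double-check them against vmx.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as I read them in the SDM; treat them as assumptions. */
#define CPU_USE_TSC_OFFSET	(1u << 3)	/* primary proc-based controls */
#define CPU_SECONDARY		(1u << 31)	/* primary: activate secondary controls */
#define CPU_USE_TSC_SCALING	(1u << 25)	/* secondary proc-based controls */

/* Hypothetical snapshot of the VMCS fields the conversion depends on. */
struct tsc_ctrl {
	uint32_t exec_ctrl0;	/* primary processor-based controls */
	uint32_t exec_ctrl1;	/* secondary processor-based controls */
	uint64_t tsc_offset;
};

/* Scaling only takes effect if the secondary controls are activated. */
static int tsc_scaling_enabled(const struct tsc_ctrl *c)
{
	return (c->exec_ctrl0 & CPU_SECONDARY) &&
	       (c->exec_ctrl1 & CPU_USE_TSC_SCALING);
}

static uint64_t host_time_to_guest_time(const struct tsc_ctrl *c, uint64_t t)
{
	/* The offset-only conversion below is wrong once scaling is on. */
	assert(!tsc_scaling_enabled(c));

	if (c->exec_ctrl0 & CPU_USE_TSC_OFFSET)
		t += c->tsc_offset;	/* wraps modulo 2^64, like the hardware */

	return t;
}
```

That way the test fails loudly, instead of silently computing a wrong delta, if scaling ever becomes available and gets enabled.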

> + */
> +static unsigned long long l1_time_to_l2_time(unsigned long long t)
> +{
> +	if (vmcs_read(CPU_EXEC_CTRL0) & CPU_USE_TSC_OFFSET)
> +		t += vmcs_read(TSC_OFFSET);
> +
> +	return t;
> +}
> +
> +static unsigned long long get_tsc_diff(void)

I think get_tsc_diff() is a bit too generic a name. It may cause confusion.
I would consider renaming it to rdtsc_vmexit_diff_test_iteration(), or just putting the logic inline in the test itself.

> +{
> +	unsigned long long l2_tsc, l1_to_l2_tsc;
> +
> +	enter_guest();
> +	skip_exit_vmcall();
> +	l2_tsc = (u32) regs.rax + (regs.rdx << 32);
> +	l1_to_l2_tsc = l1_time_to_l2_time(exit_msr_store[0].value);
> +
> +	return l1_to_l2_tsc - l2_tsc;
> +}
> +
> +static void rdtsc_vmexit_diff_test(void)
> +{
> +	int fail = 0;
> +	int i;
> +
> +	test_set_guest(rdtsc_vmexit_diff_test_guest);
> +
> +	reset_l2_tsc_to_zero();
> 
> +	/*
> +	 * Set up the VMCS12 VM-exit MSR-store list to store just one
> +	 * MSR: IA32_TIME_STAMP_COUNTER. Note that the value stored is
> +	 * in the L1 time domain (i.e., it is not adjusted according
> +	 * to the TSC multiplier and TSC offset fields in the VMCS12,
> +	 * as an L2 RDTSC would be.)
> +	 */
> +	exit_msr_store = alloc_page();
> +	exit_msr_store[0].index = MSR_IA32_TSC;
> +	vmcs_write(EXI_MSR_ST_CNT, 1);
> +	vmcs_write(EXIT_MSR_ST_ADDR, virt_to_phys(exit_msr_store));
> +
> +	for (i = 0; i < RDTSC_DIFF_ITERS; i++) {
> +		if (get_tsc_diff() < L1_RDTSC_LIMIT)

Isn’t a small diff between the value written to exit_msr_store[0].value and L2’s RDTSC result a good thing?
i.e. we want the MSR value captured by the host to be very close to the guest's RDTSC value on the guest->host VM-exit.
So shouldn’t the condition be (get_tsc_diff() >= L1_RDTSC_LIMIT)?
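
In other words, I would expect the counting to look like the standalone sketch below. count_slow_iterations() is a hypothetical name, and the diffs array stands in for the per-iteration get_tsc_diff() results.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the iterations whose RDTSC-to-VM-exit delta is at or above
 * the limit. Only those iterations should count as failures; a small
 * delta is the desired outcome.
 */
static int count_slow_iterations(const uint64_t *diffs, size_t n,
				 uint64_t limit)
{
	int fail = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (diffs[i] >= limit)
			fail++;
	}

	return fail;
}
```

The test would then pass when the returned count is below RDTSC_DIFF_FAILS, which matches the "99.9% of the time" wording in the commit message.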

> +			fail++;
> +	}
> +
> +	enter_guest();
> +
> +	report("RDTSC to VM-exit delta too high in %d of %d iterations",
> +	       fail < RDTSC_DIFF_FAILS, fail, RDTSC_DIFF_ITERS);
> +}
> 
> static int invalid_msr_init(struct vmcs *vmcs)
> {
> @@ -9056,5 +9143,7 @@ struct vmx_test vmx_tests[] = {
> 	/* Atomic MSR switch tests. */
> 	TEST(atomic_switch_max_msrs_test),
> 	TEST(atomic_switch_overflow_msrs_test),
> +	/* Miscellaneous tests */

You can consider this test a de facto part of “Atomic MSR switch tests.” and remove this comment.

> +	TEST(rdtsc_vmexit_diff_test),
> 	{ NULL, NULL, NULL, NULL, NULL, {0} },
> };
> -- 
> 2.24.0.432.g9d3f5f5b63-goog
>