Oliver Upton <oliver.upton@xxxxxxxxx> writes:
On Wed, May 31, 2023 at 02:01:52PM -0700, Sean Christopherson wrote:
On Mon, Mar 27, 2023, Colton Lewis wrote:
> diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
> index f65e491763e0..d441f485e9c6 100644
> --- a/tools/testing/selftests/kvm/include/aarch64/processor.h
> +++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
> @@ -219,4 +219,14 @@ uint32_t guest_get_vcpuid(void);
> uint64_t cycles_read(void);
> uint64_t cycles_to_ns(struct kvm_vcpu *vcpu, uint64_t cycles);
>
> +#define MEASURE_CYCLES(x) \
> + ({ \
> + uint64_t start; \
> + start = cycles_read(); \
> + isb(); \
Would it make sense to put the necessary barriers inside the cycles_read()
(or whatever we end up calling it)? Or does that not make sense on ARM?
+1. Additionally, the function should have a name that implies ordering,
like read_system_counter_ordered() or similar.
cycles_read() is currently a wrapper for timer_get_cntct(), which has
an isb() at the beginning but not the end. I think it would make more
sense to add the barrier there if there is no objection.
> + x; \
> + dsb(nsh); \
I assume you're doing this because you want to wait for outstanding
loads and stores to complete due to 'x', right?
Correct.
My knee-jerk reaction was that you could just do an mb() and share the
implementation between arches, but it would seem the tools/ flavor of
the barrier demotes to a DMB because... reasons.
Yep, and from what I read in the ARM manual, it has to be a DSB.
> + cycles_read() - start; \
> + })
> +
> #endif /* SELFTEST_KVM_PROCESSOR_H */
...
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 5d977f95d5f5..7352e02db4ee 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -1137,4 +1137,14 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> uint64_t cycles_read(void);
> uint64_t cycles_to_ns(struct kvm_vcpu *vcpu, uint64_t cycles);
>
> +#define MEASURE_CYCLES(x) \
> + ({ \
> + uint64_t start; \
> + start = cycles_read(); \
> + asm volatile("mfence"); \
This is incorrect, as placing the barrier after the RDTSC allows the RDTSC
to be executed before earlier loads, e.g. could measure memory accesses
from whatever was before MEASURE_CYCLES(). And per the kernel's
rdtsc_ordered(), it sounds like RDTSC can only be hoisted before prior
loads, i.e. will be ordered with respect to future loads and stores.
Interesting, so I will swap the fence and the cycles_read().
Same thing goes for the arm64 variant of the function... You want to
insert an isb() immediately _before_ you read the counter register to
avoid speculation.
That's taken care of. See my earlier comment about timer_get_cntct().