Marc Zyngier <maz@xxxxxxxxxx> writes:
+#define MEASURE_CYCLES(x)			\
+	({					\
+		uint64_t start;			\
+		start = cycles_read();		\
+		x;				\
You insert memory accesses inside a sequence that has no dependency on them. On a weakly ordered memory system, there is absolutely no reason why the memory accesses shouldn't be moved around. What exactly do you measure in that case?
cycles_read is built on another function, timer_get_cntct, which includes its own barriers. Stripped of some abstraction, the sequence is:

  timer_get_cntct (isb + read timer)
  whatever is being measured
  timer_get_cntct

I hadn't looked at it too closely before, but on review of the manual I think you are correct. Borrowing from example D7-2 in the manual, it should be:

  timer_get_cntct
  isb
  whatever is being measured
  dsb
  timer_get_cntct
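
To make the corrected ordering concrete, here is a rough sketch of what the macro could look like. It is not the patch itself: read_counter() and touch_buffer() are hypothetical names standing in for timer_get_cntct and the measured workload, and I am assuming the virtual counter (cntvct_el0) is the one being read. The non-arm64 branch exists only so the sketch compiles elsewhere; its "barriers" are plain compiler barriers and prove nothing about ordering.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__aarch64__)
#define isb()	__asm__ volatile("isb" ::: "memory")
#define dsb()	__asm__ volatile("dsb sy" ::: "memory")

/* Hypothetical stand-in for timer_get_cntct: isb, then read the counter. */
static inline uint64_t read_counter(void)
{
	uint64_t val;

	isb();
	__asm__ volatile("mrs %0, cntvct_el0" : "=r"(val) : : "memory");
	return val;
}
#else
/* Fallback so the sketch builds off arm64: compiler barriers only. */
#define isb()	__asm__ volatile("" ::: "memory")
#define dsb()	__asm__ volatile("" ::: "memory")

static inline uint64_t read_counter(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * read counter; isb; measured code; dsb; read counter.
 * The dsb makes the measured memory accesses complete before the
 * second counter read is issued.
 */
#define MEASURE_CYCLES(x)				\
	({						\
		uint64_t start_ = read_counter();	\
		isb();					\
		x;					\
		dsb();					\
		read_counter() - start_;		\
	})

/* Hypothetical workload: write one byte to every location in buf. */
static void touch_buffer(volatile char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = 1;
}
```

Usage would be something like `uint64_t delta = MEASURE_CYCLES(touch_buffer(buf, len));`, with the result in counter ticks rather than CPU cycles.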
+ cycles_read() - start; \
I also question the usefulness of this exercise. You're comparing the time it takes for a multi-GHz system to put a write in a store buffer (assuming it didn't miss in the TLBs) vs a counter that gets updated at a frequency of a few tens of MHz.
My gut feeling is that this results in a big fat zero most of the time, but I'm happy to be explained otherwise.
In context, I'm trying to measure the time it takes to write to a buffer *with dirty memory logging enabled*. What do you mean by zero? I can confirm from running this code I am not measuring zero time.
We already have all the required code to deal with ns conversions using a multiplier and a shift, avoiding floating point like the plague it is. Please reuse the kernel code for this, as you're quite likely to only measure the time it takes for KVM to trap the FP registers and perform a FP/SIMD switch...
Will do.
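
For reference, the multiplier-and-shift conversion mentioned above has roughly this shape. This is a minimal sketch, not the kernel code: struct cyc2ns, cyc2ns_init() and cycles_to_ns() are hypothetical names, and the 25 MHz frequency and shift of 24 in the usage note are assumed values chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ull

/* Hypothetical container for the precomputed conversion factors. */
struct cyc2ns {
	uint32_t mult;
	uint32_t shift;
};

/*
 * Pick mult so that ns = (cycles * mult) >> shift.
 * mult = round((NSEC_PER_SEC << shift) / freq_hz)
 *
 * The caller must choose a shift small enough that mult fits in 32 bits
 * for the given frequency; this sketch does not search for one.
 */
static struct cyc2ns cyc2ns_init(uint64_t freq_hz, uint32_t shift)
{
	struct cyc2ns c = { .mult = 0, .shift = shift };

	c.mult = (uint32_t)(((NSEC_PER_SEC << shift) + freq_hz / 2) / freq_hz);
	return c;
}

/*
 * Integer-only conversion, no floating point involved.
 * cycles * mult can overflow 64 bits for very large cycle counts,
 * so this is only suitable for short intervals.
 */
static uint64_t cycles_to_ns(const struct cyc2ns *c, uint64_t cycles)
{
	return (cycles * c->mult) >> c->shift;
}
```

With a 25 MHz counter and a shift of 24, cyc2ns_init() yields a mult of 40 << 24, so one counter tick converts to 40 ns and 25,000,000 ticks convert to exactly one second.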