On Tue, 21 Mar 2023 19:10:04 +0000, Colton Lewis <coltonlewis@xxxxxxxxxx> wrote:
> 
> Marc Zyngier <maz@xxxxxxxxxx> writes:
> 
> >> +#define MEASURE_CYCLES(x)					\
> >> +	({							\
> >> +		uint64_t start;					\
> >> +		start = cycles_read();				\
> >> +		x;						\
> 
> > You insert memory accesses inside a sequence that has no dependency
> > with it. On a weakly ordered memory system, there is absolutely no
> > reason why the memory access shouldn't be moved around. What do you
> > exactly measure in that case?
> 
> cycles_read is built on another function timer_get_cntct which includes
> its own barriers. Stripped of some abstraction, the sequence is:
> 
> timer_get_cntct (isb+read timer)
> whatever is being measured
> timer_get_cntct
> 
> I hadn't looked at it too closely before but on review of the manual
> I think you are correct. Borrowing from example D7-2 in the manual, it
> should be:
> 
> timer_get_cntct
> isb
> whatever is being measured
> dsb
> timer_get_cntct

That's better, but also very heavy-handed. You'd be better off
constructing an address dependency from the timer value, and feeding
that into a load-acquire/store-release pair wrapping your payload.

> 
> >> +		cycles_read() - start;				\
> 
> > I also question the usefulness of this exercise. You're comparing the
> > time it takes for a multi-GHz system to put a write in a store buffer
> > (assuming it didn't miss in the TLBs) vs a counter that gets updated
> > at a frequency of a few tens of MHz.
> 
> > My guts feeling is that this results in a big fat zero most of the
> > time, but I'm happy to be explained otherwise.
> >
> In context, I'm trying to measure the time it takes to write to a buffer
> *with dirty memory logging enabled*. What do you mean by zero? I can
> confirm from running this code I am not measuring zero time.

See my earlier point: the counter ticks at a few MHz, and the CPU runs
at multiple GHz.
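To put rough numbers on that ratio, here is a back-of-the-envelope sketch in C. The 3 GHz CPU clock and 25 MHz counter frequency are assumed, illustrative values only (real systems differ; the counter frequency is reported by CNTFRQ_EL0), and the helper names are hypothetical, not part of the selftests.

```c
#include <stdint.h>

/* Illustrative frequencies only; these are assumptions, not measurements. */
#define ASSUMED_CPU_HZ	3000000000ULL	/* assumed 3 GHz CPU clock */
#define ASSUMED_CNT_HZ	25000000ULL	/* assumed 25 MHz counter */

/*
 * CPU cycles that elapse during one tick of the counter.
 * Assumes the counter is slower than the CPU clock (cnt_hz <= cpu_hz).
 */
static uint64_t cpu_cycles_per_tick(uint64_t cpu_hz, uint64_t cnt_hz)
{
	return cpu_hz / cnt_hz;
}

/*
 * Counter delta for a payload lasting payload_cycles CPU cycles,
 * rounded down. Depending on where the payload starts relative to a
 * tick edge, the delta actually read can be one higher than this.
 */
static uint64_t ticks_observed(uint64_t payload_cycles, uint64_t cpu_hz,
			       uint64_t cnt_hz)
{
	return payload_cycles / cpu_cycles_per_tick(cpu_hz, cnt_hz);
}
```

With those assumed frequencies one counter tick spans 120 CPU cycles, so a store that completes in a handful of cycles reads as zero ticks most of the time (or one, when it happens to straddle a tick edge), while a payload of 3000 cycles shows up as about 25 ticks.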
So unless "whatever" is something that takes a significant time
(several thousand CPU cycles), you'll measure nothing using the
counter. Page faults will probably show, but not a normal access.

The right tool for this job is to use PMU events, as they count at the
CPU frequency.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.