John Fine wrote:
I was curious, so I tried running your benchmark. It was too fast for
meaningful results, so I increased the counts in the calls to
simple_playout_benchmark::run and I noticed some negative and
generally unstable values for "clock cycles per playout".
So your code:
uint64 get_cc_time () volatile {
    uint64 ret;
    __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
    return ret;
}
gives me values that aren't even monotonic.
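
One thing that might be worth ruling out, assuming a 64-bit x86 build with GCC: on x86-64 the "=A" constraint picks a single register (rax or rdx) rather than the edx:eax pair, so half of the counter can be lost. A minimal sketch that reads both halves explicitly is below. The name read_tsc is hypothetical (it is not from the benchmark), and the non-x86 fallback returns steady_clock ticks rather than cycles.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for get_cc_time(); the name is illustrative only.
inline std::uint64_t read_tsc() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // rdtsc returns the low 32 bits in eax and the high 32 bits in edx.
    // Binding each half to its own register works on both 32- and 64-bit x86.
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    // Assumption: on other targets fall back to a monotonic clock.
    // These are clock ticks, not CPU cycles.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

This only addresses the register constraint. It does not make the TSC consistent across cores or across frequency changes.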
According to http://en.wikipedia.org/wiki/RDTSC
"With the advent of multi-core/hyperthreaded CPUs, systems with multiple
CPUs, and "hibernating" operating systems
<http://en.wikipedia.org/wiki/Operating_system>, the TSC cannot be
relied on to provide accurate results. The issue has two components:
rate of tick and whether all cores (processors) have identical values in
their time-keeping registers. There is no promise that the timestamp
counters of multiple CPUs on a single motherboard will be synchronized.
In such cases, programmers can only get reliable results by locking
their code to a single CPU. Even then, the CPU speed may change due to
power-saving measures taken by the OS or BIOS
<http://en.wikipedia.org/wiki/BIOS>, or the system may be hibernated and
later resumed (resetting the time stamp counter). Reliance on the time
stamp counter also reduces portability, as other processors may not have
a similar feature. Recent Intel processors include a constant rate TSC
(identified by the constant_tsc flag in Linux's /proc/cpuinfo). With
these processors the TSC reads at the processor's maximum rate regardless
of the actual CPU running rate. While this makes time keeping more
consistent, it can skew benchmarks, where a certain amount of spin-up
time is spent at a lower clock rate before the OS switches the processor
to the higher rate. This has the effect of making things seem like they
require more processor cycles than they normally would."
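
The "locking their code to a single CPU" advice in the passage above could look roughly like this on Linux. This is only a sketch: pin_to_current_cpu is a hypothetical helper name, it assumes glibc's sched_getcpu and sched_setaffinity are available, and on other platforms it simply reports failure.

```cpp
#ifdef __linux__
#include <sched.h>  // sched_getcpu, sched_setaffinity, CPU_* macros (glibc)
#endif

// Hypothetical helper: pin the calling thread to the CPU it is currently
// running on. Returns that CPU's index, or -1 if pinning was not possible.
inline int pin_to_current_cpu() {
#ifdef __linux__
    int cpu = sched_getcpu();
    if (cpu < 0 || cpu >= CPU_SETSIZE) return -1;

    cpu_set_t set;
    CPU_ZERO(&set);       // start with an empty CPU mask
    CPU_SET(cpu, &set);   // allow only the current CPU

    // A pid of 0 means "the calling thread".
    if (sched_setaffinity(0, sizeof(set), &set) != 0) return -1;
    return cpu;
#else
    return -1;  // assumption: no portable equivalent is attempted here
#endif
}
```

Calling this once before the timed loop should keep all TSC reads on one core. It does nothing about frequency scaling or the constant-rate TSC skew the passage mentions.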