Re: Performance problem

Christophe Meessen <meessen@xxxxxxxxxxxxx> · Mon, 22 Sep 2008 07:57:01 +0200

John Fine wrote:
I was curious, so I tried running your benchmark.  It was too fast for 
meaningful results, so I increased the counts in the calls to 
simple_playout_benchmark::run and I noticed some negative and 
generally unstable values for "clock cycles per playout".

So your code:

 uint64 get_cc_time () volatile {
   uint64 ret;
   __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
   return ret;
 }

gives me values that aren't even monotonic.
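
One possible contributor to the negative values, assuming the benchmark
was built with GCC for x86-64: there the "=A" constraint does not denote
the edx:eax register pair as it does on 32-bit x86, so the read above can
return only part of the counter. A minimal sketch that reads both halves
explicitly follows. The names combine_halves and read_tsc are made up for
illustration, and the fallback to std::chrono::steady_clock on non-x86
targets is an assumption added only so the sketch compiles elsewhere.

```cpp
#include <chrono>
#include <cstdint>

// Join the two 32-bit halves that rdtsc leaves in edx (high) and eax (low).
constexpr std::uint64_t combine_halves(std::uint32_t hi, std::uint32_t lo) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Hypothetical stand-in for get_cc_time(); not the original author's code.
inline std::uint64_t read_tsc() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    std::uint32_t lo, hi;
    // "=a" and "=d" name eax and edx explicitly, on both 32- and 64-bit x86.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_halves(hi, lo);
#else
    // Assumed fallback: a monotonic clock tick count, not a cycle count.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

This only changes how the counter is read. It does not help with
counters that differ between cores or with a changing clock rate.
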
According to http://en.wikipedia.org/wiki/RDTSC

"With the advent of multi-core/hyperthreaded CPUs, systems with multiple 
CPUs, and "hibernating" operating systems, the TSC cannot be 
relied on to provide accurate results. The issue has two components: 
rate of tick and whether all cores (processors) have identical values in 
their time-keeping registers. There is no promise that the timestamp 
counters of multiple CPUs on a single motherboard will be synchronized. 
In such cases, programmers can only get reliable results by locking 
their code to a single CPU. Even then, the CPU speed may change due to 
power-saving measures taken by the OS or BIOS, or the system may be 
hibernated and later resumed (resetting the time stamp counter). Reliance 
on the time stamp counter also reduces portability, as other processors 
may not have 
a similar feature. Recent Intel processors include a constant rate TSC 
(identified by the constant_tsc flag in Linux's /proc/cpuinfo). With 
these processors the TSC reads at the processor's maximum rate regardless 
of the actual CPU running rate. While this makes time keeping more 
consistent, it can skew benchmarks, where a certain amount of spin-up 
time is spent at a lower clock rate before the OS switches the processor 
to the higher rate. This has the effect of making things seem like they 
require more processor cycles than they normally would."
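
The advice in the quoted passage about locking the code to a single CPU
could be tried along these lines before running the benchmark. This is a
rough sketch that assumes Linux with glibc; pin_to_one_cpu is a
hypothetical helper name, and choosing the first CPU the thread is
already allowed to run on is an arbitrary choice made for the example.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity (Linux/glibc)

// Restrict the calling thread to the first CPU it is currently allowed
// to use. Returns that CPU's index on success, -1 on failure.
inline int pin_to_one_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // A pid of 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;  // no permitted CPU found
}
```

Pinning only removes differences between cores. As the passage notes,
frequency scaling can still distort cycle counts.
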