Re: Performance problem

I was curious, so I tried running your benchmark. It ran too fast to give meaningful results, so I increased the counts in the calls to simple_playout_benchmark::run, and then I noticed some negative and generally unstable values for "clock cycles per playout".

So your code:

 uint64 get_cc_time () volatile {
   uint64 ret;
   __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
   return ret;
 }

gives me values that aren't even monotonic.

I'm on a 64-bit dual-core AMD system. My best guess is that the program switches cores partway through the loop, and the two cores' time-stamp counters need not be in sync. But I really don't know enough about either rdtsc or __asm__ __volatile__ to know whether there might be other reasons.
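One other possible reason, offered as something to check rather than a diagnosis: GCC's documentation for the x86 "A" constraint says it means the EDX:EAX pair only for double-word results on 32-bit x86. When compiling for x86-64, a 64-bit value is a single word, so "=A" gets either RAX or RDX, and the code above would keep only half of the counter. If only the low 32 bits survive, the value wraps roughly every 2^32 cycles (a second or two at typical clock rates), which would show up as negative differences in longer runs. A minimal sketch that reads the two halves explicitly; `read_tsc` is a hypothetical name, and it is x86-only:

```cpp
#include <cstdint>

// Hypothetical helper: read the x86 time-stamp counter.
// rdtsc always leaves the low 32 bits in EAX and the high 32 bits in EDX,
// on both 32-bit and 64-bit x86. Binding each half to its own register
// ("=a" and "=d") avoids relying on "=A", which on x86-64 picks a single
// 64-bit register instead of the EDX:EAX pair.
static inline std::uint64_t read_tsc() {
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Used as `std::uint64_t start = read_tsc();`, then the playout loop, then `read_tsc() - start`, this gives the full 64-bit cycle count. It does not, by itself, fix readings taken on two different cores.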

Are you running on a single core system? Or otherwise controlling for such effects?
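One way to control for core switching on Linux is to pin the benchmark to a single CPU before timing, so every rdtsc read comes from the same core's counter. A minimal sketch, assuming Linux with glibc; `pin_to_current_cpu` is a hypothetical helper, while `sched_getcpu`, `sched_setaffinity`, and the `CPU_ZERO`/`CPU_SET` macros are the standard glibc interfaces:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc exposes sched_getcpu and the CPU_* macros under this
#endif
#include <sched.h>
#include <cstdio>

// Hypothetical helper: restrict the calling thread to the CPU it is
// currently running on. Returns that CPU number, or -1 on failure.
int pin_to_current_cpu() {
    int cpu = sched_getcpu();
    if (cpu < 0) {
        std::perror("sched_getcpu");
        return -1;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread"; if it migrated in the meantime,
    // the kernel moves it back onto `cpu` before this call returns.
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        std::perror("sched_setaffinity");
        return -1;
    }
    return cpu;
}
```

Without touching the code, `taskset -c 0 ./benchmark` (from util-linux) pins a whole process to CPU 0; `./benchmark` stands in for whatever binary the tarball builds.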

In other projects, I've found OProfile very effective at tracking down the direct cause of performance differences. Have you tried it? In much of what I do, the direct cause of a performance difference is only a hint at the true, indirect cause. But in an example as simple as the one you've provided, the direct cause is the cause.

Are you building for 32-bit or 64-bit?

In 32-bit mode, gcc copes poorly with the architecture's shortage of registers. A tiny change anywhere can alter gcc's register choices leading into the critical loop and either cause or avoid a register spill. That alone could account for a 10% difference.
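To answer the 32-bit vs. 64-bit question from inside the program, the compiler's predefined target macros can be checked. A small sketch; `build_bits` is a hypothetical name, and `__x86_64__` and `__i386__` are the macros GCC defines for 64-bit and 32-bit x86 targets:

```cpp
// Hypothetical helper: report which x86 mode this file was compiled for.
// Returns 64 for x86-64, 32 for i386, and 0 for any non-x86 target.
int build_bits() {
#if defined(__x86_64__)
    return 64;
#elif defined(__i386__)
    return 32;
#else
    return 0;
#endif
}
```

Building the benchmark once with `g++ -m32` and once with `g++ -m64` (given a multilib toolchain) and comparing the timings would show whether the register-starved 32-bit build is the one that swings.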


Łukasz Lew wrote:
I extracted only the benchmark part:
http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz


