Re: Performance problem

"Łukasz Lew" <lukasz.lew@xxxxxxxxx> · Mon, 22 Sep 2008 01:59:16 +0200

Indeed, I never tested my asm code on 64-bit. I knew it was buggy, but I forgot about it. I will try to fix it now.
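
A minimal sketch of what such a fix might look like, assuming GCC-style inline asm on x86. On x86-64 the "=A" constraint lets the compiler pick either rax or rdx for a 64-bit value rather than the edx:eax pair, while rdtsc always writes the low half to eax and the high half to edx, so half of the counter is lost. Reading the two halves separately should work on both 32-bit and 64-bit builds. The steady_clock fallback for non-x86 targets is my own assumption, only there so the snippet compiles elsewhere:

```cpp
#include <chrono>
#include <cstdint>

// Read a cycle-like timestamp.
// On x86 / x86-64: RDTSC puts the low 32 bits in EAX and the high 32 bits
// in EDX, so both halves are captured explicitly and combined by hand.
inline std::uint64_t get_cc_time() {
#if defined(__i386__) || defined(__x86_64__)
  std::uint32_t lo, hi;
  __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
  // Assumption: no RDTSC available, so fall back to steady_clock ticks.
  // These are not CPU cycles, only a monotonic stand-in.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

This only addresses the register constraint; it does not by itself make readings comparable across cores.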
You may well be right about the register allocation. I tuned my code on 4.2, and small "irrelevant" changes hurt the performance badly; the asm output revealed, among other things, different registers.

Is there any way to control register allocation, just as always_inline controls inlining?

I use Oprofile a lot and tried to pinpoint the difference, but the asm output is too different, while the C++ annotation is too weak because of heavy inlining.

Lukasz

On Mon, Sep 22, 2008 at 01:39, John Fine <johnsfine@xxxxxxxxxxx> wrote:
> I was curious, so I tried running your benchmark.  It was too fast for
> meaningful results, so I increased the counts in the calls to
> simple_playout_benchmark::run and I noticed some negative and generally
> unstable values for "clock cycles per playout".
>
> So your code:
>
>  uint64 get_cc_time () volatile {
>   uint64 ret;
>   __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
>   return ret;
>  }
>
> gives me values that aren't even monotonic.
>
> I'm on a 64-bit dual core AMD system.  My best guess is that the program
> switches cores part way through the loop. But I really don't know enough
> about either rdtsc or __asm__ __volatile__ to know whether there might be
> other reasons.
>
> Are you running on a single core system?  Or otherwise controlling for such
> effects?
>
> In other projects, I've found that Oprofile is very effective in tracking
> down the direct cause of performance differences.  Have you tried that?  In
> much of what I do, the direct cause of a performance difference is just a
> hint at the indirect true cause.  But in an example as simple as you've
> provided, the direct cause is the cause.
>
> Are you building for 32-bit or 64-bit?
>
> In 32-bit, gcc is really bad at dealing with the architecture's shortage of
> registers.  A tiny change anywhere can change gcc's register choices leading
> into the critical loop and either cause or avoid a register spill.  That
> alone could cause a 10% difference.
>
> Łukasz Lew wrote:
>>
>> I extracted only the benchmark part:
>> http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
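
Regarding the guess about switching cores: one way to control for that effect might be to pin the benchmark thread to a single CPU before timing. The following is only a sketch, assuming Linux with glibc (sched_setaffinity and the CPU_* macros are Linux-specific). The helper name pin_to_cpu is hypothetical and not part of the benchmark:

```cpp
#include <sched.h>

// Hypothetical helper: restrict the calling thread to a single CPU so that
// successive timestamp reads come from the same core.
// Returns true on success, false if the CPU index is out of range or the
// kernel rejects the request.
inline bool pin_to_cpu(int cpu) {
  if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  // pid 0 means "the calling thread".
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Calling something like pin_to_cpu(0) once before simple_playout_benchmark::run would then keep the whole run on one core, provided that CPU is available to the process.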