I fixed the problem (I think) with rdtsc on 64-bit architectures.

http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz

2008/9/22 Łukasz Lew <lukasz.lew@xxxxxxxxx>:
> Indeed, I never tested my asm code on 64-bit. I knew it was buggy, but I
> forgot about it.
> I will try to fix it now.
>
> You may be very right about the register allocation.
> I tuned my code on 4.2, and small "irrelevant" changes changed the
> performance badly,
> and the asm output revealed, among other things, different registers.
>
> Is there any way to control register allocation, just as
> always_inline controls inlining?
>
> I use Oprofile a lot and tried to pinpoint the difference, but the asm
> output is too different,
> while the C++ annotation is too weak because of heavy inlining.
> Lukasz
>
> On Mon, Sep 22, 2008 at 01:39, John Fine <johnsfine@xxxxxxxxxxx> wrote:
>> I was curious, so I tried running your benchmark. It was too fast for
>> meaningful results, so I increased the counts in the calls to
>> simple_playout_benchmark::run and I noticed some negative and generally
>> unstable values for "clock cycles per playout".
>>
>> So your code:
>>
>> uint64 get_cc_time () volatile {
>>   uint64 ret;
>>   __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
>>   return ret;
>> }
>>
>> gives me values that aren't even monotonic.
>>
>> I'm on a 64-bit dual core AMD system. My best guess is that the program
>> switches cores part way through the loop. But I really don't know enough
>> about either rdtsc or __asm__ __volatile__ to know whether there might be
>> other reasons.
>>
>> Are you running on a single core system? Or otherwise controlling for such
>> effects?
>>
>> In other projects, I've found that Oprofile is very effective in tracking
>> down the direct cause of performance differences. Have you tried that? In
>> much of what I do, the direct cause of a performance difference is just a
>> hint at the indirect true cause. But in an example as simple as you've
>> provided, the direct cause is the cause.
>>
>> Are you building for 32-bit or 64-bit?
>>
>> In 32-bit, gcc is really bad at dealing with the architecture's shortage of
>> registers. A tiny change anywhere can change gcc's register choices leading
>> into the critical loop and either cause or avoid a register spill. That
>> alone could cause a 10% difference.
>>
>>
>> Łukasz Lew wrote:
>>>
>>> I extracted only the benchmark part:
>>> http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
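
For reference, here is a minimal sketch of one common way to read the TSC so that it works in both 32-bit and 64-bit builds. It is not necessarily what the updated tarball does. The likely issue with the quoted get_cc_time is the "=A" constraint: on 32-bit x86 it names the EDX:EAX pair, but on x86-64 GCC treats it as "either rax or rdx", so only half of the counter ends up in the result. Asking for the two halves separately avoids that. The names read_tsc and combine_halves are made up for this example, and the non-x86 fallback to std::chrono is an assumption added only so the snippet compiles elsewhere.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: join the two 32-bit halves into one 64-bit value.
inline std::uint64_t combine_halves(std::uint32_t hi, std::uint32_t lo) {
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Hypothetical replacement for get_cc_time(): returns the time-stamp counter.
inline std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
  std::uint32_t lo, hi;
  // rdtsc puts the low 32 bits in EAX and the high 32 bits in EDX,
  // in 32-bit and 64-bit mode alike, so request each register explicitly
  // instead of relying on the "=A" pair constraint.
  __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
  return combine_halves(hi, lo);
#else
  // Assumption for non-x86 targets: fall back to a monotonic clock tick count.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

This only addresses the register constraint. If the thread migrates between cores whose counters are not synchronized, differences between two read_tsc() calls can still come out wrong, so pinning the benchmark to one core is a separate concern.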