Re: Performance problem

Łukasz Lew wrote:
> I fixed the problem (I think) with rdtsc on 64bit architectures.
> http://www.mimuw.edu.pl/~lew/libego_benchmark.tgz
Seems to work. Why was it previously correct for 32 bit? Did the 32 bit compiler already combine the correct two registers?
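A possible explanation, offered as a sketch rather than a diagnosis of the benchmark: if the earlier code read the counter through GCC's "=A" output constraint (an assumption, the original source is not quoted here), then on 32 bit that constraint names the EDX:EAX pair, so the compiler combines the two registers itself. On x86_64 "=A" means either RAX or RDX, not the pair, so half of the counter is lost. Reading the two halves separately and combining them by hand works on both. The names `combine_tsc` and `read_tsc` are made up for this illustration.

```cpp
#include <cstdint>

// Combine the two 32-bit halves that RDTSC leaves in EDX (high) and EAX (low).
constexpr std::uint64_t combine_tsc(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

static_assert(combine_tsc(0x1u, 0x2u) == 0x0000000200000001ULL,
              "the high half must land in the upper 32 bits");

// Read the time stamp counter with GCC/Clang inline asm.
// Using separate "=a" and "=d" outputs avoids relying on "=A",
// whose meaning differs between i386 and x86_64.
inline std::uint64_t read_tsc() {
#if defined(__i386__) || defined(__x86_64__)
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_tsc(lo, hi);
#else
    return 0;  // placeholder only: RDTSC does not exist outside x86
#endif
}
```

Taking the difference of two `read_tsc()` values around the code under test gives an elapsed tick count. RDTSC is not a serializing instruction, so very short intervals measured this way are only approximate.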
> You may be very right about the register allocation.
> I tuned my code on gcc 4.2, and small "irrelevant" changes changed the
> performance badly; the asm output revealed, among other things,
> different registers.
That doesn't really prove much. Without some very good output from opannotate, I don't know how to tell the real reason for the performance difference.


> I use OProfile a lot and tried to pinpoint the difference, but the asm
> output is too different, while the C++ annotation is too weak because
> of heavy inlining.
I'm trying to understand and/or fix the use of opannotate for some much harder problems, so I was curious enough to try it on your program. I compiled your program for x86_64 with gcc 4.4. Even if I had gotten good results, that wouldn't tell you anything about 32 bit gcc 4.3.

But I got surprisingly bad results. Until now I had only seen results this bad from opannotate on heavily templated code. But I also haven't used opannotate on a program compiled with gcc 4.4 before.

In --source mode nearly all the total time was missing (not associated with any source line). In mixed source and assembly view, I think all the time was shown, but I don't think the assembly code corresponded very accurately with the source code and the time was in some very surprising lumps. I usually can interpret such lumps (usually the instruction after an L2 cache miss or the instruction after a mispredicted branch). But that didn't seem to fit the execution time lumps in your code.

The few points in your source code that had most of the total execution time were inlined multiple times, with different register usage each time. No single inline copy of any such routine had as much as 4% of the total execution time. That tends to wreck the theory that a minor change somewhere has caused a big difference by changing register allocation. The inlined copies of the same function already differ from each other in register allocation, so a small source change would not shift the allocation in the same direction across all of them.

