Łukasz Lew wrote:
> Can you send me the log from my benchmark?
> And your processor model?
64-bit compiled with gcc 4.4 is a little faster than with 4.1.2.
Logs and /proc/cpuinfo emailed just to you. I don't think most on
gcc-help want to see all that.
> If you can do the same for g++ 4.3, that would be very useful for me.
I don't have 4.3 installed here. Maybe I'll get a chance elsewhere.
> Is it possible to get a mixed view?
In Opannotate, specify both --source and --assembly and it gives you mixed.
Mixed is really ugly. Why run OProfile at all if you're not compiling
with optimization? But how can you expect a mixed assembly and source
view to make any sense after optimization?
The sane view, so far as I can tell, isn't available. Maybe I'll figure
out how to add it. It should be an assembly view with an extra column
on each line (probably after the stats and before the address) giving
the source line number.
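To make the idea concrete, here is a rough sketch of the row layout I have in mind. It is not opannotate code. The AsmRow struct, the format_row function and the column widths are all invented for illustration, and where the source line would actually come from is left open.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical row of an annotated assembly listing (not an opannotate type).
struct AsmRow {
    unsigned long samples;   // sample count, 0 if the instruction was never hit
    double percent;          // share of all samples
    std::uint64_t address;   // instruction address
    int source_line;         // source line number, 0 if unknown
    std::string text;        // disassembled instruction
};

// Lay out one row as: stats : source line : address: instruction
std::string format_row(const AsmRow& r) {
    char stats[32];
    if (r.samples > 0)
        std::snprintf(stats, sizeof stats, "%6lu %7.4f", r.samples, r.percent);
    else
        std::snprintf(stats, sizeof stats, "%14s", "");  // blank stats column

    char line_col[16];
    if (r.source_line > 0)
        std::snprintf(line_col, sizeof line_col, "%5d", r.source_line);
    else
        std::snprintf(line_col, sizeof line_col, "%5s", "?");  // line unknown

    char addr[32];
    std::snprintf(addr, sizeof addr, "%llx",
                  static_cast<unsigned long long>(r.address));

    return std::string(stats) + " : " + line_col + " : " + addr + ": " + r.text;
}
```

Calling format_row once per instruction would give an assembly listing in address order, with the source line available on every row but never interleaved with the code.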
Obviously Opannotate calls something that has a vague idea of the source
line for each asm line (or mixed mode wouldn't be possible). Obviously,
Opannotate isn't consistent in the way it uses that data or source view
wouldn't miss almost everything despite this being such a simple program.
For example, a chunk of the mixed mode output looks like this:
: void load (const Board* save_board) {
: memcpy(this, save_board, sizeof(Board));
27 0.0054 : 400d33: mov $0x602980,%edi
2 4.0e-04 : 400d38: mov 0x20(%rsp),%rsi
: 400d3d: mov $0x199,%ecx
16639 3.3551 : 400d42: rep movsq %ds:(%rsi),%es:(%rdi)
328 0.0661 : 400d45: jmpq 400f60 <_ZN24simple_playout_benchmark3runEPK5Boardj+0x300>
: 400d4a: nopw 0x0(%rax,%rax,1)
But in source view, the load method and its memcpy line are shown with
zero execution time. No source line has a value anywhere near as large
as 16639, and the total for all source lines is a tiny fraction of the
correct total.
So Opannotate CAN associate addresses 400d33 through 400d4a with your
source line containing the memcpy. Unlike most of the rest of what I
see in mixed view, that association is even correct. But in source view
it is still discarded.
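The totals are easy to cross-check by adding up the leading sample counts in the mixed listing and comparing against what source view reports. A minimal sketch follows. It assumes each listing line starts with the sample count when there is one, as in the chunk quoted above. The function names samples_on_line and total_samples are my own, not part of any OProfile tool.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Return the leading sample count of one listing line, or 0 when the line
// has no count (source lines, unsampled instructions, wrapped symbol names).
unsigned long samples_on_line(const std::string& line) {
    std::istringstream in(line);
    unsigned long count = 0;
    return (in >> count) ? count : 0;
}

// Sum the sample counts over a chunk of listing lines.
unsigned long total_samples(const std::vector<std::string>& listing) {
    unsigned long total = 0;
    for (const std::string& line : listing)
        total += samples_on_line(line);
    return total;
}
```

For the chunk quoted above this gives 27 + 2 + 16639 + 328 = 16996 samples, all of which belong to the memcpy line and the jump that follows it.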
Maybe source view intentionally discards anything inlined. But what a
stupid thing to do.
I hope to find time to dig into the opannotate source code and figure
some of this out.
> Can you be more specific?
> How do you know which part was inlined where?
For example, in mixed mode I see a couple lines of asm (no execution
time) identified as being your source line
rep (ii, playout_cnt) {
followed directly by the "load" routine I quoted above, and with no
return at the end of load. Since the asm code at load is obviously
correct and your source code calls load right after that rep thing, it
is pretty obvious load was inlined at that point.
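To show the shape of code I am talking about, here is a cut-down sketch. It is not your benchmark. The Board layout, the rep macro definition, the working_board object and the body of run are my guesses. The only parts taken from the listing are the load method, the 0x199 quadword count and the fact that the copy destination is a fixed address.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for the real Board: 0x199 (409) quadwords, the count that the
// listing loads into %ecx before "rep movsq". The real layout is unknown.
struct Board {
    std::uint64_t cells[0x199];

    void load(const Board* save_board) {
        std::memcpy(this, save_board, sizeof(Board));
    }
};

// Assumed definition of the rep macro; the original is not shown.
#define rep(i, n) for (unsigned i = 0; i < (n); ++i)

// Static storage gives the copy destination a fixed address, as with the
// constant moved into %edi in the listing.
static Board working_board;

unsigned run(const Board* start_board, unsigned playout_cnt) {
    unsigned done = 0;
    rep (ii, playout_cnt) {
        working_board.load(start_board);  // candidate for inlining here
        ++working_board.cells[0];         // placeholder for the playout
        ++done;
    }
    return done;
}
```

If the compiler inlines load, the copy appears directly inside run, with no call before it and no ret after it, which is the pattern in the listing. Whether it actually inlines depends on the compiler version and options.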
> But do you observe the 10% difference in performance that I have on my machine?
No.
> PS
> Is there any alternative to OProfile?
> If not, then why is it so undeveloped?
I sure would like to know.
More intrusive methods of profiling really don't fit the situations
where I want profiling. Raw sampling with minimal disruption, which
VTune can do in Windows and OProfile in Linux, is exactly what I need.
Both tools seem able to capture the data I want captured, and both
present it through such a horrible combination of bugs and bad design
as to make the results nearly useless. (Then there is VTune for Linux,
which I've also tried, but I never found any way to get any useful
output from it at all.)
So if you find something better, please tell me.