Re: Wrong Perf Backtraces

ahmadkhorrami <ahmadkhorrami@xxxxxxxx> · Tue, 31 Mar 2020 19:35:55 +0430

And it seems that the bogus backtraces constitute only a small portion
of the whole log. This seems to be good news.

On 2020-03-31 19:32, ahmadkhorrami wrote:

Hi Milian,

Thanks for the detailed answer. Well, the bug you mentioned is bad
news, because I sample using uppp. Perhaps this leads to these weird
traces. Is this a purely software bug?

On 2020-03-31 19:14, Milian Wolff wrote:

On Tuesday, 31 March 2020 15:39:18 CEST ahmadkhorrami wrote:

But the addresses do not match. Do you confirm this as a bug in
libdwarf,...?

So I will ignore addresses without a matching symbol. But they do not
seem reliable!

Could you tell me the name of the library that generates the raw
addresses, so that I can try to debug it?

This is a platform-specific question. There are multiple ways to unwind
a stack. If you are on x86 then by default the .eh_frame section is
available, which holds the information necessary for unwinding. It
doesn't depend on debug symbols; those are only used for symbolization
and inline-frame resolution, as Jiri indicated.
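
One way to check whether a given binary carries unwind information is to look for the .eh_frame section in its section headers. The following is a minimal sketch, assuming binutils' readelf is installed; /bin/ls is only a stand-in path and has_eh_frame is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: BIN is a placeholder, point it at the binary you profile.
BIN=${BIN:-/bin/ls}

# Succeeds if readelf lists a section whose name contains ".eh_frame"
# (this also matches .eh_frame_hdr). Fails if the file cannot be read
# or readelf is not installed.
has_eh_frame() {
    readelf -S -W "$1" 2>/dev/null | grep -q '\.eh_frame'
}

if has_eh_frame "$BIN"; then
    echo "$BIN: .eh_frame present"
else
    echo "$BIN: .eh_frame not found (or readelf unavailable)"
fi
```

The presence of the section says nothing about whether the unwind tables are complete, so this is only a first sanity check.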

That said, in the context of perf, there are multiple scenarios that
can lead to broken unwinding:

a) perf record --call-graph dwarf: unwinding can overflow the stack
copy associated with every sample, so the upper end of the stack will
be broken

b) perf record --call-graph $any: when you are sampling on a precise
event, such as cycles:P which is the default afaik, then on Intel with
PEBS, for example, the stack copy may be "wrong". See e.g.
https://lkml.org/lkml/2018/11/6/257 and the overall thread. This is not
solved yet afaik, and after my initial attempt at working around this
issue I stopped looking into it and instead opted for explicitly
sampling on the non-precise events when I record call graphs... You
could try that too: do you see the issue when you run e.g.:

`perf record --call-graph dwarf -e cycles`

This should take the non-precise version for sampling, but then at
least the call stacks are correct. I.e. you trade the accuracy of the
instruction pointer to which a sample points for reduced call stack
breakage.

c) bugs :)
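
Cases a) and b) can be probed separately with two recordings. The sketch below is illustrative only: ./my_app, the output file names, and the run and record_variants helpers are hypothetical. It assumes the :u modifier as the non-precise counterpart of uppp, and 32768 is an arbitrary size larger than the documented 8192-byte default for the dwarf stack dump. By default the script only prints the commands; set DRY_RUN=0 to actually record.

```shell
#!/bin/sh
# Sketch only. APP and the output file names are hypothetical placeholders.
APP=${APP:-./my_app}
DRY_RUN=${DRY_RUN:-1}   # 1: print the commands, 0: execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

record_variants() {
    # Case b): sample on the non-precise cycles event (user space only)
    # instead of a precise one such as cycles:uppp.
    run perf record --call-graph dwarf -e cycles:u \
        -o perf.nonprecise.data -- "$APP"

    # Case a): same event, but copy more of the stack with every sample
    # so that deep stacks are less likely to be cut off.
    run perf record --call-graph dwarf,32768 -e cycles:u \
        -o perf.bigstack.data -- "$APP"
}

record_variants
```

If the odd frames disappear in the first recording, the precise event is the likely cause; if they only disappear in the second, the stack copy was probably too small. Larger stack copies make the perf data file grow accordingly.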