Some updates on these output issues.

> Song, I also noticed that source code is not being intermixed for the
> --stdio annotation, while it works, to some degree, for '--tui', i.e.

I am not seeing a problem with the --stdio2 output, like:

[root@kerneltest005.01.frc2 ~/bpf]# ~/perf annotate --stdio2 bpf_prog_9a7fd54e22aaf8eb_bpf_prog1 | head -n 30
Samples: 25 of event 'cycles', 4000 Hz, Event count (approx.): 14690094, [percent: local period]
bpf_prog_9a7fd54e22aaf8eb_bpf_prog1() bpf_prog_9a7fd54e22aaf8eb_bpf_prog1
Percent      int bpf_prog1(void *ctx)
 16.04         push   %rbp
               mov    %rsp,%rbp
               sub    $0x30,%rsp
               sub    $0x28,%rbp
  7.98         mov    %rbx,0x0(%rbp)
  4.03         mov    %r13,0x8(%rbp)
  4.03         mov    %r14,0x10(%rbp)
               mov    %r15,0x18(%rbp)
               xor    %eax,%eax
               mov    %rax,0x20(%rbp)
  3.99         mov    %rdi,%rbx
               xor    %edi,%edi
             __u32 key = 0;
               mov    %edi,-0x4(%rbp)
  4.01         mov    %rbp,%rsi
             int bpf_prog1(void *ctx)
               add    $0xfffffffffffffffc,%rsi
             data = bpf_map_lookup_elem(&stackdata_map, &key);
               movabs $0xffff889fc4491600,%rdi
             → callq  *ffffffffe0f4faf1
               mov    %rax,%r13
             if (!data)
               cmp    $0x0,%r13
             → je     0
             data->pid = bpf_get_current_pid_tgid();

Maybe Jiri's recent patches fixed it already?

> when you do 'perf top', press '/bpf' to show just symbols with that
> substring and then press enter or 'A' to annotate, we can see the
> original C source code for the BPF program, but it is mangling the
> screen sometimes, I need to try and fix, please take a look if you have
> the time.

Still need to look into the mangling issue.

> Also things like the callq targets need some work to tell what function
> is that, which as I said isn't appearing on the --stdio2 output, but
> appears on the --tui, i.e. we need to resolve that symbol to check how
> to map back to a BPF helper or any other callq target.

Still need to look into resolving symbols.

> Also, what about those 'je 0', i.e. the target is being misinterpreted
> or is this some BPF construct I should've known about?
> :)
>
>   2.68    0.00    0.00    0.00         mov    %rdi,%rbx
>                                      → callq  *ffffffffd359487f
>                                        mov    %eax,-0x148(%rbp)
>   9.61    0.00    0.00    0.00         mov    %rbp,%rsi
>                                        add    $0xfffffffffffffeb8,%rsi
>                                        movabs $0xffff9d556c776c00,%rdi
>
>                                      → callq  *ffffffffd3595b2f
>                                        cmp    $0x0,%rax
>                                      → je     0
>   0.00    1.25    0.00    0.00         add    $0x38,%rax
>   0.80    0.21    0.00    0.00         xor    %r13d,%r13d
>                                        cmp    $0x0,%rax
>                                      → jne    0
>                                        mov    %rbp,%rdi
>                                        add    $0xfffffffffffffeb8,%rdi
>

The 'je 0' issue is tricky. The magic happens in __annotation_line__write().
Because symbol__disassemble_bpf() takes some shortcuts, it doesn't provide
data identical to objdump. In this case, symbol__disassemble_bpf() generates
something like

    je 0x000000000000017a

which is the same as what we see from bpftool dump. Disassembly of kernel
functions looks like

    jmp ffffffff8110a3ef <queued_spin_lock_slowpath+0x12f>

__annotation_line__write() writes the first one as

    je 0

while it writes the second one as

    ↑ jmp 12f

Therefore, the problem is not from the disassembler() call, but from the
post-processing of its output. I still need time to figure out the best way
to fix this. Any suggestions are highly appreciated.

Thanks,
Song
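
For illustration only, here is a minimal sketch of how the two operand
forms above could end up displayed differently. It is NOT perf's actual
code: suffix_offset() is a hypothetical helper, and it assumes the
displayed jump offset is taken from the objdump-style "<symbol+0xOFF>"
suffix. Under that assumption, an operand without the suffix (the BPF
form) yields 0, while the kernel form yields 12f, which matches the
symptom described above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from perf: return the offset encoded in
 * an objdump-style "<symbol+0xOFF>" suffix of a jump operand, or 0 when
 * the operand carries no such suffix.
 */
static unsigned long long suffix_offset(const char *operand)
{
	const char *lt = strchr(operand, '<');
	const char *plus = lt ? strchr(lt, '+') : NULL;

	if (!plus)
		return 0; /* no "<sym+off>" suffix, so there is nothing to parse */

	/* Base 16 accepts the optional "0x" prefix; parsing stops at '>'. */
	return strtoull(plus + 1, NULL, 16);
}
```

With this sketch, suffix_offset("0x000000000000017a") returns 0 and
suffix_offset("ffffffff8110a3ef <queued_spin_lock_slowpath+0x12f>")
returns 0x12f. Whether the real post-processing fails for this reason
still needs to be confirmed against __annotation_line__write() and the
jump operand parsing it relies on.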