On Wed, Sep 7, 2022 at 7:20 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote:
>
> On 07/09/2022 00:46, Alexei Starovoitov wrote:
> > On Tue, Sep 6, 2022 at 6:36 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote:
> >>
> >> Naturally, the display of disassembled instructions comes with a few
> >> minor differences. Here is a sample output with libbfd (already
> >> supported before this patch):
> >>
> >>     # bpftool prog dump jited id 56
> >>     bpf_prog_6deef7357e7b4530:
> >>        0:   nopl    0x0(%rax,%rax,1)
> >>        5:   xchg    %ax,%ax
> >>        7:   push    %rbp
> >>        8:   mov     %rsp,%rbp
> >>        b:   push    %rbx
> >>        c:   push    %r13
> >>        e:   push    %r14
> >>       10:   mov     %rdi,%rbx
> >>       13:   movzwq  0xb0(%rbx),%r13
> >>       1b:   xor     %r14d,%r14d
> >>       1e:   or      $0x2,%r14d
> >>       22:   mov     $0x1,%eax
> >>       27:   cmp     $0x2,%r14
> >>       2b:   jne     0x000000000000002f
> >>       2d:   xor     %eax,%eax
> >>       2f:   pop     %r14
> >>       31:   pop     %r13
> >>       33:   pop     %rbx
> >>       34:   leave
> >>       35:   ret
> >>       36:   int3
> >>
> >> LLVM supports several variants that we could set when initialising the
> >> disassembler, for example with:
> >>
> >>     LLVMSetDisasmOptions(*ctx,
> >>                          LLVMDisassembler_Option_AsmPrinterVariant);
> >>
> >> but the default printer is kept for now. Here is the output with LLVM:
> >>
> >>     # bpftool prog dump jited id 56
> >>     bpf_prog_6deef7357e7b4530:
> >>        0:   nopl    (%rax,%rax)
> >>        5:   nop
> >>        7:   pushq   %rbp
> >>        8:   movq    %rsp, %rbp
> >>        b:   pushq   %rbx
> >>        c:   pushq   %r13
> >>        e:   pushq   %r14
> >>       10:   movq    %rdi, %rbx
> >>       13:   movzwq  176(%rbx), %r13
> >>       1b:   xorl    %r14d, %r14d
> >>       1e:   orl     $2, %r14d
> >>       22:   movl    $1, %eax
> >>       27:   cmpq    $2, %r14
> >>       2b:   jne     2
> >>       2d:   xorl    %eax, %eax
> >>       2f:   popq    %r14
> >
> > If I'm reading the asm correctly the difference is significant.
> > jne 0x2f was an absolute address and jmps were easy
> > to follow.
> > While in llvm disasm it's 'jne 2' ?! What is 2 ?
> > 2 bytes from the next insn of 0x2d ?
>
> Yes, that's it.
> Apparently, this is how the operand is encoded, and
> libbfd does the translation to the absolute address:
>
>     # bpftool prog dump jited id 7868 opcodes
>     [...]
>       2b:   jne     0x000000000000002f
>             75 02
>     [...]
>
> The same difference is observable between objdump and llvm-objdump on an
> x86-64 binary for example, although they usually have labels to refer to
> ("jne -22 <_obstack_memory_used+0x7d0>"), making the navigation
> easier. The only mention I could find of that difference is a report
> from 2013 [0].
>
> [0] https://discourse.llvm.org/t/llvm-objdump-disassembling-jmp/29584/2
>
> > That is super hard to read.
> > Is there a way to tune/configure llvm disasm?
>
> There's a function and some options to tune it, but I tried them and
> none applies to converting the jump operands.
>
>     int LLVMSetDisasmOptions(LLVMDisasmContextRef DC, uint64_t Options);
>
>     /* The option to produce marked up assembly. */
>     #define LLVMDisassembler_Option_UseMarkup 1
>     /* The option to print immediates as hex. */
>     #define LLVMDisassembler_Option_PrintImmHex 2
>     /* The option use the other assembler printer variant */
>     #define LLVMDisassembler_Option_AsmPrinterVariant 4
>     /* The option to set comment on instructions */
>     #define LLVMDisassembler_Option_SetInstrComments 8
>     /* The option to print latency information alongside instructions */
>     #define LLVMDisassembler_Option_PrintLatency 16
>
> I found that LLVMDisassembler_Option_AsmPrinterVariant read better,
> although in my patch I kept the default output, which looked closer to
> the existing output from libbfd.
> Here's what the option produces:
>
>     bpf_prog_6deef7357e7b4530:
>        0:   nop     dword ptr [rax + rax]
>        5:   nop
>        7:   push    rbp
>        8:   mov     rbp, rsp
>        b:   push    rbx
>        c:   push    r13
>        e:   push    r14
>       10:   mov     rbx, rdi
>       13:   movzx   r13, word ptr [rbx + 180]
>       1b:   xor     r14d, r14d
>       1e:   or      r14d, 2
>       22:   mov     eax, 1
>       27:   cmp     r14, 2
>       2b:   jne     2
>       2d:   xor     eax, eax
>       2f:   pop     r14
>       31:   pop     r13
>       33:   pop     rbx
>       34:   leave
>       35:   ret
>
> But the jne operand remains a '2'. I'm not aware of any option to change
> it in LLVM's disassembler :(.

Hmm. llvm-objdump -d test_maps looks fine:

  41bfcb: e8 6f f7 ff ff                callq   0x41b73f <find_extern_btf_id>

There must be something llvm disasm is missing when you feed raw bytes into it.
Please keep investigating. In this form I'm afraid it's no go.