On 07/09/2022 17:10, Alexei Starovoitov wrote:
> On Wed, Sep 7, 2022 at 7:20 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote:
>>
>> On 07/09/2022 00:46, Alexei Starovoitov wrote:
>>> On Tue, Sep 6, 2022 at 6:36 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote:
>>>>
>>>> Naturally, the display of disassembled instructions comes with a few
>>>> minor differences. Here is a sample output with libbfd (already
>>>> supported before this patch):
>>>>
>>>>     # bpftool prog dump jited id 56
>>>>     bpf_prog_6deef7357e7b4530:
>>>>        0:   nopl   0x0(%rax,%rax,1)
>>>>        5:   xchg   %ax,%ax
>>>>        7:   push   %rbp
>>>>        8:   mov    %rsp,%rbp
>>>>        b:   push   %rbx
>>>>        c:   push   %r13
>>>>        e:   push   %r14
>>>>       10:   mov    %rdi,%rbx
>>>>       13:   movzwq 0xb0(%rbx),%r13
>>>>       1b:   xor    %r14d,%r14d
>>>>       1e:   or     $0x2,%r14d
>>>>       22:   mov    $0x1,%eax
>>>>       27:   cmp    $0x2,%r14
>>>>       2b:   jne    0x000000000000002f
>>>>       2d:   xor    %eax,%eax
>>>>       2f:   pop    %r14
>>>>       31:   pop    %r13
>>>>       33:   pop    %rbx
>>>>       34:   leave
>>>>       35:   ret
>>>>       36:   int3
>>>>
>>>> LLVM supports several variants that we could set when initialising the
>>>> disassembler, for example with:
>>>>
>>>>     LLVMSetDisasmOptions(*ctx,
>>>>                          LLVMDisassembler_Option_AsmPrinterVariant);
>>>>
>>>> but the default printer is kept for now. Here is the output with LLVM:
>>>>
>>>>     # bpftool prog dump jited id 56
>>>>     bpf_prog_6deef7357e7b4530:
>>>>        0:   nopl    (%rax,%rax)
>>>>        5:   nop
>>>>        7:   pushq   %rbp
>>>>        8:   movq    %rsp, %rbp
>>>>        b:   pushq   %rbx
>>>>        c:   pushq   %r13
>>>>        e:   pushq   %r14
>>>>       10:   movq    %rdi, %rbx
>>>>       13:   movzwq  176(%rbx), %r13
>>>>       1b:   xorl    %r14d, %r14d
>>>>       1e:   orl     $2, %r14d
>>>>       22:   movl    $1, %eax
>>>>       27:   cmpq    $2, %r14
>>>>       2b:   jne     2
>>>>       2d:   xorl    %eax, %eax
>>>>       2f:   popq    %r14
>>>
>>> If I'm reading the asm correctly the difference is significant.
>>> jne 0x2f was an absolute address and jmps were easy
>>> to follow.
>>> While in llvm disasm it's 'jne 2' ?! What is 2 ?
>>> 2 bytes from the next insn of 0x2d ?
>>
>> Yes, that's it.
>> Apparently, this is how the operand is encoded, and
>> libbfd does the translation to the absolute address:
>>
>>     # bpftool prog dump jited id 7868 opcodes
>>     [...]
>>       2b:   jne    0x000000000000002f
>>             75 02
>>     [...]
>>
>> The same difference is observable between objdump and llvm-objdump on an
>> x86-64 binary for example, although they usually have labels to refer to
>> ("jne -22 <_obstack_memory_used+0x7d0>"), making the navigation
>> easier. The only mention I could find of that difference is a report
>> from 2013 [0].
>>
>> [0] https://discourse.llvm.org/t/llvm-objdump-disassembling-jmp/29584/2
>>
>>> That is super hard to read.
>>> Is there a way to tune/configure llvm disasm?
>>
>> There's a function and some options to tune it, but I tried them and
>> none applies to converting the jump operands.
>>
>>     int LLVMSetDisasmOptions(LLVMDisasmContextRef DC, uint64_t Options);
>>
>>     /* The option to produce marked up assembly. */
>>     #define LLVMDisassembler_Option_UseMarkup 1
>>     /* The option to print immediates as hex. */
>>     #define LLVMDisassembler_Option_PrintImmHex 2
>>     /* The option use the other assembler printer variant */
>>     #define LLVMDisassembler_Option_AsmPrinterVariant 4
>>     /* The option to set comment on instructions */
>>     #define LLVMDisassembler_Option_SetInstrComments 8
>>     /* The option to print latency information alongside instructions */
>>     #define LLVMDisassembler_Option_PrintLatency 16
>>
>> I found that LLVMDisassembler_Option_AsmPrinterVariant read better,
>> although in my patch I kept the default output which looked closer to
>> the existing output from libbfd.
>> Here's what the option produces:
>>
>>     bpf_prog_6deef7357e7b4530:
>>        0:   nop     dword ptr [rax + rax]
>>        5:   nop
>>        7:   push    rbp
>>        8:   mov     rbp, rsp
>>        b:   push    rbx
>>        c:   push    r13
>>        e:   push    r14
>>       10:   mov     rbx, rdi
>>       13:   movzx   r13, word ptr [rbx + 176]
>>       1b:   xor     r14d, r14d
>>       1e:   or      r14d, 2
>>       22:   mov     eax, 1
>>       27:   cmp     r14, 2
>>       2b:   jne     2
>>       2d:   xor     eax, eax
>>       2f:   pop     r14
>>       31:   pop     r13
>>       33:   pop     rbx
>>       34:   leave
>>       35:   ret
>>
>> But the jne operand remains a '2'. I'm not aware of any option to change
>> it in LLVM's disassembler :(.
>
> Hmm. llvm-objdump -d test_maps
> looks fine:
>   41bfcb: e8 6f f7 ff ff   callq   0x41b73f <find_extern_btf_id>
>
> there must be something llvm disasm is missing when you feed raw bytes
> into it.
> Please keep investigating. In this form I'm afraid it's no go.

OK, I'll keep looking.
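
For reference, the translation from the encoded operand to the absolute
address is plain arithmetic: the displacement is relative to the address of
the next instruction. Below is a minimal sketch in C of that computation.
Note that jump_target() is a hypothetical helper written for illustration,
not a function from bpftool, libbfd or LLVM, and it assumes the caller
already knows the instruction length (for example from the disassembler's
return value).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of bpftool, libbfd or LLVM): resolve the
 * target of an x86-64 relative jump or call from its raw displacement.
 *
 * insn_addr: offset of the instruction in the image
 * insn_len:  total length of the instruction, in bytes
 * rel:       signed displacement, as encoded in the instruction
 */
static uint64_t jump_target(uint64_t insn_addr, unsigned int insn_len,
			    int64_t rel)
{
	/* x86 instructions are between 1 and 15 bytes long. */
	assert(insn_len >= 1 && insn_len <= 15);

	/* The displacement is counted from the start of the next instruction. */
	return insn_addr + insn_len + (uint64_t)rel;
}
```

With the bytes quoted in this thread, jump_target(0x2b, 2, 2) gives 0x2f for
the "75 02" jne, and the callq at 0x41bfcb (e8 6f f7 ff ff, displacement
-0x891) resolves to 0x41b73f, which matches what llvm-objdump prints.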