On 07/09/2022 00:46, Alexei Starovoitov wrote:
> On Tue, Sep 6, 2022 at 6:36 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote:
>>
>> Naturally, the display of disassembled instructions comes with a few
>> minor differences. Here is a sample output with libbfd (already
>> supported before this patch):
>>
>>     # bpftool prog dump jited id 56
>>     bpf_prog_6deef7357e7b4530:
>>        0:   nopl   0x0(%rax,%rax,1)
>>        5:   xchg   %ax,%ax
>>        7:   push   %rbp
>>        8:   mov    %rsp,%rbp
>>        b:   push   %rbx
>>        c:   push   %r13
>>        e:   push   %r14
>>       10:   mov    %rdi,%rbx
>>       13:   movzwq 0xb0(%rbx),%r13
>>       1b:   xor    %r14d,%r14d
>>       1e:   or     $0x2,%r14d
>>       22:   mov    $0x1,%eax
>>       27:   cmp    $0x2,%r14
>>       2b:   jne    0x000000000000002f
>>       2d:   xor    %eax,%eax
>>       2f:   pop    %r14
>>       31:   pop    %r13
>>       33:   pop    %rbx
>>       34:   leave
>>       35:   ret
>>       36:   int3
>>
>> LLVM supports several variants that we could set when initialising the
>> disassembler, for example with:
>>
>>     LLVMSetDisasmOptions(*ctx,
>>                          LLVMDisassembler_Option_AsmPrinterVariant);
>>
>> but the default printer is kept for now. Here is the output with LLVM:
>>
>>     # bpftool prog dump jited id 56
>>     bpf_prog_6deef7357e7b4530:
>>        0:   nopl    (%rax,%rax)
>>        5:   nop
>>        7:   pushq   %rbp
>>        8:   movq    %rsp, %rbp
>>        b:   pushq   %rbx
>>        c:   pushq   %r13
>>        e:   pushq   %r14
>>       10:   movq    %rdi, %rbx
>>       13:   movzwq  176(%rbx), %r13
>>       1b:   xorl    %r14d, %r14d
>>       1e:   orl     $2, %r14d
>>       22:   movl    $1, %eax
>>       27:   cmpq    $2, %r14
>>       2b:   jne     2
>>       2d:   xorl    %eax, %eax
>>       2f:   popq    %r14
>
> If I'm reading the asm correctly the difference is significant.
> jne 0x2f was an absolute address and jmps were easy
> to follow.
> While in llvm disasm it's 'jne 2' ?! What is 2 ?
> 2 bytes from the next insn of 0x2d ?

Yes, that's it. Apparently, this is how the operand is encoded, and
libbfd does the translation to the absolute address:

    # bpftool prog dump jited id 7868 opcodes
    [...]
      2b:   jne    0x000000000000002f
            75 02
    [...]
The same difference is observable between objdump and llvm-objdump on an
x86-64 binary for example, although they usually have labels to refer to
("jne -22 <_obstack_memory_used+0x7d0>"), making the navigation easier.
The only mention I could find of that difference is a report from 2013 [0].

[0] https://discourse.llvm.org/t/llvm-objdump-disassembling-jmp/29584/2

> That is super hard to read.
> Is there a way to tune/configure llvm disasm?

There's a function and some options to tune it, but I tried them and
none of them converts the jump operands.

    int LLVMSetDisasmOptions(LLVMDisasmContextRef DC, uint64_t Options);

    /* The option to produce marked up assembly. */
    #define LLVMDisassembler_Option_UseMarkup 1
    /* The option to print immediates as hex. */
    #define LLVMDisassembler_Option_PrintImmHex 2
    /* The option use the other assembler printer variant */
    #define LLVMDisassembler_Option_AsmPrinterVariant 4
    /* The option to set comment on instructions */
    #define LLVMDisassembler_Option_SetInstrComments 8
    /* The option to print latency information alongside instructions */
    #define LLVMDisassembler_Option_PrintLatency 16

I found that LLVMDisassembler_Option_AsmPrinterVariant reads better,
although in my patch I kept the default output, which looked closer to
the existing output from libbfd. Here's what the option produces:

    bpf_prog_6deef7357e7b4530:
       0:   nop     dword ptr [rax + rax]
       5:   nop
       7:   push    rbp
       8:   mov     rbp, rsp
       b:   push    rbx
       c:   push    r13
       e:   push    r14
      10:   mov     rbx, rdi
      13:   movzx   r13, word ptr [rbx + 176]
      1b:   xor     r14d, r14d
      1e:   or      r14d, 2
      22:   mov     eax, 1
      27:   cmp     r14, 2
      2b:   jne     2
      2d:   xor     eax, eax
      2f:   pop     r14
      31:   pop     r13
      33:   pop     rbx
      34:   leave
      35:   ret

But the jne operand remains a '2'. I'm not aware of any option to change
it in LLVM's disassembler :(.
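
For reference, the translation that libbfd appears to do can be sketched
by hand: the operand is a signed displacement counted from the end of the
branch instruction, so the absolute offset is the instruction's own
offset, plus its length, plus the displacement. The snippet below is only
an illustration of that arithmetic. The helper names branch_target() and
short_jcc_target() are hypothetical (they exist neither in bpftool nor in
LLVM), and the decoder only handles the two-byte short conditional jumps
(opcodes 0x70 to 0x7f followed by a rel8), such as the "75 02" above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of bpftool or LLVM: resolve a PC-relative
 * branch operand to an absolute offset inside the JITed image.
 *   pc       - offset of the branch instruction itself
 *   insn_len - length of the branch instruction, in bytes
 *   rel      - signed displacement, as printed by the LLVM disassembler
 */
static uint64_t branch_target(uint64_t pc, size_t insn_len, int64_t rel)
{
	/* The displacement is relative to the *next* instruction. */
	return pc + insn_len + (uint64_t)rel;
}

/*
 * Hypothetical helper: decode a two-byte x86-64 short conditional jump
 * (opcode 0x70..0x7f, then a signed 8-bit displacement).
 * Returns 0 and fills *target on success, -1 for any other opcode.
 */
static int short_jcc_target(const uint8_t *code, uint64_t pc, uint64_t *target)
{
	if ((code[0] & 0xf0) != 0x70)
		return -1;

	/* The cast to int8_t sign-extends, so backward jumps work too. */
	*target = branch_target(pc, 2, (int8_t)code[1]);
	return 0;
}
```

With the bytes "75 02" at offset 0x2b, this gives 0x2b + 2 + 2 = 0x2f,
which matches the address printed by libbfd. A caller that already knows
the instruction length and the printed displacement would only need
branch_target().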