On 07/09/2022 00:46, Alexei Starovoitov wrote:
On Tue, Sep 6, 2022 at 6:36 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote:
Naturally, the display of disassembled instructions comes with a few
minor differences. Here is a sample output with libbfd (already
supported before this patch):
# bpftool prog dump jited id 56
bpf_prog_6deef7357e7b4530:
0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax
7: push %rbp
8: mov %rsp,%rbp
b: push %rbx
c: push %r13
e: push %r14
10: mov %rdi,%rbx
13: movzwq 0xb0(%rbx),%r13
1b: xor %r14d,%r14d
1e: or $0x2,%r14d
22: mov $0x1,%eax
27: cmp $0x2,%r14
2b: jne 0x000000000000002f
2d: xor %eax,%eax
2f: pop %r14
31: pop %r13
33: pop %rbx
34: leave
35: ret
36: int3
LLVM supports several variants that we could set when initialising the
disassembler, for example with:
LLVMSetDisasmOptions(*ctx,
LLVMDisassembler_Option_AsmPrinterVariant);
but the default printer is kept for now. Here is the output with LLVM:
# bpftool prog dump jited id 56
bpf_prog_6deef7357e7b4530:
0: nopl (%rax,%rax)
5: nop
7: pushq %rbp
8: movq %rsp, %rbp
b: pushq %rbx
c: pushq %r13
e: pushq %r14
10: movq %rdi, %rbx
13: movzwq 176(%rbx), %r13
1b: xorl %r14d, %r14d
1e: orl $2, %r14d
22: movl $1, %eax
27: cmpq $2, %r14
2b: jne 2
2d: xorl %eax, %eax
2f: popq %r14
If I'm reading the asm correctly the difference is significant.
jne 0x2f was an absolute address and jmps were easy
to follow.
While in llvm disasm it's 'jne 2' ?! What is 2 ?
2 bytes from the next insn of 0x2d ?
Yes, that's it. Apparently, this is how the operand is encoded, and
libbfd does the translation to the absolute address:
# bpftool prog dump jited id 7868 opcodes
[...]
2b: jne 0x000000000000002f
75 02
[...]
The same difference is observable between objdump and llvm-objdump on an
x86-64 binary for example, although they usually have labels to refer to
("jne -22 <_obstack_memory_used+0x7d0>"), making the navigation
easier. The only mention I could find of that difference is a report
from 2013 [0].
[0] https://discourse.llvm.org/t/llvm-objdump-disassembling-jmp/29584/2
That is super hard to read.
Is there a way to tune/configure llvm disasm?
There's a function and some options to tune it, but I tried them and
none applies to converting the jump operands.
int LLVMSetDisasmOptions(LLVMDisasmContextRef DC, uint64_t Options);
/* The option to produce marked up assembly. */
#define LLVMDisassembler_Option_UseMarkup 1
/* The option to print immediates as hex. */
#define LLVMDisassembler_Option_PrintImmHex 2
/* The option use the other assembler printer variant */
#define LLVMDisassembler_Option_AsmPrinterVariant 4
/* The option to set comment on instructions */
#define LLVMDisassembler_Option_SetInstrComments 8
/* The option to print latency information alongside instructions */
#define LLVMDisassembler_Option_PrintLatency 16
I found that LLVMDisassembler_Option_AsmPrinterVariant reads better,
although in my patch I kept the default output, which looks closer to
the existing output from libbfd. Here's what the option produces:
bpf_prog_6deef7357e7b4530:
0: nop dword ptr [rax + rax]
5: nop
7: push rbp
8: mov rbp, rsp
b: push rbx
c: push r13
e: push r14
10: mov rbx, rdi
13: movzx r13, word ptr [rbx + 176]
1b: xor r14d, r14d
1e: or r14d, 2
22: mov eax, 1
27: cmp r14, 2
2b: jne 2
2d: xor eax, eax
2f: pop r14
31: pop r13
33: pop rbx
34: leave
35: ret
But the jne operand remains a '2'. I'm not aware of any option to change
it in LLVM's disassembler :(.