On Mon, Sep 14, 2020 at 12:55 AM Luka Oreskovic <luka.oreskovic@xxxxxxxxxx> wrote: > > On Fri, Sep 11, 2020 at 8:14 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Fri, Sep 11, 2020 at 9:45 AM Luka Oreskovic > > <luka.oreskovic@xxxxxxxxxx> wrote: > > > > > > Greetings everyone, > > > > > > I have been testing various BPF programs on the ARM32 architecture and > > > have encountered a strange error. > > > > > > When trying to run a simple program that prints out the arguments of > > > the open syscall, > > > I found some strange behaviour with the pointer offsets when accessing > > > the arguments: > > > The output of llvm-objdump differed from the verifier error dump log. > > > Notice the differences in lines 0 and 1. Why is the bytecode being > > > altered at runtime? > > > > > > I attached the program, the llvm-objdump result and the verifier dump below. > > > > > > Best wishes, > > > Luka Oreskovic > > > > > > BPF program > > > -------------------------------------------- > > > #include "vmlinux.h" > > > #include <bpf/bpf_helpers.h> > > > > > > SEC("tracepoint/syscalls/sys_enter_open") > > > int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx) > > > { > > > const char *arg1 = (const char *)ctx->args[0]; > > > int arg2 = (int)ctx->args[1]; > > > > > > bpf_printk("Open arg 1: %s\n", arg1); > > > bpf_printk("Open arg 2: %d\n", arg2); > > > > > > return 0; > > > } > > > > > > char LICENSE[] SEC("license") = "GPL"; > > > > > > > > > llvm-objdump of program > > > -------------------------------------------- > > > Disassembly of section tracepoint/syscalls/sys_enter_open: > > > > > > 0000000000000000 tracepoint__syscalls__sys_enter_open: > > > ; int arg2 = (int)ctx->args[1]; > > > 0: 79 16 18 00 00 00 00 00 r6 = *(u64 *)(r1 + 24) > > > ; const char *arg1 = (const char *)ctx->args[0]; > > > 1: 79 13 10 00 00 00 00 00 r3 = *(u64 *)(r1 + 16) > > > 2: 18 01 00 00 20 31 3a 20 00 00 00 00 25 73 0a 00 r1 = > > > 2941353058775328 ll > > > ; bpf_printk("Open arg 1: %s\n", arg1); > > > 4: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1 > > > 5: 18 07 00 00 4f 70 65 6e 00 00 00 00 20 61 72 67 r7 = > > > 7454127125170581583 ll > > > 7: 7b 7a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r7 > > > 8: bf a1 00 00 00 00 00 00 r1 = r10 > > > 9: 07 01 00 00 f0 ff ff ff r1 += -16 > > > 10: b7 02 00 00 10 00 00 00 r2 = 16 > > > 11: 85 00 00 00 06 00 00 00 call 6 > > > 12: 18 01 00 00 20 32 3a 20 00 00 00 00 25 64 0a 00 r1 = > > > 2924860384358944 ll > > > ; bpf_printk("Open arg 2: %d\n", arg2); > > > 14: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1 > > > 15: 7b 7a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r7 > > > 16: bf a1 00 00 00 00 00 00 r1 = r10 > > > 17: 07 01 00 00 f0 ff ff ff r1 += -16 > > > 18: b7 02 00 00 10 00 00 00 r2 = 16 > > > 19: bf 63 00 00 00 00 00 00 r3 = r6 > > > 20: 85 00 00 00 06 00 00 00 call 6 > > > ; return 0; > > > 21: b7 00 00 00 00 00 00 00 r0 = 0 > > > 22: 95 00 00 00 00 00 00 00 exit > > > > > > > > > verifier output when running program > > > -------------------------------------------- > > > libbpf: -- BEGIN DUMP LOG --- > > > libbpf: > > > Unrecognized arg#0 type PTR > > > ; int arg2 = (int)ctx->args[1]; > > > 0: (79) r6 = *(u64 *)(r1 +16) > > > ; const char *arg1 = (const char *)ctx->args[0]; > > > 1: (79) r3 = *(u64 *)(r1 +12) > > > invalid bpf_context access off=12 size=8 > > > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 > > > peak_states 0 mark_read 0 > > > > > > libbpf: -- END LOG -- > > > > > > One suspect would be libbpf's CO-RE relocations. Can you send full > > debug libbpf logs, it will have a full log of what libbpf adjusted. > > Please also include the definition of struct trace_event_raw_sys_enter > > from your vmlinux.h, as well as commit that your kernel was built from > > (to check the original definition). > > > Here is the data you requested. I can see the reallocations done by BPF CO-RE, > but I don't understand why they would have to be done in the first place > since I am using the vmlinux.h that has been generated using the > devices vmlinux. > Even if it made sense to change the pointer offsets, they shouldn't > break the program. Ok, there are few things at work here. I'll try to explain it and let's see how we can fix this. This is all due to 32-bit architecture, of course. 1. BPF "architecture" is strictly 64-bit one. This means that from the BPF program point of view pointers are 64-bit values, long is 64-bit as well. 2. look at struct trace_event_raw_sys_enter below: id and each element of args[] array are longs. What this means is the following: - from BPF side of view those are assumed to be 64-bit integers - but in reality in ARM32 kernel and in memory they are 32-bit integers. So the memory view of struct trace_event_raw_sys_enter differs quite a lot, depending in which context that struct definition is used. You can see that from the compiled BPF assembly output above and compare to the actual memory layout, taking into account that long is 4 bytes. For BPF args[0] is at 16 bytes, args[1] is at 24 bytes. But in reality it's at 12 byte offset for args[0] and 16 byte offset for args[1]. So what libbpf/CO-RE is doing is right, it actually relocates offset 16 to 12 and 24 to 16. So without CO-RE you'd be accessing completely wrong offsets and reading garbage. 3. But read size matters as well. When compiling your BPF program clang sees 8-byte read and emits corresponding BPF assembly instruction. While in reality it has to be 4-byte read. Then later in the verifier (see tp_prog_is_valid_access() in kernel/trace/bpf_trace.c) there is a very simple check that all offsets are multiples of access size, and 12 % 8 != 0, so the verifier complains. It's a bit verbose explanation, but I hope it describes the problem. Basically, longs in a mixed BPF/32-bit land are error prone (as well as pointers in kernel data structures, btw). Now, a short-term fix for your use case would be to copy/paste struct trace_event_raw_sys_enter and replace all longs with ints. That will fix access size and offsets. Longer-term and a more proper fix is probably for libbpf to be smarter when emitting vmlinux.h and replace 4-byte longs with ints, knowing that vmlinux.h is going to be used from always-64-bit BPF architecture. I'll think a bit more about the implications of this, and if there are no problems with that approach, I'll submit a change soon enough. Pointers will still be a problem, though, and I'm not sure how we can solve it yet. Ideally there would be some solution that would cause Clang to emit 4-byte reads for pointers. I wonder if anyone else has any ideas around that. As a separate thing, it might be a good idea to allow 4-byte aligned 8-byte reads in the verifier on 32-bit architectures, given they are allowed on such architectures. Though for you that would lead to garbage in the upper 4 bytes of BPF register, which is equally bad. > > > Kernel commit > -------------------------------------------- > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/?h=v5.9-rc3 > > > Struct definition from vmlinux.h > -------------------------------------------- > struct trace_event_raw_sys_enter { > struct trace_entry ent; > long int id; > long unsigned int args[6]; > char __data[0]; > }; > > [...] > libbpf: collected 0 externs total > libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0 > libbpf: sec 'tracepoint/syscalls/sys_enter_open': found 2 CO-RE relocations > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #0: kind > <byte_off> (0), spec is [2] struct trace_event_raw_sys_ent) these logs seem to be truncated, btw > libbpf: CO-RE relocating [2] struct trace_event_raw_sys_enter: found > target candidate [4639] struct trace_event_raw_sys_entr > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #0: matching > candidate #0 [4639] struct trace_event_raw_sys_enter.a) like here it actually should be something like: struct trace_event_raw_sys_enter.args[0] > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #0: patched > insn #0 (LDX/ST/STX) off 24 -> 16 > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #1: kind > <byte_off> (0), spec is [2] struct trace_event_raw_sys_ent) > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #1: matching > candidate #0 [4639] struct trace_event_raw_sys_enter.a) > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #1: patched > insn #1 (LDX/ST/STX) off 16 -> 12 > libbpf: load bpf program failed: Permission denied > libbpf: -- BEGIN DUMP LOG --- > libbpf: > Unrecognized arg#0 type PTR > ; int arg2 = (int)ctx->args[1]; > 0: (79) r6 = *(u64 *)(r1 +16) > ; const char *arg1 = (const char *)ctx->args[0]; > 1: (79) r3 = *(u64 *)(r1 +12) > invalid bpf_context access off=12 size=8 > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 > peak_states 0 mark_read 0 > > libbpf: -- END LOG -- > libbpf: failed to load program 'tracepoint/syscalls/sys_enter_open' > libbpf: failed to load object 'hello_bpf' > libbpf: failed to load BPF skeleton 'hello_bpf': -4007 > failed to load BPF object -4007