On Fri, Jan 27, 2023 at 8:51 AM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote: > > On Wed, 2023-01-25 at 16:45 -0800, Andrii Nakryiko wrote: > > On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> > > wrote: > > > > > > Hi, > > > > > > This series implements poke, trampoline, kfunc, mixing subprogs and > > > tailcalls, and fixes a number of tests on s390x. > > > > > > The following failures still remain: > > > > > > #52 core_read_macros:FAIL > > > Uses BPF_PROBE_READ(), shouldn't there be BPF_PROBE_READ_KERNEL()? > > > > BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use > > bpf_probe_read_kernel() internally, as it's most common use case. We > > have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for when > > bpf_probe_read_user() has to be used. > > At least purely from the code perspective, BPF_PROBE_READ() seems to > delegate to bpf_probe_read(). The following therefore helps with this > test: > > --- a/tools/lib/bpf/bpf_core_read.h > +++ b/tools/lib/bpf/bpf_core_read.h > @@ -364,7 +364,7 @@ enum bpf_enum_value_kind { > > /* Non-CO-RE variant of BPF_CORE_READ_INTO() */ > #define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ > \ > - ___core_read(bpf_probe_read, bpf_probe_read, > \ > + ___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel, > \ > dst, (src), a, ##__VA_ARGS__) > \ > }) > > @@ -400,7 +400,7 @@ enum bpf_enum_value_kind { > > /* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */ > #define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ > \ > - ___core_read(bpf_probe_read_str, bpf_probe_read, > \ > + ___core_read(bpf_probe_read_kernel_str, bpf_probe_read_kernel, > \ > dst, (src), a, ##__VA_ARGS__) > \ > }) > > but I'm not sure if there are backward compatibility concerns, or if > libbpf is supposed to rewrite this when > !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. Oh, this is just a bug, it was an omission when we converted BPF_CORE_READ to bpf_probe_read_kernel. If you look at bpf_core_read definition, it is using bpf_probe_read_kernel, which is also the intent for BPF_PROBE_READ. Let's fix this. And there is no backwards compat concerns because libbpf will automatically convert calls to bpf_probe_read_{kernel,user}[_str] to bpf_probe_read[_str] if host kernel doesn't yet support kernel or user specific variants. > > > > > > > #82 get_stack_raw_tp:FAIL > > > get_stack_print_output:FAIL:user_stack corrupted user stack > > > Known issue: > > > We cannot reliably unwind userspace on s390x without DWARF. > > > > like in principle, or frame pointers (or some equivalent) needs to be > > configured for this to work? > > > > Asking also in the context of [0], where s390x was excluded. If there > > is actually a way to enable frame pointer-based stack unwinding on > > s390x, would be nice to hear from an expert. > > > > [0] https://pagure.io/fesco/issue/2923 > > For DWARFless unwinding we have -mbackchain (not to be confused with > -fno-omit-frame-pointer, which we also have, but which just hurts > performance without providing tangible benefits). > -mbackchain has a few problems though: > > - It's not atomic. Here is a typical prologue with -mbackchain: > > 1: stmg %r11,%r15,88(%r15) # save non-volatile registers > 2: lgr %r14,%r15 # %r14 = sp > 3: lay %r15,-160(%r15) # sp -= 160 > 4: stg %r14,0(%r15) # *(void **)sp = %r14 > > The invariant here is that *(void **)%r15 is always a pointer to the > next frame. This means that if we unwind from (4), we are totally > broken. This does not happen if we unwind from any other instruction, > but still. > > - Unwinding from (1)-(3) is not particularly good either. PSW points to > the callee, but R15 points to the caller's frame. > > - Unwinding leaf functions is like the previous problem, but worse: > they often do not establish a stack frame at all, so PSW and R15 are > out of sync for the entire duration of the call. > > Therefore .eh_frame-based unwinding is preferred, since it covers all > these corner cases and is therefore reliable. From what I understand, > adding .eh_frame unwinding to the kernel is not desirable. In an > internal discussion we had another idea though: would it be possible to > have an eBPF program that does .eh_frame parsing and unwinding? I > understand that it can be technically challenging at the moment, but > the end result would not be exploitable by crafting malicious > .eh_frame sections, won't loop endlessly, will have good performance, > etc. Thanks for details. This was all discussed at length in Fedora -fno-omit-frame-pointer discussion I linked above, so no real need to go over this again. .eh_frame-based unwinding on BPF side is possible, but only for processes that you knew about (and preprocessed) before you started profiling session. Pre-processing is memory and cpu-intensive operation on busy systems, and they will miss any processes started during profiling. So as a general approach for system-wide profiling it leaves a lot to be desired. Should we enable -mbackchain in our CI for s390x to be able to capture stack traces (even if on some instructions they might be incomplete or outright broken)? > > > > #101 ksyms_module:FAIL > > > address of kernel function bpf_testmod_test_mod_kfunc is out of > > > range > > > Known issue: > > > Kernel and modules are too far away from each other on s390x. > > > > > > #167 sk_assign:FAIL > > > Uses legacy map definitions in 'maps' section. > > > > Hm.. assuming new enough iproute2, new-style .maps definition should > > be supported, right? Let's convert map definition? > > Yep, that worked. Will include in v2. Nice. > > > > #190 stacktrace_build_id:FAIL > > > Known issue: > > > We cannot reliably unwind userspace on s390x without DWARF. > > > > > > #211 test_bpffs:FAIL > > > iterators.bpf.c is broken on s390x, it uses BPF_CORE_READ(), > > > shouldn't > > > there be BPF_CORE_READ_KERNEL()? > > > > BPF_CORE_READ() is that, so must be something else > > > > > > > > #218 test_profiler:FAIL > > > A lot of BPF_PROBE_READ() usages. > > > > ditto, something else > > > > > > > > #281 xdp_metadata:FAIL > > > See patch 24. > > > > > > #284 xdp_synproxy:FAIL > > > Verifier error: > > > ; value = bpf_tcp_raw_gen_syncookie_ipv4(hdr->ipv4, hdr->tcp, > > > 281: (79) r1 = *(u64 *)(r10 -80) ; R1_w=pkt(off=14,r=74,imm=0) > > > R10=fp0 > > > 282: (bf) r2 = r8 ; > > > R2_w=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > > > R8=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > > > 283: (79) r3 = *(u64 *)(r10 -104) ; > > > R3_w=scalar(umax=60,var_off=(0x0; 0x3c)) R10=fp0 > > > 284: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 > > > invalid access to packet, off=14 size=0, R2(id=5,off=14,r=74) > > > R2 offset is outside of the packet > > > > third arg to bpf_tcp_raw_gen_syncookie_ipv4() is defined as > > ARG_CONST_SIZE, so is required to be strictly positive, which doesn't > > seem to be "proven" to verifier. Please provided bigger log, it might > > another instance of needing to sprinkle barrier_var() around. > > > > And maybe thinking about using ARG_CONST_SIZE_OR_ZERO instead of > > ARG_CONST_SIZE. > > Here is the full log: > > https://gist.github.com/iii-i/8e20100c33ab6f0dffb5e6e51d1330e8 > > Apparently we do indeed lose a constraint established by > if (hdr->tcp_len < sizeof(*hdr->tcp)). But the naive The test is too big and unfamiliar for me to figure this out. And the problem is not upper bound, but lower bound. hdr->tcp_len is not proven to be strictly greater than zero, which is what verifier complains about. Not sure how it works on other arches right now. But I see that bpf_csum_diff defines size arguments as ARG_CONST_SIZE_OR_ZERO while bpf_tcp_raw_gen_syncookie_ipv4 has ARG_CONST_SIZE. I generally found ARG_CONST_SIZE way too problematic in practice, I'd say we should change it to ARG_CONST_SIZE_OR_ZERO. > > --- a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c > +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c > @@ -401,6 +401,7 @@ static __always_inline int tcp_dissect(void *data, > void *data_end, > hdr->tcp_len = hdr->tcp->doff * 4; > if (hdr->tcp_len < sizeof(*hdr->tcp)) > return XDP_DROP; > + barrier_var(hdr->tcp_len); > > return XDP_TX; > } > @@ -791,6 +792,7 @@ static __always_inline int syncookie_part2(void > *ctx, void *data, void *data_end > hdr->tcp_len = hdr->tcp->doff * 4; > if (hdr->tcp_len < sizeof(*hdr->tcp)) > return XDP_ABORTED; > + barrier_var(hdr->tcp_len); > > return hdr->tcp->syn ? syncookie_handle_syn(hdr, ctx, data, > data_end, xdp) : > syncookie_handle_ack(hdr); > > does not help. > > > > > > > > > None of these seem to be due to the new changes. > > > > > > Best regards, > > > Ilya > > > > > > Ilya Leoshkevich (24): > > > selftests/bpf: Fix liburandom_read.so linker error > > > selftests/bpf: Fix symlink creation error > > > selftests/bpf: Fix fexit_stress on s390x > > > selftests/bpf: Fix trampoline_count on s390x > > > selftests/bpf: Fix kfree_skb on s390x > > > selftests/bpf: Set errno when urand_spawn() fails > > > selftests/bpf: Fix decap_sanity_ns cleanup > > > selftests/bpf: Fix verify_pkcs7_sig on s390x > > > selftests/bpf: Fix xdp_do_redirect on s390x > > > selftests/bpf: Fix cgrp_local_storage on s390x > > > selftests/bpf: Check stack_mprotect() return value > > > selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x > > > selftests/bpf: Add a sign-extension test for kfuncs > > > selftests/bpf: Fix test_lsm on s390x > > > selftests/bpf: Fix test_xdp_adjust_tail_grow2 on s390x > > > selftests/bpf: Fix vmlinux test on s390x > > > libbpf: Read usdt arg spec with bpf_probe_read_kernel() > > > s390/bpf: Fix a typo in a comment > > > s390/bpf: Add expoline to tail calls > > > s390/bpf: Implement bpf_arch_text_poke() > > > bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag > > > s390/bpf: Implement arch_prepare_bpf_trampoline() > > > s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() > > > s390/bpf: Implement bpf_jit_supports_kfunc_call() > > > > > > arch/s390/net/bpf_jit_comp.c | 708 > > > +++++++++++++++++- > > > include/linux/bpf.h | 8 + > > > include/linux/btf.h | 15 +- > > > kernel/bpf/btf.c | 16 +- > > > net/bpf/test_run.c | 9 + > > > tools/lib/bpf/usdt.bpf.h | 33 +- > > > tools/testing/selftests/bpf/Makefile | 7 +- > > > tools/testing/selftests/bpf/netcnt_common.h | 6 +- > > > .../selftests/bpf/prog_tests/bpf_cookie.c | 6 +- > > > .../bpf/prog_tests/cgrp_local_storage.c | 2 +- > > > .../selftests/bpf/prog_tests/decap_sanity.c | 2 +- > > > .../selftests/bpf/prog_tests/fexit_stress.c | 6 +- > > > .../selftests/bpf/prog_tests/kfree_skb.c | 2 +- > > > .../selftests/bpf/prog_tests/kfunc_call.c | 1 + > > > .../selftests/bpf/prog_tests/test_lsm.c | 3 +- > > > .../bpf/prog_tests/trampoline_count.c | 4 + > > > tools/testing/selftests/bpf/prog_tests/usdt.c | 1 + > > > .../bpf/prog_tests/verify_pkcs7_sig.c | 9 + > > > .../bpf/prog_tests/xdp_adjust_tail.c | 7 +- > > > .../bpf/prog_tests/xdp_do_redirect.c | 4 + > > > .../selftests/bpf/progs/kfunc_call_test.c | 18 + > > > tools/testing/selftests/bpf/progs/lsm.c | 7 +- > > > .../bpf/progs/test_verify_pkcs7_sig.c | 12 +- > > > .../selftests/bpf/progs/test_vmlinux.c | 4 +- > > > .../bpf/progs/test_xdp_adjust_tail_grow.c | 8 +- > > > 25 files changed, 816 insertions(+), 82 deletions(-) > > > > > > -- > > > 2.39.1 > > > >