Re: [PATCH bpf-next 00/24] Support bpf trampoline for s390x

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 27, 2023 at 8:51 AM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
>
> On Wed, 2023-01-25 at 16:45 -0800, Andrii Nakryiko wrote:
> > On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > wrote:
> > >
> > > Hi,
> > >
> > > This series implements poke, trampoline, kfunc, mixing subprogs and
> > > tailcalls, and fixes a number of tests on s390x.
> > >
> > > The following failures still remain:
> > >
> > > #52      core_read_macros:FAIL
> > > Uses BPF_PROBE_READ(), shouldn't there be BPF_PROBE_READ_KERNEL()?
> >
> > BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use
> > bpf_probe_read_kernel() internally, as it's most common use case. We
> > have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for when
> > bpf_probe_read_user() has to be used.
>
> At least purely from the code perspective, BPF_PROBE_READ() seems to
> delegate to bpf_probe_read(). The following therefore helps with this
> test:
>
> --- a/tools/lib/bpf/bpf_core_read.h
> +++ b/tools/lib/bpf/bpf_core_read.h
> @@ -364,7 +364,7 @@ enum bpf_enum_value_kind {
>
>  /* Non-CO-RE variant of BPF_CORE_READ_INTO() */
>  #define BPF_PROBE_READ_INTO(dst, src, a, ...) ({
> \
> -       ___core_read(bpf_probe_read, bpf_probe_read,
> \
> +       ___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel,
> \
>                      dst, (src), a, ##__VA_ARGS__)
> \
>  })
>
> @@ -400,7 +400,7 @@ enum bpf_enum_value_kind {
>
>  /* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */
>  #define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({
> \
> -       ___core_read(bpf_probe_read_str, bpf_probe_read,
> \
> +       ___core_read(bpf_probe_read_kernel_str, bpf_probe_read_kernel,
> \
>                      dst, (src), a, ##__VA_ARGS__)
> \
>  })
>
> but I'm not sure if there are backward compatibility concerns, or if
> libbpf is supposed to rewrite this when
> !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

Oh, this is just a bug, it was an omission when we converted
BPF_CORE_READ to bpf_probe_read_kernel. If you look at bpf_core_read
definition, it is using bpf_probe_read_kernel, which is also the
intent for BPF_PROBE_READ. Let's fix this.

And there is no backwards compat concerns because libbpf will
automatically convert calls to bpf_probe_read_{kernel,user}[_str] to
bpf_probe_read[_str] if host kernel doesn't yet support kernel or user
specific variants.

>
> > >
> > > #82      get_stack_raw_tp:FAIL
> > > get_stack_print_output:FAIL:user_stack corrupted user stack
> > > Known issue:
> > > We cannot reliably unwind userspace on s390x without DWARF.
> >
> > like in principle, or frame pointers (or some equivalent) needs to be
> > configured for this to work?
> >
> > Asking also in the context of [0], where s390x was excluded. If there
> > is actually a way to enable frame pointer-based stack unwinding on
> > s390x, would be nice to hear from an expert.
> >
> >   [0] https://pagure.io/fesco/issue/2923
>
> For DWARFless unwinding we have -mbackchain (not to be confused with
> -fno-omit-frame-pointer, which we also have, but which just hurts
> performance without providing tangible benefits).
> -mbackchain has a few problems though:
>
> - It's not atomic. Here is a typical prologue with -mbackchain:
>
>         1: stmg    %r11,%r15,88(%r15)  # save non-volatile registers
>         2: lgr     %r14,%r15           # %r14 = sp
>         3: lay     %r15,-160(%r15)     # sp -= 160
>         4: stg     %r14,0(%r15)        # *(void **)sp = %r14
>
>   The invariant here is that *(void **)%r15 is always a pointer to the
>   next frame. This means that if we unwind from (4), we are totally
>   broken. This does not happen if we unwind from any other instruction,
>   but still.
>
> - Unwinding from (1)-(3) is not particularly good either. PSW points to
>   the callee, but R15 points to the caller's frame.
>
> - Unwinding leaf functions is like the previous problem, but worse:
>   they often do not establish a stack frame at all, so PSW and R15 are
>   out of sync for the entire duration of the call.
>
> Therefore .eh_frame-based unwinding is preferred, since it covers all
> these corner cases and is therefore reliable. From what I understand,
> adding .eh_frame unwinding to the kernel is not desirable. In an
> internal discussion we had another idea though: would it be possible to
> have an eBPF program that does .eh_frame parsing and unwinding? I
> understand that it can be technically challenging at the moment, but
> the end result would not be exploitable by crafting malicious
> .eh_frame sections, won't loop endlessly, will have good performance,
> etc.

Thanks for details. This was all discussed at length in Fedora
-fno-omit-frame-pointer discussion I linked above, so no real need to
go over this again. .eh_frame-based unwinding on BPF side is possible,
but only for processes that you knew about (and preprocessed) before
you started profiling session. Pre-processing is memory and
cpu-intensive operation on busy systems, and they will miss any
processes started during profiling. So as a general approach for
system-wide profiling it leaves a lot to be desired.

Should we enable -mbackchain in our CI for s390x to be able to capture
stack traces (even if on some instructions they might be incomplete or
outright broken)?

>
> > > #101     ksyms_module:FAIL
> > > address of kernel function bpf_testmod_test_mod_kfunc is out of
> > > range
> > > Known issue:
> > > Kernel and modules are too far away from each other on s390x.
> > >
> > > #167     sk_assign:FAIL
> > > Uses legacy map definitions in 'maps' section.
> >
> > Hm.. assuming new enough iproute2, new-style .maps definition should
> > be supported, right? Let's convert map definition?
>
> Yep, that worked. Will include in v2.

Nice.

>
> > > #190     stacktrace_build_id:FAIL
> > > Known issue:
> > > We cannot reliably unwind userspace on s390x without DWARF.
> > >
> > > #211     test_bpffs:FAIL
> > > iterators.bpf.c is broken on s390x, it uses BPF_CORE_READ(),
> > > shouldn't
> > > there be BPF_CORE_READ_KERNEL()?
> >
> > BPF_CORE_READ() is that, so must be something else
> >
> > >
> > > #218     test_profiler:FAIL
> > > A lot of BPF_PROBE_READ() usages.
> >
> > ditto, something else
> >
> > >
> > > #281     xdp_metadata:FAIL
> > > See patch 24.
> > >
> > > #284     xdp_synproxy:FAIL
> > > Verifier error:
> > > ; value = bpf_tcp_raw_gen_syncookie_ipv4(hdr->ipv4, hdr->tcp,
> > > 281: (79) r1 = *(u64 *)(r10 -80)      ; R1_w=pkt(off=14,r=74,imm=0)
> > > R10=fp0
> > > 282: (bf) r2 = r8                     ;
> > > R2_w=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c))
> > > R8=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c))
> > > 283: (79) r3 = *(u64 *)(r10 -104)     ;
> > > R3_w=scalar(umax=60,var_off=(0x0; 0x3c)) R10=fp0
> > > 284: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
> > > invalid access to packet, off=14 size=0, R2(id=5,off=14,r=74)
> > > R2 offset is outside of the packet
> >
> > third arg to bpf_tcp_raw_gen_syncookie_ipv4() is defined as
> > ARG_CONST_SIZE, so is required to be strictly positive, which doesn't
> > seem to be "proven" to verifier. Please provided bigger log, it might
> > another instance of needing to sprinkle barrier_var() around.
> >
> > And maybe thinking about using ARG_CONST_SIZE_OR_ZERO instead of
> > ARG_CONST_SIZE.
>
> Here is the full log:
>
> https://gist.github.com/iii-i/8e20100c33ab6f0dffb5e6e51d1330e8
>
> Apparently we do indeed lose a constraint established by
> if (hdr->tcp_len < sizeof(*hdr->tcp)). But the naive

The test is too big and unfamiliar for me to figure this out. And the
problem is not upper bound, but lower bound. hdr->tcp_len is not
proven to be strictly greater than zero, which is what verifier
complains about. Not sure how it works on other arches right now.


But I see that bpf_csum_diff defines size arguments as
ARG_CONST_SIZE_OR_ZERO while bpf_tcp_raw_gen_syncookie_ipv4 has
ARG_CONST_SIZE. I generally found ARG_CONST_SIZE way too problematic
in practice, I'd say we should change it to ARG_CONST_SIZE_OR_ZERO.

>
> --- a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
> @@ -401,6 +401,7 @@ static __always_inline int tcp_dissect(void *data,
> void *data_end,
>         hdr->tcp_len = hdr->tcp->doff * 4;
>         if (hdr->tcp_len < sizeof(*hdr->tcp))
>                 return XDP_DROP;
> +       barrier_var(hdr->tcp_len);
>
>         return XDP_TX;
>  }
> @@ -791,6 +792,7 @@ static __always_inline int syncookie_part2(void
> *ctx, void *data, void *data_end
>         hdr->tcp_len = hdr->tcp->doff * 4;
>         if (hdr->tcp_len < sizeof(*hdr->tcp))
>                 return XDP_ABORTED;
> +       barrier_var(hdr->tcp_len);
>
>         return hdr->tcp->syn ? syncookie_handle_syn(hdr, ctx, data,
> data_end, xdp) :
>                                syncookie_handle_ack(hdr);
>
> does not help.
>
> >
> > >
> > > None of these seem to be due to the new changes.
> > >
> > > Best regards,
> > > Ilya
> > >
> > > Ilya Leoshkevich (24):
> > >   selftests/bpf: Fix liburandom_read.so linker error
> > >   selftests/bpf: Fix symlink creation error
> > >   selftests/bpf: Fix fexit_stress on s390x
> > >   selftests/bpf: Fix trampoline_count on s390x
> > >   selftests/bpf: Fix kfree_skb on s390x
> > >   selftests/bpf: Set errno when urand_spawn() fails
> > >   selftests/bpf: Fix decap_sanity_ns cleanup
> > >   selftests/bpf: Fix verify_pkcs7_sig on s390x
> > >   selftests/bpf: Fix xdp_do_redirect on s390x
> > >   selftests/bpf: Fix cgrp_local_storage on s390x
> > >   selftests/bpf: Check stack_mprotect() return value
> > >   selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x
> > >   selftests/bpf: Add a sign-extension test for kfuncs
> > >   selftests/bpf: Fix test_lsm on s390x
> > >   selftests/bpf: Fix test_xdp_adjust_tail_grow2 on s390x
> > >   selftests/bpf: Fix vmlinux test on s390x
> > >   libbpf: Read usdt arg spec with bpf_probe_read_kernel()
> > >   s390/bpf: Fix a typo in a comment
> > >   s390/bpf: Add expoline to tail calls
> > >   s390/bpf: Implement bpf_arch_text_poke()
> > >   bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag
> > >   s390/bpf: Implement arch_prepare_bpf_trampoline()
> > >   s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()
> > >   s390/bpf: Implement bpf_jit_supports_kfunc_call()
> > >
> > >  arch/s390/net/bpf_jit_comp.c                  | 708
> > > +++++++++++++++++-
> > >  include/linux/bpf.h                           |   8 +
> > >  include/linux/btf.h                           |  15 +-
> > >  kernel/bpf/btf.c                              |  16 +-
> > >  net/bpf/test_run.c                            |   9 +
> > >  tools/lib/bpf/usdt.bpf.h                      |  33 +-
> > >  tools/testing/selftests/bpf/Makefile          |   7 +-
> > >  tools/testing/selftests/bpf/netcnt_common.h   |   6 +-
> > >  .../selftests/bpf/prog_tests/bpf_cookie.c     |   6 +-
> > >  .../bpf/prog_tests/cgrp_local_storage.c       |   2 +-
> > >  .../selftests/bpf/prog_tests/decap_sanity.c   |   2 +-
> > >  .../selftests/bpf/prog_tests/fexit_stress.c   |   6 +-
> > >  .../selftests/bpf/prog_tests/kfree_skb.c      |   2 +-
> > >  .../selftests/bpf/prog_tests/kfunc_call.c     |   1 +
> > >  .../selftests/bpf/prog_tests/test_lsm.c       |   3 +-
> > >  .../bpf/prog_tests/trampoline_count.c         |   4 +
> > >  tools/testing/selftests/bpf/prog_tests/usdt.c |   1 +
> > >  .../bpf/prog_tests/verify_pkcs7_sig.c         |   9 +
> > >  .../bpf/prog_tests/xdp_adjust_tail.c          |   7 +-
> > >  .../bpf/prog_tests/xdp_do_redirect.c          |   4 +
> > >  .../selftests/bpf/progs/kfunc_call_test.c     |  18 +
> > >  tools/testing/selftests/bpf/progs/lsm.c       |   7 +-
> > >  .../bpf/progs/test_verify_pkcs7_sig.c         |  12 +-
> > >  .../selftests/bpf/progs/test_vmlinux.c        |   4 +-
> > >  .../bpf/progs/test_xdp_adjust_tail_grow.c     |   8 +-
> > >  25 files changed, 816 insertions(+), 82 deletions(-)
> > >
> > > --
> > > 2.39.1
> > >
>



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux