On Wed, 2023-03-15 at 19:12 +0100, Alexander Lobakin wrote:
> From: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> Date: Wed, 15 Mar 2023 19:00:47 +0100
>
> > On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
> > > On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> > > > From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> > > > Date: Wed, 15 Mar 2023 10:56:25 +0100
> > > >
> > > > > From: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>
> > > > > Date: Tue, 14 Mar 2023 16:54:25 -0700
> > > > >
> > > > > > On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> > > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > test_xdp_do_redirect:PASS:prog_run 0 nsec
> > > > > > test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> > > > > > test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> > > > > > test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual 220 != expected 9998
> > > > > > test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> > > > > > test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> > > > > > close_netns:PASS:setns 0 nsec
> > > > > > #289 xdp_do_redirect:FAIL
> > > > > > Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> > > > > >
> > > > > > Alex,
> > > > > > could you please take a look at why it's happening?
> > > > > >
> > > > > > I suspect it's an endianness issue in:
> > > > > >         if (*metadata != 0x42)
> > > > > >                 return XDP_ABORTED;
> > > > > > but your patch didn't change that,
> > > > > > so I'm not sure why it worked before.
> > > > >
> > > > > Sure, lemme fix it real quick.
> > > >
> > > > Hi Ilya,
> > > >
> > > > Do you have s390 testing setups? Maybe you could take a look, since I
> > > > don't have one and can't debug it? Doesn't seem to be Endianness issue.
> > > > I mean, I have this (the below patch), but not sure it will fix
> > > > anything -- IIRC eBPF arch always matches the host arch ._.
> > > > I can't figure out from the code what does happen wrongly :s
> > > > And it happens only on s390.
> > > >
> > > > Thanks,
> > > > Olek
> > > > ---
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > index 662b6c6c5ed7..b21371668447 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > > > @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
> > > >                             .attach_point = BPF_TC_INGRESS);
> > > >
> > > >         memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
> > > > -       *((__u32 *)data) = 0x42; /* metadata test value */
> > > > +       *((__u32 *)data) = htonl(0x42); /* metadata test value */
> > > >
> > > >         skel = test_xdp_do_redirect__open();
> > > >         if (!ASSERT_OK_PTR(skel, "skel"))
> > > > diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > index cd2d4e3258b8..2475bc30ced2 100644
> > > > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > > > @@ -1,5 +1,6 @@
> > > >  // SPDX-License-Identifier: GPL-2.0
> > > >  #include <vmlinux.h>
> > > > +#include <bpf/bpf_endian.h>
> > > >  #include <bpf/bpf_helpers.h>
> > > >
> > > >  #define ETH_ALEN 6
> > > > @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
> > > >  SEC("xdp")
> > > >  int xdp_redirect(struct xdp_md *xdp)
> > > >  {
> > > > -       __u32 *metadata = (void *)(long)xdp->data_meta;
> > > > +       __be32 *metadata = (void *)(long)xdp->data_meta;
> > > >         void *data_end = (void *)(long)xdp->data_end;
> > > >         void *data = (void *)(long)xdp->data;
> > > >
> > > > @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
> > > >         if (metadata + 1 > data)
> > > >                 return XDP_ABORTED;
> > > >
> > > > -       if (*metadata != 0x42)
> > > > +       if (*metadata != __bpf_htonl(0x42))
> > > >                 return XDP_ABORTED;
> > > >
> > > >         if (*payload == MARK_XMIT)
> > >
> > > Okay, I'll take a look. Two quick observations for now:
> > >
> > > - Unfortunately the above patch does not help.
> > >
> > > - In dmesg I see:
> > >
> > >   Driver unsupported XDP return value 0 on prog xdp_redirect (id 23)
> > >   dev N/A, expect packet loss!
> >
> > I haven't identified the issue yet, but I have found a couple more
> > things that might be helpful:
> >
> > - In problematic cases metadata contains 0, so this is not an
> >   endianness issue. data is still reasonable though. I'm trying to
> >   understand what is causing this.
> >
> > - Applying the following diff:
> >
> > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
> >
> >         *payload = MARK_IN;
> >
> > -       if (bpf_xdp_adjust_meta(xdp, 4))
> > +       if (false && bpf_xdp_adjust_meta(xdp, 4))
> >                 return XDP_ABORTED;
> >
> >         if (retcode > XDP_PASS)
> >
> >   causes a kernel panic even on x86_64:
> >
> >   BUG: kernel NULL pointer dereference, address: 0000000000000d28
> >   ...
> >   Call Trace:
> >    <TASK>
> >
> >    build_skb_around+0x22/0xb0
> >    __xdp_build_skb_from_frame+0x4e/0x130
> >    bpf_test_run_xdp_live+0x65f/0x7c0
> >    ? __pfx_xdp_test_run_init_page+0x10/0x10
> >    bpf_prog_test_run_xdp+0x2ba/0x480
> >    bpf_prog_test_run+0xeb/0x110
> >    __sys_bpf+0x2b9/0x570
> >    __x64_sys_bpf+0x1c/0x30
> >    do_syscall_64+0x48/0xa0
> >    entry_SYSCALL_64_after_hwframe+0x72/0xdc
> >
> >   I haven't looked into this at all, but I believe this needs to be
> >   fixed - BPF should never cause kernel panics.
>
> This one is basically the same issue as syzbot mentioned today (separate
> subthread). I'm waiting for a feedback from Toke on which way of fixing
> he'd prefer (I proposed 2).
> If those zeroed metadata magics that you
> observe have the same roots with the panic, one fix will smash 2
> issues.
>
> Thanks,
> Olek

Sounds good, I will wait for an update then.

In the meantime, I found the code that overwrites the metadata:

#0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0, skb=0x88142200) at linux/include/net/neighbour.h:503
#1  0x0000000000ab2cda in neigh_output (skip_cache=false, skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
#2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0, skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
#3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0, net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
#4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at linux/net/ipv6/ip6_output.c:206
#5  0x0000000000ab5cbc in dst_input (skb=<optimized out>) at linux/include/net/dst.h:454
#6  ip6_sublist_rcv_finish (head=head@entry=0x38000dbf520) at linux/net/ipv6/ip6_input.c:88
#7  0x0000000000ab6104 in ip6_list_rcv_finish (net=<optimized out>, head=<optimized out>, sk=0x0) at linux/net/ipv6/ip6_input.c:145
#8  0x0000000000ab72bc in ipv6_list_rcv (head=0x38000dbf638, pt=<optimized out>, orig_dev=<optimized out>) at linux/net/ipv6/ip6_input.c:354
#9  0x00000000008b3710 in __netif_receive_skb_list_ptype (orig_dev=0x880b8000, pt_prev=0x176b7f8 <ipv6_packet_type>, head=0x38000dbf638) at linux/net/core/dev.c:5520
#10 __netif_receive_skb_list_core (head=head@entry=0x38000dbf7b8, pfmemalloc=pfmemalloc@entry=false) at linux/net/core/dev.c:5568
#11 0x00000000008b4390 in __netif_receive_skb_list (head=0x38000dbf7b8) at linux/net/core/dev.c:5620
#12 netif_receive_skb_list_internal (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5711
#13 0x00000000008b45ce in netif_receive_skb_list (head=head@entry=0x38000dbf7b8) at linux/net/core/dev.c:5763
#14 0x0000000000950782 in xdp_recv_frames (dev=<optimized out>, skbs=<optimized out>, nframes=62, frames=0x8587c600) at linux/net/bpf/test_run.c:256
#15 xdp_test_run_batch (xdp=xdp@entry=0x38000dbf900, prog=prog@entry=0x37fffe75000, repeat=<optimized out>) at linux/net/bpf/test_run.c:334

namely:

static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
...
        memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);

It's hard for me to see what is going on here, since I'm not familiar
with the networking code. Since XDP metadata is located at the end of
the headroom, shouldn't there be something that prevents the network
stack from overwriting it? Or is netif_receive_skb_list() free to do
whatever it wants with that memory, so that we cannot expect to get it
back intact?