On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > > > > On 3/25/24 5:28 PM, Stanislav Fomichev wrote: > > > On 03/25, Alexei Starovoitov wrote: > > >> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > >>> > > >>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > > >>> <alexei.starovoitov@xxxxxxxxx> wrote: > > >>>> > > >>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@xxxxxxxxxx> wrote: > > >>>>> > > >>>>> Hello: > > >>>>> > > >>>>> This patch was applied to bpf/bpf.git (master) > > >>>>> by Daniel Borkmann <daniel@xxxxxxxxxxxxx>: > > >>>>> > > >>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > >>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown > > >>>>>> by various syzbot reports [1]. > > >>>>>> > > >>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > >>>>>> the missing attribute that can be used by upper layers. > > >>>>>> > > >>>>>> We need to use it in __bpf_redirect_common(). > > >>>> > > >>>> This patch broke empty_skb test: > > >>>> $ test_progs -t empty_skb > > >>>> > > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_ingress]: actual -34 != expected 0 > > >>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_egress]: actual -34 != expected 1 > > >>>> > > >>>> And looking at the test I think it's not a test issue. > > >>>> This check > > >>>> if (unlikely(skb->len < dev->min_header_len)) > > >>>> is rejecting more than it should. > > >>>> > > >>>> So I reverted this patch for now. > > >>> > > >>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > > >>> move my sanity test in __bpf_tx_skb(), > > >>> the bpf test program still fails, I am suspecting the test needs to be adjusted. > > >>> > > >>> > > >>> > > >>> diff --git a/net/core/filter.c b/net/core/filter.c > > >>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > > >>> 100644 > > >>> --- a/net/core/filter.c > > >>> +++ b/net/core/filter.c > > >>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > > >>> net_device *dev, struct sk_buff *skb) > > >>> return -ENETDOWN; > > >>> } > > >>> > > >>> + if (unlikely(skb->len < dev->min_header_len)) { > > >>> + pr_err_once("__bpf_tx_skb skb->len=%u < > > >>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > > >>> dev->min_header_len); > > >>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > > >>> + kfree_skb(skb); > > >>> + return -ERANGE; > > >>> + } // Note: this is before we change skb->dev > > >>> skb->dev = dev; > > >>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > > >>> skb_clear_tstamp(skb); > > >>> > > >>> > > >>> --> > > >>> > > >>> > > >>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>> [redirect_egress]: actual -34 != expected 1 > > >>> > > >>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > > >>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > > >>> mac=(64,14) net=(78,-1) trans=-1 > > >>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > > >>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > > >>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > > >> > > >> Hmm. Something is off. > > >> That test creates 15 byte skb. > > >> It's not obvious to me how it got reduced to 1. > > >> Something stripped L2 header and the prog is trying to redirect > > >> such skb into veth that expects skb with L2 ? > > >> > > >> Stan, > > >> please take a look. > > >> Since you wrote that test. > > > > > > Sure. Daniel wants to take a look on a separate thread, so we can sync > > > up. Tentatively, seems like the failure is in the lwt path that does > > > indeed drop the l2. > > > > If we'd change the test into the below, the tc and empty_skb tests pass. > > run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus > > skb->len is 1 in this test. We do use skb_mac_header_len() also in other > > tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best > > way forward.. > > > > static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, > > u32 flags) > > { > > /* Verify that a link layer header is carried */ > > if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { > > kfree_skb(skb); > > return -ERANGE; > > } > > > > if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { > > Unfortunately this will not prevent frames with skb->len == 1 to reach > an Ethernet driver ndo_start_xmit() > > At ndo_start_xmit(), we do not look where the MAC header supposedly > starts in the skb, we only use skb->data > > I have a syzbot repro using team driver, so I added the following part in team : > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c > 100644 > --- a/drivers/net/team/team.c > +++ b/drivers/net/team/team.c > @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff > *skb, struct net_device *dev) > bool tx_success; > unsigned int len = skb->len; > > + if (len < 14) { > + pr_err_once("team_xmit(len=%u)\n", len); > + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > + WARN_ON_ONCE(1); > + } > tx_success = team_queue_override_transmit(team, skb); > if (!tx_success) > tx_success = team->ops.transmit(team, skb); > > > And I get (with your suggestion instead of skb->len) > Missing part in my copy/paste : [ 41.123829] team_xmit(len=1) [ 41.124335] skb len=1 headroom=78 headlen=1 tailroom=113 > mac=(78,0) net=(78,-1) trans=-1 > shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0 > [ 41.126553] dev name=team0 feat=0x0000e0064fddfbe9 > [ 41.127132] skb linear: 00000000: 55 > [ 41.128487] ------------[ cut here ]------------ > [ 41.128551] WARNING: CPU: 2 PID: 1880 at > drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720 > (discriminator 1)) > [ 41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard > curve25519_x86_64 libcurve25519_generic libchacha20poly1305 > chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod > cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet > [ 41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238 > [ 41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > [ 41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720 > (discriminator 1)) > [ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80 > 3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00 > 00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66 > 85 > All code > ======== > 0: 41 54 push %r12 > 2: 53 push %rbx > 3: 44 8b 7f 70 mov 0x70(%rdi),%r15d > 7: 48 89 fb mov %rdi,%rbx > a: 41 83 ff 0d cmp $0xd,%r15d > e: 77 1c ja 0x2c > 10: 80 3d a0 24 8d 01 00 cmpb $0x0,0x18d24a0(%rip) # 0x18d24b7 > 17: 0f 84 0d 01 00 00 je 0x12a > 1d: 80 3d 92 24 8d 01 00 cmpb $0x0,0x18d2492(%rip) # 0x18d24b6 > 24: 0f 84 e3 00 00 00 je 0x10d > 2a:* 0f 0b ud2 <-- trapping instruction > 2c: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) > 33: 00 > 34: 0f 84 9d 00 00 00 je 0xd7 > 3a: 0f b7 43 7c movzwl 0x7c(%rbx),%eax > 3e: 66 data16 > 3f: 85 .byte 0x85 > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) > 9: 00 > a: 0f 84 9d 00 00 00 je 0xad > 10: 0f b7 43 7c movzwl 0x7c(%rbx),%eax > 14: 66 data16 > 15: 85 .byte 0x85 > [ 41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246 > [ 41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000 > [ 41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001 > [ 41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630 > [ 41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000 > [ 41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001 > [ 41.130074] FS: 00007f91d9549740(0000) GS:ffffa42fffa80000(0000) > knlGS:0000000000000000 > [ 41.130095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0 > [ 41.130229] Call Trace: > [ 41.130331] <TASK> > [ 41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479) > [ 41.130598] ? __warn (kernel/panic.c:694) > [ 41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) > [ 41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219) > [ 41.130640] ? handle_bug (arch/x86/kernel/traps.c:239) > [ 41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) > [ 41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) > [ 41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) > [ 41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6)) > [ 41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903 > ./include/linux/netdevice.h:4917 net/core/dev.c:3531 > net/core/dev.c:3547) > [ 41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287 > (discriminator 25) net/core/dev.c:4336 (discriminator 25)) > [ 41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4)) > [ 41.130796] ? pskb_expand_head (net/core/skbuff.c:2292) > [ 41.130815] __bpf_redirect (./include/linux/netdevice.h:3287 > (discriminator 25) net/core/filter.c:2143 (discriminator 25) > net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196 > (discriminator 25)) > [ 41.130825] bpf_clone_redirect (net/core/filter.c:2467 > (discriminator 1) net/core/filter.c:2439 (discriminator 1)) > [ 41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66 > [ 41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113 > ./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734) > [ 41.130963] ? is_bpf_text_address > (./arch/x86/include/asm/preempt.h:84 (discriminator 13) > ./include/linux/rcupdate.h:97 (discriminator 13) > ./include/linux/rcupdate.h:813 (discriminator 13) > kernel/bpf/core.c:769 (discriminator 13)) > [ 41.130976] ? kernel_text_address (kernel/extable.c:125 > (discriminator 1) kernel/extable.c:94 (discriminator 1)) > [ 41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1)) > [ 41.131002] ? unwind_get_return_address > (arch/x86/kernel/unwind_frame.c:19 (discriminator 1)) > [ 41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83) > [ 41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26) > [ 41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675) > [ 41.131062] ? ktime_get (kernel/time/timekeeping.c:292 > kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848) > [ 41.131076] bpf_test_run (./include/linux/bpf.h:1234 > ./include/linux/filter.h:657 ./include/linux/filter.h:664 > net/bpf/test_run.c:425) > [ 41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))