On Thu, Nov 3, 2022 at 2:07 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 7/15/22 4:55 AM, Zhengchao Shao wrote:
> > Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
> > skbs, that is, the flow->head is null.
> > The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
> > run a bpf prog which redirects empty skbs.
> > So we should determine whether the length of the packet modified by bpf
> > prog or others like bpf_prog_test is valid before forwarding it directly.
> >
> > LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
> > LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html
> >
> > Reported-by: syzbot+7a12909485b94426aceb@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Zhengchao Shao <shaozhengchao@xxxxxxxxxx>
> > ---
> > v3: modify debug print
> > v2: need move checking to convert___skb_to_skb and add debug info
> > v1: should not check len in fast path
> >
> >   include/linux/skbuff.h | 8 ++++++++
> >   net/bpf/test_run.c     | 3 +++
> >   net/core/dev.c         | 1 +
> >   3 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index f6a27ab19202..82e8368ba6e6 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -2459,6 +2459,14 @@ static inline void skb_set_tail_pointer(struct sk_buff *skb, const int offset)
> >
> >   #endif /* NET_SKBUFF_DATA_USES_OFFSET */
> >
> > +static inline void skb_assert_len(struct sk_buff *skb)
> > +{
> > +#ifdef CONFIG_DEBUG_NET
> > +	if (WARN_ONCE(!skb->len, "%s\n", __func__))
> > +		DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> > +#endif /* CONFIG_DEBUG_NET */
> > +}
> > +
> >   /*
> >    * Add data to an sk_buff
> >    */
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index 2ca96acbc50a..dc9dc0bedca0 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -955,6 +955,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
> >   {
> >   	struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
> >
> > +	if (!skb->len)
> > +		return -EINVAL;
>
> From another recent report [0], I don't think this change is fixing the report
> from syzbot.  It probably makes sense to revert this patch.
>
> afaict, This '!skb->len' test is done after
>         if (is_l2)
>                 __skb_push(skb, hh_len);
>
> Hence, skb->len is not zero in convert___skb_to_skb().  The proper place to test
> skb->len is before __skb_push() to ensure there is some network header after the
> mac or may as well ensure "data_size_in > ETH_HLEN" at the beginning.

When is_l2==true, __skb_push will result in non-zero skb->len, so we
should be good, right?
The only issue is when we do bpf_redirect into a tunneling device and
do __skb_pull, but that's now fixed by [0].

When is_l2==false, the existing check in convert___skb_to_skb will
make sure there is something in the l3 headers.

So it seems like this patch is still needed. Or am I missing something?

> The fix in [0] is applied.  If it turns out there are other cases caused by the
> skb generated by test_run that needs extra fixes in bpf_redirect_*, it needs to
> revisit an earlier !skb->len check mentioned above and the existing test cases
> outside of test_progs would have to adjust accordingly.
>
> [0]: https://lore.kernel.org/bpf/20221027225537.353077-1-sdf@xxxxxxxxxx/
>
> > +
> >   	if (!__skb)
> >   		return 0;
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index d588fd0a54ce..716df64fcfa5 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4168,6 +4168,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   	bool again = false;
> >
> >   	skb_reset_mac_header(skb);
> > +	skb_assert_len(skb);
> >
> >   	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
> >   		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
>
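
For reference, the ordering question discussed above (checking skb->len after vs. before __skb_push()) can be modeled in userspace. This is only a rough sketch, not kernel code: struct toy_skb and the toy_* helpers are hypothetical names invented for illustration, and l3_len is assumed to stand for the bytes left after the MAC header has been pulled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff; only the length is modeled. */
struct toy_skb {
	unsigned int len;
};

/* Models the effect of __skb_push() on len: pushing header room grows it. */
static void toy_skb_push(struct toy_skb *skb, unsigned int hh_len)
{
	skb->len += hh_len;
}

/* Length test placed after the push, as in the quoted patch. */
static int toy_check_after_push(unsigned int l3_len, bool is_l2,
				unsigned int hh_len)
{
	struct toy_skb skb = { .len = l3_len };

	if (is_l2)
		toy_skb_push(&skb, hh_len);
	if (!skb.len)
		return -EINVAL;
	return 0;
}

/* Length test placed before the push, as suggested in the review. */
static int toy_check_before_push(unsigned int l3_len, bool is_l2,
				 unsigned int hh_len)
{
	struct toy_skb skb = { .len = l3_len };

	if (!skb.len)
		return -EINVAL;
	if (is_l2)
		toy_skb_push(&skb, hh_len);
	return 0;
}
```

In this model, an is_l2 packet with nothing after the MAC header passes the after-push test (len becomes hh_len) but is rejected by the before-push test, while a non-L2 packet with zero length is rejected either way. Whether rejecting the first case is desirable is exactly the open question in the thread.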