On Wed, 7 Oct 2020 23:37:00 +0200
Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:

> On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote:
> [...]
> >  net/core/dev.c |   24 ++++++++++++++++++++++--
> >  1 file changed, 22 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b433098896b2..19406013f93e 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> >  	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
> >  	case TC_ACT_OK:
> >  	case TC_ACT_RECLASSIFY:
> > +		*ret = NET_XMIT_SUCCESS;
> >  		skb->tc_index = TC_H_MIN(cl_res.classid);
> >  		break;
> >  	case TC_ACT_SHOT:
> > @@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >  {
> >  	struct net_device *dev = skb->dev;
> >  	struct netdev_queue *txq;
> > +#ifdef CONFIG_NET_CLS_ACT
> > +	bool mtu_check = false;
> > +#endif
> > +	bool again = false;
> >  	struct Qdisc *q;
> >  	int rc = -ENOMEM;
> > -	bool again = false;
> >
> >  	skb_reset_mac_header(skb);
> >
> > @@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >
> >  	qdisc_pkt_len_init(skb);
> >  #ifdef CONFIG_NET_CLS_ACT
> > +	mtu_check = skb_is_redirected(skb);
> >  	skb->tc_at_ingress = 0;
> >  # ifdef CONFIG_NET_EGRESS
> >  	if (static_branch_unlikely(&egress_needed_key)) {
> > +		unsigned int len_orig = skb->len;
> > +
> >  		skb = sch_handle_egress(skb, &rc, dev);
> >  		if (!skb)
> >  			goto out;
> > +		/* BPF-prog ran and could have changed packet size beyond MTU */
> > +		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
> > +			mtu_check = true;
> >  	}
> >  # endif
> > +	/* MTU-check only happens on "last" net_device in a redirect sequence
> > +	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
> > +	 * either ingress or egress to another device).
> > +	 */
>
> Hmm, quite some overhead in fast path.
Not really, the normal fast-path already calls is_skb_forwardable().
And the check already happens in existing code, in the ingress redirect
code, which patch 6 removes the call from. (I have considered inlining
is_skb_forwardable() as an optimization of dev_forward_skb() for the
general netstack.)

> Also, won't this be checked multiple times on stacked devices? :(

I don't think it will be checked multiple times, because we have a
skb_reset_redirect() in the ingress path (just after
sch_handle_ingress()).

> Moreover, this missed the fact that 'real' qdiscs can have filters
> attached too, this would come after this check. Can't this instead
> be in driver layer for those that really need it? I would probably
> only drop the check as done in 1/6 and allow the BPF prog to do the
> validation if needed.

See other reply; this is likely what we will end up with.

> > +	if (mtu_check && !is_skb_forwardable(dev, skb)) {
> > +		rc = -EMSGSIZE;
> > +		goto drop;
> > +	}
> >  #endif
> >  	/* If device/qdisc don't need skb->dst, release it right now while
> >  	 * its hot in this cpu cache.
> > @@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >
> >  	rc = -ENETDOWN;
> >  	rcu_read_unlock_bh();
> > -
> > +#ifdef CONFIG_NET_CLS_ACT
> > +drop:
> > +#endif
> >  	atomic_long_inc(&dev->tx_dropped);
> >  	kfree_skb_list(skb);
> >  	return rc;
> >

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer