On Wed, 6 May 2020 16:32:59 -0700 Maciej Żenczykowski wrote: > From: Maciej Żenczykowski <maze@xxxxxxxxxx> > > __bpf_skb_max_len(skb) is used from: > bpf_skb_adjust_room > __bpf_skb_change_tail > __bpf_skb_change_head > > but in the case of forwarding we're likely calling these functions > during receive processing on ingress and bpf_redirect()'ing at > a later point in time to egress on another interface, thus these > mtu checks are for the wrong device (input instead of output). > > This is particularly problematic if we're receiving on an L3 1500 mtu > cellular interface, trying to add an L2 header and forwarding to > an L3 mtu 1500 mtu wifi/ethernet device (which is thus L2 1514). > > The mtu check prevents us from adding the 14 byte ethernet header prior > to forwarding the packet. > > After the packet has already been redirected, we'd need to add > an additional 2nd ebpf program on the target device's egress tc hook, > but then we'd also see non-redirected traffic and have no easy > way to tell apart normal egress with ethernet header packets > from forwarded ethernet headerless packets. > > Credits to Alexei Starovoitov for the suggestion on how to 'fix' this. > > This probably should be further fixed up *somehow*, just > not at all clear how. This does at least make things work. > > Cc: Alexei Starovoitov <ast@xxxxxxxxxx> > Signed-off-by: Maciej Żenczykowski <maze@xxxxxxxxxx> > --- > net/core/filter.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 7d6ceaa54d21..811aba77e24b 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -3159,8 +3159,20 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, > > static u32 __bpf_skb_max_len(const struct sk_buff *skb) > { > - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len : > - SKB_MAX_ALLOC; > + if (skb->dev) { > + unsigned short header_len = skb->dev->hard_header_len; > + > + /* HACK: Treat 0 as ETH_HLEN to allow redirect while > + * adding ethernet header from an L3 (tun, rawip, cellular) > + * to an L2 device (tap, ethernet, wifi) > + */ > + if (!header_len) > + header_len = ETH_HLEN; > + > + return skb->dev->mtu + header_len; > + } else { > + return SKB_MAX_ALLOC; > + } > } > > BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, I thought we have established that checking device MTU (m*T*u) at ingress makes a very limited amount of sense, no? Shooting from the hip here, but won't something like: if (!skb->dev || skb->tc_at_ingress) return SKB_MAX_ALLOC; return skb->dev->mtu + skb->dev->hard_header_len; Solve your problem?