Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len

Maciej Żenczykowski <maze@xxxxxxxxxx> · Thu, 10 Sep 2020 13:00:12 -0700

All recent Android R common kernels are currently carrying the
following divergence from upstream:

https://android.googlesource.com/kernel/common/+/194a1bf09a7958551a9e2dc947bdfe3f8be8eca8%5E%21/

 static u32 __bpf_skb_max_len(const struct sk_buff *skb)
 {
-   return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
-                     SKB_MAX_ALLOC;
+   if (skb_at_tc_ingress(skb) || !skb->dev)
+           return SKB_MAX_ALLOC;
+   return skb->dev->mtu + skb->dev->hard_header_len;
 }

There wasn't agreement on how to handle this upstream because some
folks thought this check was useful...
Myself - I'm not entirely certain...
I'd like to be able to test for (something like) this, yes, but the
way it's done now is kind of pointless...
It breaks for gso packets anyway - it's not true that a gso packet can
just ignore the mtu check, you do actually need to check individual
gso segments are sufficiently small...
You need to check against the right interface, which it currently
utterly fails to do in the presence of bpf redirect.
Checking on receive just doesn't seem useful: so what if I want to
increase the size of a packet that arrives at the stack?
I also don't understand where SKB_MAX_ALLOC even comes from... skb's
on lo/veth can be 64KB, not SKB_MAX_ALLOC (which iirc is 16KB).

I think maybe there's now sufficient access to skb->len &
gso_segs/size to implement this in bpf instead of relying on the
kernel checking it???
But that might be slow...

It sounded like it was trending towards some sort of larger scale refactoring.

I haven't had the opportunity to take another look at this since then.
I'm not at all sure what would break if we just utterly deleted these
"pkt too big" (len > mtu) checks.

In general, in my experience, bpf handles gso and mtu poorly, and this
is an area in need of improvement.
I've been planning to get around to this, but am currently busy with a
bazillion other higher priority things :-(
Like trying to figure out whether XDP is even usable with real world
hardware limitations (currently the answer is still leaning towards
no, though there was some slightly positive news in the past few
days).  And whether we can even reach our performance goals with
jit'ed bpf... or do we need to just write it in kernel C... :-(

On Mon, Sep 7, 2020 at 7:08 AM Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:
>
> On Fri, 4 Sep 2020 16:39:47 -0700
> Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> > On Fri, 04 Sep 2020 11:30:28 +0200 Jesper Dangaard Brouer wrote:
> > > @@ -3211,8 +3211,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> > >
> > >  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> > >  {
> > > -   return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > > -                     SKB_MAX_ALLOC;
> > > +   return SKB_MAX_ALLOC;
> > >  }
> > >
> > >  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> > >
> >
> > Looks familiar:
> > https://lore.kernel.org/netdev/20200420231427.63894-1-zenczykowski@xxxxxxxxx/
> >
>
> Great to see that others have proposed the same fix before.  Unfortunately
> it seems that the thread has died, and no patch got applied to
> address this.  (Cc. Maze since he was going to "mull this over a bit more"...)
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>