On Thu, Sep 17, 2020 at 5:39 AM Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote: > > > As you likely know[1] I'm looking into moving the MTU check (for TC-BPF) > in __bpf_skb_max_len() when e.g. called by bpf_skb_adjust_room(), > because when redirecting packets to another netdev it is not correct to > limit the MTU based on the incoming netdev. > > I was looking at doing the MTU check in bpf_redirect() helper, because > at this point we know the redirect to netdev, and returning an > indication/error that MTU was exceed, would allow the BPF-prog logic to > react, e.g. sending ICMP (instead of packet getting silently dropped). > BUT this is not possible because bpf_redirect(index, flags) helper > don't provide the packet context-object (so I cannot lookup the packet > length). > > Seeking input: > > Should/can we change the bpf_redirect API or create a new helper with > packet-context? > > Note: We have the same need for the packet context for XDP when > redirecting the new multi-buffer packets, as not all destination netdev > will support these new multi-buffer packets. > > I can of-cause do the MTU checks on kernel-side in skb_do_redirect, but > then how do people debug this? as packet will basically be silently dropped. > > > > (Looking at how does BPF-prog logic handle MTU today) > > How do bpf_skb_adjust_room() report that the MTU was exceeded? > Unfortunately it uses a common return code -ENOTSUPP which used for > multiple cases (include MTU exceeded). Thus, the BPF-prog logic cannot > use this reliably to know if this is a MTU exceeded event. (Looked > BPF-prog code and they all simply exit with TC_ACT_SHOT for all error > codes, cloudflare have the most advanced handling with > metrics->errors_total_encap_adjust_failed++). > > > [1] https://lore.kernel.org/bpf/159921182827.1260200.9699352760916903781.stgit@firesoul/ > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer > (a) the current state of the world seems very hard to use correctly, so adding new apis, or even changing existing ones seems ok to me. especially if this just means changing what error code they return (b) another complexity with bpf_redirect() is you can call it, it can succeed, but then you can not return TC_ACT_REDIRECT from the bpf program, which effectively makes the earlier *successful* bpf_redirect() call an utter no-op. (bpf_redirect() just determines what a future return TC_ACT_REDIRECT will do) so if you bpf_redirect to interface with larger mtu, then increase packet size, then return TC_ACT_OK, then you potentially end up with excessively large packet egressing through original interface (with small mtu). My vote would be to return a new distinct error from bpf_redirect() based on then current packet size and interface being redirected to, save this interface mtu somewhere, then in operations that increase packet size check against this saved mtu, for correctness you still have to check mtu after the bpf program is done, but this is then just to deal with braindead bpf code (that calls bpf_redirect and returns TC_ACT_OK, or calls bpf_redirect() multiple times, or something...). Maciej Żenczykowski, Kernel Networking Developer @ Google