On Fri, 2024-04-19 at 13:41 -0400, Willem de Bruijn wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> Maciej Żenczykowski wrote:
> > On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn
> > <willemdebruijn.kernel@xxxxxxxxx> wrote:
> > >
> > > Lena Wang (王娜) wrote:
> > > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote:
> > > > > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜)
> > > > > <Lena.Wang@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> > > > > > > Lena Wang (王娜) wrote:
> > > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > > > > > > > > > > > > Personally, I think bpf_skb_pull_data() should
> > > > > > > > > > > > > have automatically (ie. in kernel code) reduced
> > > > > > > > > > > > > how much it pulls so that it would pull
> > > > > > > > > > > > > headers only,
> > > > > > > > > > > >
> > > > > > > > > > > > That would be a helper that parses headers to
> > > > > > > > > > > > discover header length.
> > > > > > > > > > >
> > > > > > > > > > > Does it actually need to? Presumably the bpf pull
> > > > > > > > > > > function could notice that it is a packet flagged
> > > > > > > > > > > as being of type X (UDP GSO FRAGLIST) and reduce
> > > > > > > > > > > the pull accordingly so that it doesn't pull
> > > > > > > > > > > anything from the non-linear fraglist portion???
> > > > > > > > > > >
> > > > > > > > > > > I know only the generic overview of what udp gso
> > > > > > > > > > > is, not any details, so I am assuming here that
> > > > > > > > > > > there's some sort of guarantee to how these
> > > > > > > > > > > packets are structured... But I imagine there must
> > > > > > > > > > > be or we wouldn't be hitting these issues deeper
> > > > > > > > > > > in the stack?
> > > > > > > > > >
> > > > > > > > > > Perhaps for a packet of this type we're already
> > > > > > > > > > guaranteed the headers are in the linear portion,
> > > > > > > > > > and the pull should simply be ignored?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Parsing is better left to the BPF program.
> > > > > > > > >
> > > > > > > > > I do prefer adding sanity checks to the BPF helpers,
> > > > > > > > > over having to add them in the net hot path only to
> > > > > > > > > protect against dangerous BPF programs.
> > > > > > > > >
> > > > > > > > Is it OK to ignore or decrease the pull length for a udp
> > > > > > > > gro fraglist packet? It could save the normal packet and
> > > > > > > > send it to the user correctly.
> > > > > > > >
> > > > > > > > In common/net/core/filter.c
> > > > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > > > > > unsigned int write_len)
> > > > > > > > {
> > > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) {
> > > > > > >
> > > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> > > > > > >
> > > > > > Currently in the kernel only UDP uses SKB_GSO_FRAGLIST to do
> > > > > > GRO. In udp_offload.c, udp4_gro_complete adds
> > > > > > "SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4" to gso_type. Checking these
> > > > > > two flags here limits the match to "UDP + need GSO + fraglist".
> > > > > >
> > > > > > We could remove the SKB_GSO_UDP_L4 check to cover more packets
> > > > > > that may arrive at skb_segment_list.
> > > > > >
> > > > > > > > +return 0;
> > > > > > >
> > > > > > > Failing for any pull is a bit excessive. And would kill a sane
> > > > > > > workaround of pulling only as many bytes as needed.
> > > > > > >
> > > > > > > > + or if (write_len > skb_headlen(skb))
> > > > > > > > +write_len = skb_headlen(skb);
> > > > > > >
> > > > > > > Truncating requests would be a surprising change of behavior
> > > > > > > for this function.
> > > > > > >
> > > > > > > Failing for a pull > skb_headlen is arguably reasonable, as
> > > > > > > the alternative is that we let it go through but have to drop
> > > > > > > the now malformed packets on segmentation.
> > > > > > >
> > > > > > >
> > > > > > Is it OK as below?
> > > > > >
> > > > > > In common/net/core/filter.c
> > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > > > unsigned int write_len)
> > > > > > {
> > > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > > > > > + return 0;
> > > > >
> > > > > please limit write_len to skb_headlen() instead of just returning 0
> > > > >
> > > >
> > > > Hi Maze & Willem,
> > > > Maze's advice is:
> > > > In common/net/core/filter.c
> > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > unsigned int write_len)
> > > > {
> > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > > > + write_len = skb_headlen(skb);
> > > > + }
> > > > return skb_ensure_writable(skb, write_len);
> > > > }
> > > >
> > > > Willem's advice is "Failing for a pull > skb_headlen is arguably
> > > > reasonable...", which I read as preferring to return 0:
> > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > > > + return 0;
> > > > + }
> > > >
> > > > These two seem to conflict. However, I am not sure my
> > > > understanding is right and hope to get your further guidance.
> > >
> > > I did not mean to return 0. But to fail a request that would pull an
> > > unsafe amount. The caller must get a clear error signal.
> >
> > That's hostile on userspace.
> > Currently the caller doesn't even check the error return...
>
> It can, and probably should.
>
> bpf_skb_pull_data returns the error code from bpf_try_make_writable:
>
>     return bpf_try_make_writable(skb, len ? : skb_headlen(skb));
>
> > Why would we? We already have to reload all pointers, and will thus
> > redo the checks on those.
> >
> > What do you expect the caller to do? Subtract 1 and try again?
> > That's hard to do from BPF as it involves looping... and is slow.
> >
> > We already try to not pull too much:
> >
> > void try_make_writable(struct __sk_buff* skb, int len) {
> >   if (len > skb->len) len = skb->len;
> >   if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len);
> > }
> >
> > Is there at least something like skb->len that has the actually
> > pullable length in it?
>
> The above snippet shows that it passes skb_headlen if the caller
> passes 0.
>
> But your BPF program does not even need the data writable, so then
> it is of little help of course.
>
> > Or are these skb's structured in such a way that there is never a need
> > to pull anything,
> > because the headers are already always in the linear portion?
>
> That is indeed the case.
>
> So as far as I can see:
>
> A BPF program that just wants to pull the network and transport
> headers can diligently pull exactly what is needed. And will not
> even observe any data pulled into linear in practice. This is still
> advisable rather than trusting that the headers are linear. It may
> also be required by the validator? Don't know. But check the return
> value.
>

Hi Willem,
Following the discussion, is the patch below OK?

diff --git a/net/core/filter.c b/net/core/filter.c
index 3a6110ea4009..abc6029c8eef 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp);
 static inline int __bpf_try_make_writable(struct sk_buff *skb,
 					  unsigned int write_len)
 {
+	if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
+	    SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
+		return -ENOMEM;
+	}
+
 	return skb_ensure_writable(skb, write_len);
 }
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 73b1e0e53534..2e90534c1a1e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 	unsigned int tnl_hlen = skb_tnl_header_len(skb);
 	unsigned int delta_truesize = 0;
 	unsigned int delta_len = 0;
+	unsigned int mss = skb_shinfo(skb)->gso_size;
 	struct sk_buff *tail = NULL;
 	struct sk_buff *nskb, *tmp;
 	int len_diff, err;
+	bool err_len = false;
 
 	skb_push(skb, -skb_network_offset(skb) + offset);
 
@@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 	if (err)
 		goto err_linearize;
 
+	if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
+		if (!list_skb) {
+			goto err_linearize;
+		} else {
+			err_len = true;
+		}
+	}
+
 	skb_shinfo(skb)->frag_list = NULL;
 
 	while (list_skb) {
@@ -4109,6 +4119,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 	    __skb_linearize(skb))
 		goto err_linearize;
 
+	if (err_len)
+		goto err_linearize;
+
 	skb_get(skb);
 
 	return skb;

> >
> > > Back to the original report: the issue should already have been fixed
> > > by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list").
> > > But that commit is in the kernel for which you report the error.
> > >
> > > Turns out that the crash is not in skb_segment_list, but later in
> > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences
> > > udp_hdr(seg).
> > >
> > > The above fix also mentions skb pull as the culprit, but does not
> > > include a BPF program. If this can be reached in other ways, then we
> > > do need a stronger test in skb_segment_list, as you propose.
> > >
> > > I don't want to narrowly check whether udp_hdr is safe. Essentially,
> > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one
> > > byte would get pulled.
> >
> > --
> > Maciej Żenczykowski, Kernel Networking Developer @ Google
> >
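
For illustration only, the behaviour discussed in this thread (fail a pull that would reach past skb_headlen() on an SKB_GSO_FRAGLIST skb, and have the caller check the result) can be modelled in plain userspace C roughly as below. This is a sketch under stated assumptions, not kernel code: struct model_skb, model_try_make_writable(), model_pull_headers() and model_example() are hypothetical names invented for this example, the struct only mirrors the few properties the check reads, and returning -ENOMEM for an out-of-range request is a choice made for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sk_buff properties the check looks at. */
struct model_skb {
	unsigned int len;      /* total length, linear area plus fraglist */
	unsigned int headlen;  /* linear bytes, as skb_headlen() would report */
	bool gso_fraglist;     /* skb_is_gso() && (gso_type & SKB_GSO_FRAGLIST) */
};

/*
 * Model of the proposed guard. A pull that would reach past the linear
 * area of a fraglist GSO packet fails with -ENOMEM and leaves the skb
 * untouched. Any other in-range pull succeeds and grows the linear area.
 */
int model_try_make_writable(struct model_skb *skb, unsigned int write_len)
{
	if (skb->gso_fraglist && write_len > skb->headlen)
		return -ENOMEM;
	if (write_len > skb->len)
		return -ENOMEM; /* model's choice for an out-of-range request */
	if (write_len > skb->headlen)
		skb->headlen = write_len; /* data pulled into the linear area */
	return 0;
}

/*
 * Caller side: pull exactly hdr_len bytes and check the result instead
 * of assuming success. Returns true if the headers may be accessed.
 */
bool model_pull_headers(struct model_skb *skb, unsigned int hdr_len)
{
	return model_try_make_writable(skb, hdr_len) == 0;
}

/* Small self-check that exercises both outcomes on a fraglist packet. */
void model_example(void)
{
	struct model_skb frag = { .len = 3000, .headlen = 42, .gso_fraglist = true };

	assert(model_pull_headers(&frag, 42));   /* headers already linear */
	assert(!model_pull_headers(&frag, 43));  /* would touch the fraglist */
	assert(frag.headlen == 42);              /* layout left intact */
}
```

In this model a program that pulls only the header bytes it needs never sees a failure on a fraglist packet, while a larger pull gets a clear error instead of silently breaking the layout that skb_segment_list relies on.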