Maciej Żenczykowski wrote: > On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > Lena Wang (王娜) wrote: > > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote: > > > > > > > > External email : Please do not click links or open attachments until > > > > you have verified the sender or the content. > > > > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) < > > > > Lena.Wang@xxxxxxxxxxxx> wrote: > > > > > > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > External email : Please do not click links or open attachments > > > > until > > > > > > you have verified the sender or the content. > > > > > > Lena Wang (王娜) wrote: > > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > > > > > External email : Please do not click links or open > > > > attachments > > > > > > until > > > > > > > > you have verified the sender or the content. > > > > > > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > > > > > > automatically > > > > > > > > > > > > (ie. in kernel code) reduced how much it pulls so > > > > that it > > > > > > > > would pull > > > > > > > > > > > > headers only, > > > > > > > > > > > > > > > > > > > > > > That would be a helper that parses headers to discover > > > > > > header > > > > > > > > length. > > > > > > > > > > > > > > > > > > > > Does it actually need to? Presumably the bpf pull > > > > function > > > > > > could > > > > > > > > > > notice that it is > > > > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) > > > > and > > > > > > reduce > > > > > > > > the pull > > > > > > > > > > accordingly so that it doesn't pull anything from the > > > > non- > > > > > > linear > > > > > > > > > > fraglist portion??? > > > > > > > > > > > > > > > > > > > > I know only the generic overview of what udp gso is, not > > > > any > > > > > > > > details, so I am > > > > > > > > > > assuming here that there's some sort of guarantee to how > > > > > > these > > > > > > > > packets > > > > > > > > > > are structured... But I imagine there must be or we > > > > wouldn't > > > > > > be > > > > > > > > hitting these > > > > > > > > > > issues deeper in the stack? > > > > > > > > > > > > > > > > > > Perhaps for a packet of this type we're already guaranteed > > > > the > > > > > > > > headers > > > > > > > > > are in the linear portion, > > > > > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > > > > > > > > > I do prefer adding sanity checks to the BPF helpers, over > > > > having > > > > > > to > > > > > > > > add then in the net hot path only to protect against > > > > dangerous > > > > > > BPF > > > > > > > > programs. > > > > > > > > > > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > > > > > > packet? > > > > > > > It could save the normal packet and sent to user correctly. > > > > > > > > > > > > > > In common/net/core/filter.c > > > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > > > unsigned int write_len) > > > > > > > { > > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > > > > > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > > > > > > > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In > > > > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| > > > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the > > > > packet > > > > > as "UDP + need GSO + fraglist". > > > > > > > > > > We could remove SKB_GSO_UDP_L4 check for more packet that may > > > > addrive > > > > > skb_segment_list. > > > > > > > > > > > > +return 0; > > > > > > > > > > > > Failing for any pull is a bit excessive. And would kill a sane > > > > > > workaround of pulling only as many bytes as needed. > > > > > > > > > > > > > + or if (write_len > skb_headlen(skb)) > > > > > > > +write_len = skb_headlen(skb); > > > > > > > > > > > > Truncating requests would be a surprising change of behavior > > > > > > for this function. > > > > > > > > > > > > Failing for a pull > skb_headlen is arguably reasonable, as > > > > > > the alternative is that we let it go through but have to drop > > > > > > the now malformed packets on segmentation. > > > > > > > > > > > > > > > > > Is it OK as below? > > > > > > > > > > In common/net/core/filter.c > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > unsigned int write_len) > > > > > { > > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > + SKB_GSO_FRAGLIST) && (write_len > > > > > skb_headlen(skb))) { > > > > > + return 0; > > > > > > > > please limit write_len to skb_headlen() instead of just returning 0 > > > > > > > > > > Hi Maze & Willem, > > > Maze's advice is: > > > In common/net/core/filter.c > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > unsigned int write_len) > > > { > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > > > + write_len = skb_headlen(skb); > > > + } > > > return skb_ensure_writable(skb, write_len); > > > } > > > > > > Willem's advice is to "Failing for a pull > skb_headlen is arguably > > > reasonable...". It prefers to return 0 : > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > > > + return 0; > > > + } > > > > > > It seems a bit conflict. However I am not sure if my understanding is > > > right and hope to get your further guide. > > > > I did not mean to return 0. But to fail a request that would pull an > > unsafe amount. The caller must get a clear error signal. > > That's hostile on userspace. > Currently the caller doesn't even check the error return... It can, and probably should. bpf_skb_pull data returns the error code from bpf_try_make_writable: return bpf_try_make_writable(skb, len ? : skb_headlen(skb)); > Why would we? We already have to reload all pointers, and have to do > and will thus redo checking on those. > > What do you expect the caller to do? Subtract -1 and try again? > That's hard to do from BPF as it involves looping... and is slow. > > We already try to not pull too much: > > void try_make_writable(struct __sk_buff* skb, int len) { > if (len > skb->len) len = skb->len; > if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len); > } > > Is there at least something like skb->len that has the actually > pullable length in it? The above snippet shows that it passes skb_headlen if the caller passes 0. But your BPF program does not even need the data writable, so then it is of little help of course. > Or are these skb's structured in such a way that there is never a need > to pull anything, > because the headers are already always in the linear portion? That is indeed the case. So as far as I can see: A BPF program that just wants to pull the network and transport headers can diligently pull exactly what is needed. And will not even observe any data pulled into linear in practice. This is still advisable rather than trusting that the headers are linear. It may also be required by the validator? Don't know. But check the return value. > > > Back to the original report: the issue should already have been fixed > > by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list"). > > But that commit is in the kernel for which you report the error. > > > > Turns out that the crash is not in skb_segment_list, but later in > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences > > udp_hdr(seg). > > > > The above fix also mentions skb pull as the culprit, but does not > > include a BPF program. If this can be reached in other ways, then we > > do need a stronger test in skb_segment_list, as you propose. > > > > I don't want to narrowly check whether udp_hdr is safe. Essentially, > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one > > byte would get pulled. > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google