On Tue, Apr 16, 2024 at 10:51 AM Maciej Żenczykowski <maze@xxxxxxxxxx> wrote: > > On Tue, Apr 16, 2024 at 10:16 AM Willem de Bruijn > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > Maciej Żenczykowski wrote: > > > On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@xxxxxxxxxxxx> wrote: > > > > > > > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote: > > > > > > > > > > External email : Please do not click links or open attachments until > > > > > you have verified the sender or the content. > > > > > shiming.cheng@ wrote: > > > > > > From: Shiming Cheng <shiming.cheng@xxxxxxxxxxxx> > > > > > > > > > > > > A GRO packet without fraglist is crashed and backtrace is as below: > > > > > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > > > > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > > > > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > > > > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > > > > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > > > > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > > > > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > > > > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > > > > > > > > > > > The reason that the packet loses its fraglist is that in ingress > > > > > bpf > > > > > > it makes a test pull with to make sure it can read packet headers > > > > > > via direct packet access: In bpf_progs/offload.c > > > > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > > > > > > __pskb_pull_tail This operation pull the data in fraglist into > > > > > linear > > > > > > and set the fraglist to null. > > > > > > > > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST > > > > > skbs? > > > > > > > > > > Some, like SCTP, cannot be linearized ever, as the do not have a > > > > > single gso_size. > > > > > > > > > > Should this BPF operation just fail? > > > > > > > > > In most situation for big gso size packet, it indeed fails but BPF > > > > doesn't check the result. It seems the udp GRO packet can't be pulled/ > > > > trimed/condensed or else it can't be segmented correctly. > > > > > > > > As the BPF function comments it doesn't matter if the data pull failed > > > > or pull less. It just does a blind best effort pull. > > > > > > > > A patch to modify bpf pull length is upstreamed to Google before and > > > > below are part of Google BPF expert maze's reply: > > > > maze@xxxxxxxxxx<maze@xxxxxxxxxx> #5Apr 13, 2024 02:30AM > > > > I *think* if that patch fixes anything, then it's really proving that > > > > there's a bug in the kernel that needs to be fixed instead. > > > > It should be legal to call try_make_writable(skb, X) with *any* value > > > > of X. > > > > > > > > I add maze in loop and we could start more discussion here. > > > > > > Personally, I think bpf_skb_pull_data() should have automatically > > > (ie. in kernel code) reduced how much it pulls so that it would pull > > > headers only, > > > > That would be a helper that parses headers to discover header length. > > Does it actually need to? Presumably the bpf pull function could > notice that it is > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull > accordingly so that it doesn't pull anything from the non-linear > fraglist portion??? > > I know only the generic overview of what udp gso is, not any details, so I am > assuming here that there's some sort of guarantee to how these packets > are structured... But I imagine there must be or we wouldn't be hitting these > issues deeper in the stack? Perhaps for a packet of this type we're already guaranteed the headers are in the linear portion, and the pull should simply be ignored? > > > Parsing is better left to the BPF program. > > > > > and not packet content. > > > (This is assuming the rest of the code isn't ready to deal with a longer pull, > > > which I think is the case atm. Pulling too much, and then crashing or forcing > > > the stack to drop packets because of them being malformed seems wrong...) > > > > > > In general it would be nice if there was a way to just say pull all headers... > > > (or possibly all L2/L3/L4 headers) > > > You in general need to pull stuff *before* you've even looked at the packet, > > > so that you can look at the packet, > > > so it's relatively hard/annoying to pull the correct length from bpf > > > code itself. > > > > > > > > > BPF needs to modify a proper length to do pull data. However kernel > > > > > > should also improve the flow to avoid crash from a bpf function > > > > > call. > > > > > > As there is no split flow and app may not decode the merged UDP > > > > > packet, > > > > > > we should drop the packet without fraglist in skb_segment_list > > > > > here. > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > > > > Signed-off-by: Shiming Cheng <shiming.cheng@xxxxxxxxxxxx> > > > > > > Signed-off-by: Lena Wang <lena.wang@xxxxxxxxxxxx> > > > > > > --- > > > > > > net/core/skbuff.c | 3 +++ > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > --- a/net/core/skbuff.c > > > > > > +++ b/net/core/skbuff.c > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > > > > > sk_buff *skb, > > > > > > if (err) > > > > > > goto err_linearize; > > > > > > > > > > > > +if (!list_skb) > > > > > > +goto err_linearize; > > > > > > + > > > > This would catch the case where the entire data frag_list is > > linearized, but not a pskb_may_pull that only pulls in part of the > > list. > > > > Even with BPF being privileged, the kernel should not crash if BPF > > pulls a FRAGLIST GSO skb. > > > > But the check needs to be refined a bit. For a UDP GSO packet, I > > think gso_size is still valid, so if the head_skb length does not > > match gso_size, it has been messed with and should be dropped. > > > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull > > may be entirely undetectable as long as frag_list != NULL? > > > > > > > > > > skb_shinfo(skb)->frag_list = NULL; > > > > > > > > > > In absense of plugging the issue in BPF, dropping here is the best > > > > > we can do indeed, I think. -- Maciej Żenczykowski, Kernel Networking Developer @ Google