Cong Wang wrote: > On Mon, May 17, 2021 at 10:36 PM John Fastabend > <john.fastabend@xxxxxxxxx> wrote: > > > > Cong Wang wrote: > > > From: Cong Wang <cong.wang@xxxxxxxxxxxxx> > > > > > > sk_psock_verdict_recv() clones the skb and uses the clone > > > afterward, so udp_read_sock() should free the original skb after > > > done using it. > > > > The clone only happens if sk_psock_verdict_recv() returns >0. > > Sure, in case of error, no one uses the original skb either, > so still need to free it. But the data is going to be dropped then. I'm questioning if this is the best we can do or not. Its simplest sure, but could we do a bit more work and peek those skbs or requeue them? Otherwise if you cross memory limits for a bit your likely to drop these unnecessarily. > > > > > > > > > Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap") > > > Cc: John Fastabend <john.fastabend@xxxxxxxxx> > > > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > > > Cc: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> > > > Cc: Lorenz Bauer <lmb@xxxxxxxxxxxxxx> > > > Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx> > > > --- > > > net/ipv4/udp.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > index 15f5504adf5b..e31d67fd5183 100644 > > > --- a/net/ipv4/udp.c > > > +++ b/net/ipv4/udp.c > > > @@ -1798,11 +1798,13 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc, > > > if (used <= 0) { > > > if (!copied) > > > copied = used; > > > + kfree_skb(skb); > > > > This case is different from the TCP side, if there is an error > > the sockmap side will also call kfree_skb(). In TCP side we peek > > the skb because we don't want to drop it. On UDP side this will > > just drop data on the floor. Its not super friendly, but its > > UDP so we are making the assumption this is ok? We've tried > > to remove all the drop data cases from TCP it would be nice > > to not drop data on UDP side if we can help it. Could we > > requeue or peek the UDP skb to avoid this? > > TCP is special because it supports splice() where we can > do a partial read, so it needs to peek the skb, right? UDP only > supports sockmap, where we always read a whole skb, so we > do not need to peek here? Its also about not dropping data. In TCP we should not drop data at this point in the stack so if we get an error, ENOMEM or otherwise, we need to ensure we keep the original skb. > > Thanks.