On Wed, Oct 13, 2021 at 7:07 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote: > > Cong Wang wrote: > > From: Cong Wang <cong.wang@xxxxxxxxxxxxx> > > > > Jiang observed OOM frequently when testing our AF_UNIX/UDP > > proxy. This is due to the fact that we do not actually limit > > the socket memory before queueing skb to ingress_skb. We > > charge the skb memory later when handling the psock backlog, > > and it is not limited either. > > > > This patch adds checks for sk->sk_rcvbuf right before queuing > > to ingress_skb and drops or retries the packets if this limit > > exceeds. This is very similar to UDP receive path. Ideally we > > should set the skb owner before this check too, but it is hard > > to make TCP happy with sk_forward_alloc. > > > > For TCP, we can not just drop the packets on errors. TCP ACKs > > are already sent for those packet before reaching > > ->sk_data_ready(). Instead, we use best effort to retry, this > > works because TCP does not remove the skb from receive queue > > at that point and exceeding sk_rcvbuf limit is a temporary > > situation. > > > > Reported-by: Jiang Wang <jiang.wang@xxxxxxxxxxxxx> > > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > > Cc: John Fastabend <john.fastabend@xxxxxxxxx> > > Cc: Lorenz Bauer <lmb@xxxxxxxxxxxxxx> > > Cc: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> > > Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx> > > Makes sense to include a fixes tag here. > > > --- > > v3: add retry logic for TCP > > v2: add READ_ONCE() > > I agree this logic is needed, but I think the below is not > complete. I can get the couple extra fixes in front of this > today/tomorrow on my side and kick it through some testing here. > Then we should push it as a series. Your patch + additions. Sounds good. As long as we have this limit, it will be okay to me. > > > > > net/core/skmsg.c | 15 +++++++++------ > > net/ipv4/tcp.c | 2 ++ > > 2 files changed, 11 insertions(+), 6 deletions(-) > > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c > > index 2d6249b28928..356c314cd60c 100644 > > --- a/net/core/skmsg.c > > +++ b/net/core/skmsg.c > > All the skmsg changes are good. > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index e8b48df73c85..8b243fcdbb8f 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -1665,6 +1665,8 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, > > if (used <= 0) { > > if (!copied) > > copied = used; > > + if (used == -EAGAIN) > > + continue; > > This is not a good idea, looping through read_sock because we have > hit a memory limit is not going to work. If something is holding the > memlimit pinned this could loop indefinately. > > Also this will run the verdict multiple times on the same bytes. For > apply/cork logic this will break plus just basic parsers will be > confused when they see duplicate bytes. Good point! I run out of ideas for dealing with this TCP case, dropping is not okay, retrying is hard, reworking TCP ACKing is even harder. :-/ > > We need to do a workqueue and then retry later. > > Final missing piece is that strparser logic would still not handle > this correctly. > > I don't mind spending some time on this today. I'll apply your > patch and then add a few fixes for above. Ideally, we should move TCP ACK after ->sk_data_ready() so that dropping in ->sk_data_ready() would be fine, but this is certainly not easy even if it is doable. Thanks.