On Thu, Jul 1, 2021 at 9:23 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote: > > Cong Wang wrote: > > From: Cong Wang <cong.wang@xxxxxxxxxxxxx> > > > > Jiang observed OOM frequently when testing our AF_UNIX/UDP > > proxy. This is due to the fact that we do not actually limit > > the socket memory before queueing skb to ingress_skb. We > > charge the skb memory later when handling the psock backlog, > > but it is not limited either. > > Right, its not limiting but charging it should push back on > the stack so it stops feeding skbs to us. Maybe this doesn't > happen in UDP side? The OOM is due to skb queued in ingress_skb, not due to user-space consuming skb slowly. > > > > > This patch adds checks for sk->sk_rcvbuf right before queuing > > to ingress_skb and drops packets if this limit exceeds. This > > is very similar to UDP receive path. Ideally we should set the > > skb owner before this check too, but it is hard to make TCP > > happy about sk_forward_alloc. > > But it breaks TCP side see below. > > > > > Reported-by: Jiang Wang <jiang.wang@xxxxxxxxxxxxx> > > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > > Cc: John Fastabend <john.fastabend@xxxxxxxxx> > > Cc: Lorenz Bauer <lmb@xxxxxxxxxxxxxx> > > Cc: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> > > Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx> > > --- > > net/core/skmsg.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c > > index 9b6160a191f8..a5185c781332 100644 > > --- a/net/core/skmsg.c > > +++ b/net/core/skmsg.c > > @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb) > > return -EIO; > > } > > spin_lock_bh(&psock_other->ingress_lock); > > - if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { > > + if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) || > > + atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) { > > spin_unlock_bh(&psock_other->ingress_lock); > > skb_bpf_redirect_clear(skb); > > sock_drop(from->sk, skb); > > @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, > > } > > if (err < 0) { > > spin_lock_bh(&psock->ingress_lock); > > - if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { > > + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) && > > + atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) { > > skb_queue_tail(&psock->ingress_skb, skb); > > We can't just drop the packet in the memory overrun case here. This will > break TCP because the data will be gone and no one will retransmit. > > Thats why in the current scheme on redirect we can push back when we > move it to the other queues ingress message queue or redirect into > the other socket via send. > > At one point I considered charging the data sitting in the ingress_skb? > Would that solve the problem here? I think it would cause the enqueue > at the UDP to start dropping packets from __udp_enqueue_schedule_skb()? I tried to move skb_set_owner_r() here, TCP is clearly unhappy about it, as I explained in changelog. Yes, it probably helps if we could move it here. Thanks.