On Fri, Oct 09, 2020 at 08:37 PM CEST, John Fastabend wrote: > In the sk_skb redirect case we didn't handle the case where we overrun > the sk_rmem_alloc entry on ingress redirect or sk_wmem_alloc on egress. > Because we didn't have anything implemented we simply dropped the skb. > This meant data could be dropped if socket memory accounting was in > place. > > This fixes the above dropped data case by moving the memory checks > later in the code where we actually do the send or recv. This pushes > those checks into the workqueue and allows us to return an EAGAIN error > which in turn allows us to try again later from the workqueue. > > Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") > Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx> > --- > net/core/skmsg.c | 28 ++++++++++++++-------------- > 1 file changed, 14 insertions(+), 14 deletions(-) > [...] > @@ -709,30 +711,28 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) > { > struct sk_psock *psock_other; > struct sock *sk_other; > - bool ingress; > > sk_other = tcp_skb_bpf_redirect_fetch(skb); > + /* This error is a buggy BPF program, it returned a redirect > + * return code, but then didn't set a redirect interface. > + */ > if (unlikely(!sk_other)) { > kfree_skb(skb); > return; > } > psock_other = sk_psock(sk_other); > + /* This error indicates the socket is being torn down or had another > + * error that caused the pipe to break. We can't send a packet on > + * a socket that is in this state so we drop the skb. > + */ > if (!psock_other || sock_flag(sk_other, SOCK_DEAD) || > !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { > kfree_skb(skb); > return; > } > > - ingress = tcp_skb_bpf_ingress(skb); > - if ((!ingress && sock_writeable(sk_other)) || > - (ingress && > - atomic_read(&sk_other->sk_rmem_alloc) <= > - sk_other->sk_rcvbuf)) { I'm wondering why the check for going over socket's rcvbuf was removed? I see that we now rely exclusively on sk_psock_skb_ingress→sk_rmem_schedule for sk_rmem_alloc checks, which I don't think applies the rcvbuf limit. > - skb_queue_tail(&psock_other->ingress_skb, skb); > - schedule_work(&psock_other->work); > - } else { > - kfree_skb(skb); > - } > + skb_queue_tail(&psock_other->ingress_skb, skb); > + schedule_work(&psock_other->work); > } > > static void sk_psock_tls_verdict_apply(struct sk_buff *skb, int verdict)