On Sun, Jul 04, 2021 at 09:53 PM CEST, Cong Wang wrote: > On Sat, Jul 3, 2021 at 10:52 AM Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> wrote: >> When running with just the verdict prog attached, the -EIO error from >> sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it >> maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case. >> >> tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays >> on sk_receive_queue. > > Are you sure? > > When recv_actor() returns 0, the while loop breaks: > > 1661 used = recv_actor(desc, skb, offset, len); > 1662 if (used <= 0) { > 1663 if (!copied) > 1664 copied = used; > 1665 break; > > then it calls sk_eat_skb() a few lines after the loop: > ... > 1690 sk_eat_skb(sk, skb); This sk_eat_skb is still within the loop: 1636:int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, 1637- sk_read_actor_t recv_actor) 1638-{ … 1643- int copied = 0; … 1647- while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) { 1648- if (offset < skb->len) { … 1661- used = recv_actor(desc, skb, offset, len); 1662- if (used <= 0) { 1663- if (!copied) 1664- copied = used; 1665- break; 1666- } else if (used <= len) { 1667- seq += used; 1668- copied += used; 1669- offset += used; 1670- } … 1684- } … 1690- sk_eat_skb(sk, skb); … 1694- } … 1699- /* Clean up data we have read: This will do ACK frames. */ 1700- if (copied > 0) { 1701- tcp_recv_skb(sk, seq, &offset); 1702- tcp_cleanup_rbuf(sk, copied); 1703- } 1704- return copied; 1705-} sk_eat_skb could get called by tcp_recv_skb → sk_eat_skb if recv_actor returned > 0 (the case when we have parser attached). > >> >> sk->sk_data_ready >> sk_psock_verdict_data_ready >> ->read_sock(..., sk_psock_verdict_recv) >> tcp_read_sock (used = copied = 0) >> sk_psock_verdict_recv -> ret = 0 >> sk_psock_verdict_apply -> -EIO >> sk_psock_skb_redirect -> -EIO >> >> However, I think this gets us stuck. What if no more data gets queued, >> and sk_data_ready doesn't get called again? > > I think it is dropped by sk_eat_skb() in TCP case and of course the > sender will retransmit it after detecting this loss. So from this point of > view, there is no difference between drops due to overlimit and drops > due to eBPF program policy. I'm not sure the retransmit will happen. We update tp->rcv_nxt (tcp_rcv_nxt_update) when skb gets pushed onto sk_receive_queue in either: - tcp_rcv_established -> tcp_queue_rcv, or - tcp_rcv_established -> tcp_data_queue -> tcp_queue_rcv ... and schedule ACK (tcp_event_data_recv) to be sent. Say we are in quickack mode, then tcp_ack_snd_check()/__tcp_ack_snd_check() would cause ACK to be sent out.