Cong Wang wrote:
> From: Cong Wang <cong.wang@xxxxxxxxxxxxx>
>
> This patch introduces tcp_read_skb() based on tcp_read_sock(),
> a preparation for the next patch which actually introduces
> a new sock ops.
>
> TCP is special here, because it has tcp_read_sock() which is
> mainly used by splice(). tcp_read_sock() supports partial read
> and arbitrary offset, neither of them is needed for sockmap.
>
> Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Cc: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
> Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> ---
>  include/net/tcp.h |  2 ++
>  net/ipv4/tcp.c    | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 1e99f5c61f84..878544d0f8f9 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -669,6 +669,8 @@ void tcp_get_info(struct sock *, struct tcp_info *);
>  /* Read 'sendfile()'-style from a TCP socket */
>  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
>  		  sk_read_actor_t recv_actor);
> +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> +		 sk_read_actor_t recv_actor);
>
>  void tcp_initialize_rcv_mss(struct sock *sk);
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9984d23a7f3e..a18e9ababf54 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1709,6 +1709,53 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
>  }
>  EXPORT_SYMBOL(tcp_read_sock);
>
> +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> +		 sk_read_actor_t recv_actor)
> +{
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	u32 seq = tp->copied_seq;
> +	struct sk_buff *skb;
> +	int copied = 0;
> +	u32 offset;
> +
> +	if (sk->sk_state == TCP_LISTEN)
> +		return -ENOTCONN;
> +
> +	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
> +		int used;
> +
> +		__skb_unlink(skb, &sk->sk_receive_queue);
> +		used = recv_actor(desc, skb, 0, skb->len);
> +		if (used <= 0) {
> +			if (!copied)
> +				copied = used;
> +			break;
> +		}
> +		seq += used;
> +		copied += used;
> +
> +		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
> +			kfree_skb(skb);

Hi Cong, can you elaborate on your comment from v2 here?

  "Hm, it is tricky here, we use the skb refcount after this patchset,
  so it could be a real drop from another kfree_skb() in net/core/skmsg.c
  which initiates the drop."

The tcp_read_sock() hook uses tcp_eat_recv_skb(). Are we going to kick
the tracing infra even in the good cases with kfree_skb()? In
sk_psock_verdict_recv() we do an skb_clone() there.

.John