On Tue, Jul 26, 2022 at 6:49 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > On Tue, Jul 26, 2022 at 6:47 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > > > On Tue, Jul 26, 2022 at 6:14 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote: > > > > > If TCP really wants to queue a FIN with skb->len==0, then we have to > > > adjust the return value for recv_actor(), because we currently use 0 as > > > an error too (meaning no data is consumed): > > > > > > if (sk_psock_verdict_apply(psock, skb, ret) < 0) > > > len = 0; // here! > > > out: > > > rcu_read_unlock(); > > > return len; > > > > > > > > > BTW, what is wrong if we simply drop it before queueing to > > > sk_receive_queue in TCP? Is it there just for collapse? > > > > Because an incoming segment can have payload and FIN. > > > > The consumer will need to consume the payload before FIN is considered/consumed, > > with the complication of MSG_PEEK ... > > > > Right after tcp_read_skb() removes the skb from sk_receive_queue, > > we need to update TCP state, regardless of recv_actor(). > > > > Maybe like that: > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index ba2bdc81137490bd1748cde95789f8d2bff3ab0f..6e2c11cd921872e406baffc475c9870e147578a1 > > 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -1759,20 +1759,15 @@ int tcp_read_skb(struct sock *sk, > > skb_read_actor_t recv_actor) > > int used; > > > > __skb_unlink(skb, &sk->sk_receive_queue); > > + seq = TCP_SKB_CB(skb)->end_seq; > > used = recv_actor(sk, skb); > > if (used <= 0) { > > if (!copied) > > copied = used; > > break; > > } > > - seq += used; > > copied += used; > > > > - if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { > > - consume_skb(skb); > > - ++seq; > > - break; > > - } > > consume_skb(skb); > > break; > > } > > Or even better: > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index ba2bdc81137490bd1748cde95789f8d2bff3ab0f..66c187a2592c042565211565adb3f40a811dfd7d > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -1759,21 +1759,15 @@ int tcp_read_skb(struct sock *sk, > skb_read_actor_t recv_actor) > int used; > > __skb_unlink(skb, &sk->sk_receive_queue); > + seq = TCP_SKB_CB(skb)->end_seq; > used = recv_actor(sk, skb); > + consume_skb(skb); > if (used <= 0) { > if (!copied) > copied = used; > break; > } > - seq += used; > copied += used; > - > - if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { > - consume_skb(skb); > - ++seq; > - break; > - } > - consume_skb(skb); > break; > } > WRITE_ONCE(tp->copied_seq, seq); Note that this code will still not behave properly if we have in receive queues two skbs of 1000 bytes of payload like: seq 1:1001 seq 501:1501 tcp_recvmsg() would copy 1000 bytes from the first skb, then 500 bytes from second skb.