Jiayuan Chen wrote: > On Thu, Dec 12, 2024 at 05:36:15PM -0800, John Fastabend wrote: > [...] > > > > I think easier is to do similar logic to read_sock and track > > > > offset and len? Did I miss something. > > > > > > Thanks to John Fastabend. > > > > > > Let me explain it. > > > Now I only replace the read_sock handler when using strparser. > > > > > > My previous implementation added the replacement of read_sock in > > > sk_psock_start_strp() to more explicitly require replacement when using > > > strparser, for instance: > > > '''skmsg.c > > > void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock) > > > { > > > ... > > > sk->sk_data_ready = sk_psock_strp_data_ready; > > > /* Replacement */ > > > sk->sk_socket->ops->read_sock = tcp_bpf_read_sock; > > > } > > > ''' > > > > > > As you can see that it only works for strparser. > > > (The current implementation of replacement in tcp_bpf_update_proto() > > > achieves the same effect just not as obviously.) > > > > > > So the current implementation of recv_actor() can only be strp_recv(), > > > with the call stack as follows: > > > ''' > > > sk_psock_strp_data_ready > > > -> strp_data_ready > > > -> strp_read_sock > > > -> strp_recv > > > ''' > > > > > > The implementation of strp_recv() will consume all input skb. Even if it > > > reads part of the data, it will clone it, then place it into its own > > > queue, expecting the caller to release the skb. I didn't find any > > > logic like tcp_try_coalesce() (fix me if i miss something). > > > > > > So here is what I believe the flow is, > > > > sk_psock_strp_data_ready > > -> str_data_ready > > -> strp_read_sock > > -> sock->ops->read_sock(.., strp_recv) > > > > > > We both have the same idea up to here. But then the proposed data_ready() > > call > > > > + while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) { > > + u8 tcp_flags; > > + int used; > > + > > + WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); > > + tcp_flags = TCP_SKB_CB(skb)->tcp_flags; > > + used = recv_actor(desc, skb, 0, skb->len); > > > > The recv_actor here is strp_recv() all good so far. But, because > > that skb is still on the sk_receive_queue() the TCP stack may > > at the same time do > > > > tcp_data_queue > > -> tcp_queue_rcv > > -> tail = skb_peek_tail(&sk->sk_receive_queue); > > tcp_try_coalesce(sk, tail, skb, fragstolen) > > -> skb_try_coalesce() > > ... skb->len += len > > > > So among other things you will have changed the skb->len and added some > > data to it. If this happens while you are running the recv actor we will > > eat the data by calling tcp_eat_recv_skb(). Any data added from the > > try_coalesce will just be dropped and never handled? The clone() from > > the strparser side doesn't help you the tcp_eat_recv_skb call will > > unlik the skb from the sk_receive_queue. > > > > I don't think you have any way to protect this at the moment. > > Thanks John Fastabend. > > It seems sk was always locked whenever data_ready called. > > ''' > bh_lock_sock_nested(sk) > tcp_v4_do_rcv(sk) > | > |-> tcp_rcv_established > |-> tcp_queue_rcv > |-> tcp_try_coalesce > | > |-> tcp_rcv_state_process > |-> tcp_queue_rcv > |-> tcp_try_coalesce > | > |-> sk->sk_data_ready() > > bh_unlock_sock(sk) > ''' > > other data_ready path: > ''' > lock_sk(sk) > sk->sk_data_ready() > release_sock(sk) > ''' > > I can not find any concurrency there. OK thanks, one more concern though. What if strp_recv thorws an ENOMEM error on the clone? Would we just drop the data then? This is problem not the expected behavior its already been ACKed. Thanks, John