> -----Original Message-----
> From: Cong Wang [mailto:xiyou.wangcong@xxxxxxxxx]
> Sent: Monday, September 26, 2022 2:26 AM
> To: liujian (CE) <liujian56@xxxxxxxxxx>
> Cc: John Fastabend <john.fastabend@xxxxxxxxx>; Jakub Sitnicki
> <jakub@xxxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; davem
> <davem@xxxxxxxxxxxxx>; yoshfuji@xxxxxxxxxxxxxx; dsahern@xxxxxxxxxx;
> Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>;
> netdev <netdev@xxxxxxxxxxxxxxx>; bpf@xxxxxxxxxxxxxxx
> Subject: Re: [bug report] one possible out-of-order issue in sockmap
>
> On Sat, Sep 24, 2022 at 07:59:15AM +0000, liujian (CE) wrote:
> > Hello,
> >
> > I hit an scp failure here. I analyzed the code, and the reason may
> > be as follows:
> >
> > Since commit e7a5f1f1cd00 ("bpf/sockmap: Read psock ingress_msg before
> > sk_receive_queue"), if we use sockops
> > (BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
> > and BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB) to enable the socket's sockmap
> > function, and don't enable the strparse and verdict functions, the
> > out-of-order problem may occur in the following process.
> >
> > client SK                                server SK
> > --------------------------------------------------------------------------
> > tcp_rcv_synsent_state_process
> >   tcp_finish_connect
> >     tcp_init_transfer
> >       tcp_set_state(sk, TCP_ESTABLISHED);
> >       // insert SK to sockmap
> >     wake up waiter
> >     tcp_send_ack
> >
> > tcp_bpf_sendmsg(msgA)
> > // msgA will go to the TCP stack
> >                                          tcp_rcv_state_process
> >                                            tcp_init_transfer
> >                                              // insert SK to sockmap
> >                                            tcp_set_state(sk, TCP_ESTABLISHED)
> >                                            wake up waiter
>
> Here after the socket is inserted to a sockmap, its ->sk_data_ready() is
> already replaced with sk_psock_verdict_data_ready(), so msgA should go to
> sockmap, not TCP stack?
>

It is the TCP stack. Here I only enable the BPF_SK_MSG_VERDICT type.

  bpftool prog load bpf_redir.o /sys/fs/bpf/bpf_redir map name sock_ops_map pinned /sys/fs/bpf/sock_ops_map
  bpftool prog attach pinned /sys/fs/bpf/bpf_redir msg_verdict pinned /sys/fs/bpf/sock_ops_map

The call trace looks like this:

  tcp_bpf_sendmsg
  -- tcp_bpf_send_verdict
  ---- sk_psock_msg_verdict   // did not find serverSK, return __SK_PASS
  ---- tcp_bpf_push
  ------ do_tcp_sendpages     // go to TCP stack

After this, serverSK is inserted into the sockmap, but msgA is already going
through the TCP stack.

> > tcp_bpf_sendmsg(msgB)
> > // msgB goes through the sockmap
> >                                          tcp_bpf_recvmsg
> >                                            // msgB, out-of-order
> >                                          tcp_bpf_recvmsg
> >                                            // msgA, out-of-order
> >
> >
> > Even if msgA arrives earlier than msgB (in most cases), tcp_bpf_recvmsg
> > receives messages from the psock queue first.
> > The worst case is that msgA waits for serverSK to change to
> > TCP_ESTABLISHED in the protocol stack. msgA may arrive at the serverSK
> > receive queue later than msgB.
> > If msgA is before msgB,
> >
> > If the ACK packets of the three-way TCP handshake are dropped for a
> > period of time, the OOO problem is easily reproduced.
> >
> > iptables -A INPUT -p tcp -m tcp --dport 5006 --tcp-flags
> > SYN,RST,ACK,FIN ACK -j DROP
> > ...
> > iptables -D INPUT -p tcp -m tcp --dport 5006 --tcp-flags
> > SYN,RST,ACK,FIN ACK -j DROP
> >
> > Best Wishes
> > Liu Jian
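
To make the ordering concrete, here is a toy Python model of the receive side. It is only a sketch and not kernel code: the class ToyServerSock and the function simulate are hypothetical names, and the model assumes that msgA ends up on sk_receive_queue (via the TCP stack) while msgB ends up on the psock ingress_msg queue (via the sockmap), as in the trace above.

```python
from collections import deque


class ToyServerSock:
    """Toy stand-in for server SK (hypothetical; not the kernel structure)."""

    def __init__(self):
        self.sk_receive_queue = deque()  # filled by the regular TCP stack
        self.ingress_msg = deque()       # filled by the sockmap (psock) path

    def recvmsg(self):
        # Assumed ordering: the psock queue is read before sk_receive_queue.
        if self.ingress_msg:
            return self.ingress_msg.popleft()
        if self.sk_receive_queue:
            return self.sk_receive_queue.popleft()
        return None


def simulate():
    server = ToyServerSock()
    # msgA was sent first, before server SK was in the sockmap.
    server.sk_receive_queue.append("msgA")
    # msgB was sent second, after server SK was inserted into the sockmap.
    server.ingress_msg.append("msgB")
    return [server.recvmsg(), server.recvmsg()]


if __name__ == "__main__":
    print(simulate())
```

In this model the reader sees msgB before msgA even though msgA was queued first, which matches the out-of-order result described above.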