Jakub Sitnicki wrote: > On Sat, May 28, 2022 at 09:54 AM +08, wangyufen wrote: > > 在 2022/5/28 5:37, Cong Wang 写道: > >> On Tue, May 24, 2022 at 03:53:11PM +0800, Wang Yufen wrote: > >>> During TCP sockmap redirect pressure test, the following warning is triggered: > >>> WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0 > >>> CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G W 5.10.0+ #9 > >>> Call Trace: > >>> inet_csk_destroy_sock+0x55/0x110 > >>> inet_csk_listen_stop+0xbb/0x380 > >>> tcp_close+0x41b/0x480 > >>> inet_release+0x42/0x80 > >>> __sock_release+0x3d/0xa0 > >>> sock_close+0x11/0x20 > >>> __fput+0x9d/0x240 > >>> task_work_run+0x62/0x90 > >>> exit_to_user_mode_prepare+0x110/0x120 > >>> syscall_exit_to_user_mode+0x27/0x190 > >>> entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> > >>> The reason we observed is that: > >>> When the listener is closing, a connection may have completed the three-way > >>> handshake but not accepted, and the client has sent some packets. The child > >>> sks in accept queue release by inet_child_forget()->inet_csk_destroy_sock(), > >>> but psocks of child sks have not released. > >>> > >> Hm, in this scenario, how does the child socket end up in the sockmap? > >> Clearly user-space does not have a chance to get an fd yet. > >> > >> And, how does your patch work? Since the child sock does not even inheirt > >> the sock proto after clone (see the comments above tcp_bpf_clone()) at > >> all? > >> > >> Thanks. > >> . > > My test cases are as follows: > > > > __section("sockops") > > int bpf_sockmap(struct bpf_sock_ops *skops) > > { > > switch (skops->op) { > > case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: > > case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: > > ... > > bpf_sock_hash_update(skops, &sock_ops_map, &key, BPF_NOEXIST); > > break; > > ... > > } > > Right, when processing the final ACK in tcp_rcv_state_process(), we > invoke the BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB BPF callback. > > This gives a chance to install sockmap sk_prot callbacks. > > An accept() without ever calling accept() ;-) > > [...] LGTM as well. Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>