wangyufen wrote:
>
> On 2022/3/16 0:25, Daniel Borkmann wrote:
> > On 3/15/22 1:12 PM, Jakub Sitnicki wrote:
> >> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> >>> On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >>>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>>>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
> >>>>> the sockmap element, the tcp socket will switch to use the TCP protocol
> >>>>> stack to send and receive packets. The switching process may cause some
> >>>>> issues, such as if some msgs exist in the ingress queue and are cleared
> >>>>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
> >>>>>
> >>>>> Signed-off-by: Wang Yufen <wangyufen@xxxxxxxxxx>
> >>>>> ---
> >>>> Can you please tell us a bit more about the life-cycle of the socket in
> >>>> your workload? Questions that come to mind:
> >>>>
> >>>> 1) What triggers the removal of the socket from sockmap in your case?
> >>> We use sk_msg to redirect with sock hash, like this:
> >>>
> >>>  skA   redirect    skB
> >>>  Tx <-----------> skB,Rx
> >>>
> >>> And construct a scenario where the packet sending speed is high, the
> >>> packet receiving speed is slow, so the packets are stacked in the ingress
> >>> queue on the receiving side. In this case, if run bpf_map_delete_elem() to
> >>> delete the sockmap entry, will trigger the following procedure:
> >>>
> >>> sock_hash_delete_elem()
> >>>   sock_map_unref()
> >>>     sk_psock_put()
> >>>       sk_psock_drop()
> >>>         sk_psock_stop()
> >>>           __sk_psock_zap_ingress()
> >>>             __sk_psock_purge_ingress_msg()
> >>>
> >>>> 2) Would it still be a problem if removal from sockmap did not cause any
> >>>> packets to get dropped?
> >>> Yes, it still be a problem. If removal from sockmap did not cause any
> >>> packets to get dropped, packet receiving process switches to use TCP
> >>> protocol stack. The packets in the psock ingress queue cannot be received
> >>> by the user.
> >>
> >> Thanks for the context. So, if I understand correctly, you want to avoid
> >> breaking the network pipe by updating the sockmap from user-space.
> >>
> >> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?
> >
> > +1
> >
> > Aside from that, the patch as-is also fails BPF CI in a lot of places, please
> > make sure to check selftests:
> >
> > https://github.com/kernel-patches/bpf/runs/5537367301?check_suite_focus=true
> >
> > [...]
> > #145/73 sockmap_listen/sockmap IPv6 test_udp_redir:OK
> > #145/74 sockmap_listen/sockmap IPv6 test_udp_unix_redir:OK
> > #145/75 sockmap_listen/sockmap Unix test_unix_redir:OK
> > #145/76 sockmap_listen/sockmap Unix test_unix_redir:OK
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/77 sockmap_listen/sockhash IPv4 TCP test_insert_invalid:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/78 sockmap_listen/sockhash IPv4 TCP test_insert_opened:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/79 sockmap_listen/sockhash IPv4 TCP test_insert_bound:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > [...]
> >
> > Thanks,
> > Daniel
>
> I'm not sure about this patch. The main purpose is to point out the
> possible problems when the socket is deleted from the map. I'm sorry
> for the trouble.
>
> Thanks.

If you want to delete a socket, you should flush it first. To do this,
stop redirecting traffic to it and then read all the data out. At the
moment it's a bit tricky to know when the receiving socket is empty,
though.

If you need delete to work and are trying to work out how to delete
sockets safely, adding a flag on delete that only deletes when the
ingress qlen == 0 might be a possibility.
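
To make the flush-before-delete order concrete, here is a minimal
userspace sketch. It assumes redirection to the socket has already been
stopped by other means. The names drain_socket(), flush_then_delete(),
delete_fn and mark_deleted() are made up for illustration and are not
kernel or libbpf APIs; in a real program the callback would wrap
libbpf's bpf_map_delete_elem(). Note that hitting EAGAIN only shows the
queue was empty at that instant, which is exactly the "tricky to know
when it is empty" gap mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback type: removes the socket's entry from the map.
 * A real program would wrap libbpf's bpf_map_delete_elem() here. */
typedef int (*delete_fn)(void *ctx);

/* Stand-in callback for demonstration: only records that it was called. */
static int mark_deleted(void *ctx)
{
    *(int *)ctx = 1;
    return 0;
}

/* Read everything currently queued on fd without blocking.
 * Returns the number of bytes drained (>= 0) or -errno on a real error.
 * *eof is set to 1 if the peer has closed the connection.
 * The data is discarded here; a real application would pass it to its
 * normal receive path instead. */
static long drain_socket(int fd, int *eof)
{
    char buf[4096];
    long total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0) {           /* orderly shutdown by the peer */
            *eof = 1;
            return total;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;       /* nothing queued right now */
        return -errno;
    }
}

/* Drain fd, then remove it from the map via the callback.
 * The caller must already have stopped redirecting traffic to fd.
 * Returns the number of bytes drained, or a negative error. */
static long flush_then_delete(int fd, delete_fn del, void *ctx)
{
    int eof;
    long drained;
    int err;

    assert(del != NULL);

    drained = drain_socket(fd, &eof);
    if (drained < 0)
        return drained;         /* do not delete if draining failed */

    err = del(ctx);
    return err < 0 ? err : drained;
}
```

The sketch refuses to delete when draining fails, so queued data is
never purged as a side effect of an error. It cannot close the race
where new data is queued between the last recv() and the delete; that
would need kernel support such as the qlen == 0 flag suggested above.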