On Tue, Mar 15, 2022 at 01:12:08PM +0100, Jakub Sitnicki wrote:
> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> > On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
> >>> the sockmap element, the tcp socket will switch to use the TCP protocol
> >>> stack to send and receive packets. The switching process may cause some
> >>> issues, such as if some msgs exist in the ingress queue and are cleared
> >>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
> >>>
> >>> Signed-off-by: Wang Yufen <wangyufen@xxxxxxxxxx>
> >>> ---
> >> Can you please tell us a bit more about the life-cycle of the socket in
> >> your workload? Questions that come to mind:
> >>
> >> 1) What triggers the removal of the socket from sockmap in your case?
> > We use sk_msg to redirect with sock hash, like this:
> >
> >  skA   redirect    skB
> >  Tx <-----------> skB,Rx
> >
> > And construct a scenario where the packet sending speed is high, the
> > packet receiving speed is slow, so the packets are stacked in the ingress
> > queue on the receiving side. In this case, if run bpf_map_delete_elem() to
> > delete the sockmap entry, will trigger the following procedure:
> >
> > sock_hash_delete_elem()
> >   sock_map_unref()
> >     sk_psock_put()
> >       sk_psock_drop()
> >         sk_psock_stop()
> >           __sk_psock_zap_ingress()
> >             __sk_psock_purge_ingress_msg()
> >
> >> 2) Would it still be a problem if removal from sockmap did not cause any
> >> packets to get dropped?
> > Yes, it still be a problem. If removal from sockmap did not cause any
> > packets to get dropped, packet receiving process switches to use TCP
> > protocol stack. The packets in the psock ingress queue cannot be received
> > by the user.
>
> Thanks for the context. So, if I understand correctly, you want to avoid
> breaking the network pipe by updating the sockmap from user-space.
>
> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?

Doesn't BPF_MAP_FREEZE only freeze write operations from syscalls? For
sockmap, receiving packets is not part of a map write operation.

The problem here is that skmsg can only be consumed while the socket is
still in the map, as it uses a separate queue and a separate type of
message (skmsg vs. skb). So, essentially, this behavior is by design.

Thanks.
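
To make the two-queue point concrete, here is a minimal userspace sketch.
It is a toy model, not kernel code: struct toy_sock and the toy_*()
helpers are hypothetical names invented for illustration, and the byte
counters only stand in for the psock ingress queue (skmsg) and the
regular receive queue (skb). It assumes that deleting the map entry
purges the ingress queue, as in the call chain quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT kernel types or kernel functions. */
struct toy_sock {
	bool   in_sockmap;     /* psock attached, socket is still in the map */
	size_t ingress_msg;    /* bytes queued as skmsg on the psock ingress queue */
	size_t receive_queue;  /* bytes queued as skb on the regular receive queue */
};

/* Redirected data is queued as skmsg; this needs an attached psock. */
static void toy_redirect_ingress(struct toy_sock *sk, size_t len)
{
	assert(sk->in_sockmap);
	sk->ingress_msg += len;
}

/*
 * While the socket is in the map, reads are served from the skmsg queue.
 * Once it has left the map, only the skb queue is visible to the reader.
 */
static size_t toy_recvmsg(struct toy_sock *sk, size_t len)
{
	size_t *queue = sk->in_sockmap ? &sk->ingress_msg : &sk->receive_queue;
	size_t copied = len < *queue ? len : *queue;

	*queue -= copied;
	return copied;
}

/*
 * Models the map delete: the pending skmsg bytes are purged and the
 * socket falls back to the plain TCP path. Returns the bytes dropped.
 */
static size_t toy_map_delete(struct toy_sock *sk)
{
	size_t dropped = sk->ingress_msg;

	sk->ingress_msg = 0;
	sk->in_sockmap = false;
	return dropped;
}
```

In this model, whatever is still sitting in ingress_msg when
toy_map_delete() runs is gone for the reader, and a later toy_recvmsg()
only ever looks at receive_queue.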