This is an RFC for terminating sockets with intent.

We have two prominent use cases in Cilium [1] where we need a way to
identify and forcefully terminate a set of sockets so that they can
reconnect.

Cilium uses eBPF cgroup hooks for load-balancing, where it translates a
service VIP to one of the service backend IP addresses at socket connect
time for TCP and connected UDP. Client applications are likely to be
unaware that the remote containers they are connected to have been
deleted, and are left hanging when the remotes go away (long-running UDP
applications, particularly).

For the policy enforcement use case, users may want to enforce policies
on the fly, so that all client application traffic, including
established connections, is redirected to a subset of destinations.

We evaluated the following ways to identify and forcefully terminate
sockets:

- The sock_destroy API added for similar Android use cases is effective
  in tearing down sockets. The API is behind the
  CONFIG_INET_DIAG_DESTROY config that's disabled by default, and is
  currently exposed to userspace via the SOCK_DIAG netlink
  infrastructure. The sock destroy handlers for the TCP and UDP
  protocols report the ECONNABORTED error code on the affected sockets,
  in line with the abort state mentioned in RFC 793.

- Add unreachable routes for deleted backends. I experimented with this
  approach with my colleague, Nikolay Aleksandrov. We found that TCP and
  connected UDP sockets in the established state simply ignore the ICMP
  error messages, and continue to send data in the presence of such
  routes. My read is that applications are ignoring the ICMP errors
  reported on sockets [2].

- Use the BPF (sockets) iterator to identify sockets connected to a
  deleted backend. The BPF (sockets) iterator is network namespace
  aware, so we'll either need to enter every possible container network
  namespace to identify the affected connections, or adapt the iterator
  to work without netns checks [3].
  This was discussed with my colleague Daniel Borkmann, based on the
  feedback he shared from the LSF/MM/BPF conference discussions.

- Use the INET_DIAG infrastructure to filter and destroy sockets
  connected to stale backends. This approach involves first making a
  query to filter sockets connected to a destination IP address/port
  using netlink messages of type SOCK_DIAG_BY_FAMILY, and then using the
  query results to build another message of type SOCK_DESTROY to
  actually destroy the sockets. The SOCK_DIAG infrastructure, similar to
  BPF iterators, is network namespace aware.

We are currently leaning towards invoking the sock_destroy API directly
from BPF programs. This gives us an effective mechanism without having
to enter every possible container network namespace on a node, or rely
on the CONFIG_INET_DIAG_DESTROY config with the right permissions. BPF
programs attached to cgroup hooks can store client sockets connected to
a backend, and invoke destroy APIs when backends are deleted.

To that end, I'm in the process of adding a new BPF helper for the
sock_destroy kernel function, similar to the sock_diag_destroy function
[4], and am soliciting early feedback on the evaluated and selected
approaches. Happy to share more context.

[1] https://github.com/cilium/cilium
[2] https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_ipv4.c#L464
[3] https://github.com/torvalds/linux/blob/master/net/ipv4/udp.c#L3011
[4] https://github.com/torvalds/linux/blob/master/net/core/sock_diag.c#L298
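
For reference, the two-step INET_DIAG flow evaluated above (query with SOCK_DIAG_BY_FAMILY, then SOCK_DESTROY on each match) could look roughly like the sketch below. It only builds and matches the netlink messages for IPv4. Sending them over a NETLINK_SOCK_DIAG socket and parsing the multi-part dump reply are left out. The names struct diag_request, build_dump_request, matches_backend and build_destroy_request are made up for illustration, and filtering on the destination in userspace (rather than with an inet_diag bytecode filter) is a simplifying assumption, not necessarily what Cilium would do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/inet_diag.h>

/* Hypothetical wire layout for this sketch: netlink header + inet_diag request. */
struct diag_request {
    struct nlmsghdr nlh;
    struct inet_diag_req_v2 req;
};

/* Step 1: build a dump query for all IPv4 sockets of one protocol
 * (IPPROTO_TCP or IPPROTO_UDP) in the caller's network namespace. */
static void build_dump_request(struct diag_request *r, uint8_t protocol)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    r->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(r->req));
    r->nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    r->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    r->req.sdiag_family = AF_INET;
    r->req.sdiag_protocol = protocol;
    r->req.idiag_states = ~0U; /* consider sockets in every state */
}

/* Step 2: userspace filter on one dump result. Address and port are in
 * network byte order, as they are in struct inet_diag_sockid. */
static int matches_backend(const struct inet_diag_msg *m,
                           uint32_t addr_be, uint16_t port_be)
{
    return m->idiag_family == AF_INET &&
           m->id.idiag_dst[0] == addr_be &&
           m->id.idiag_dport == port_be;
}

/* Step 3: turn a matching dump result into a SOCK_DESTROY request.
 * The socket id (4-tuple, ifindex, cookie) is copied from the result. */
static void build_destroy_request(struct diag_request *r,
                                  const struct inet_diag_msg *m,
                                  uint8_t protocol)
{
    assert(r != NULL && m != NULL);
    memset(r, 0, sizeof(*r));
    r->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(r->req));
    r->nlh.nlmsg_type = SOCK_DESTROY;
    r->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    r->req.sdiag_family = m->idiag_family;
    r->req.sdiag_protocol = protocol;
    r->req.id = m->id;
}
```

A caller would send the dump request, run matches_backend() on every struct inet_diag_msg in the reply, and send one destroy request per match. All of this has to be repeated inside each container network namespace, and the destroy step needs CONFIG_INET_DIAG_DESTROY plus CAP_NET_ADMIN, which are the drawbacks noted for this approach.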