This patch series adds the capability to destroy sockets in BPF. We plan
to use the capability in Cilium to force client sockets to reconnect
when their remote load-balancing backends are deleted. The other use
case is on-the-fly policy enforcement, where existing socket connections
that are disallowed by policies need to be terminated.

The use cases and more details about the selected approach were
presented at LPC 2022 -
https://lpc.events/event/16/contributions/1358/.
RFC discussion -
https://lore.kernel.org/netdev/CABG=zsBEh-P4NXk23eBJw7eajB5YJeRS7oPXnTAzs=yob4EMoQ@xxxxxxxxxxxxxx/T/#u.
v1 patch series -
https://lore.kernel.org/bpf/cover.1671242108.git.aditi.ghag@xxxxxxxxxxxxx/

v2 highlights:
- Implemented batching support for UDP iterator.
- Converted bpf_sock_destroy helper to kfunc.
- Synchronous execution of destroy handlers to replace the previous
  workqueue implementation.
- Updated selftests to use the kfunc.

Notes to the reviewers (further details in the commit descriptions):
- I hit a snag while writing the kfunc: the verifier complained about
  the `sock_common` type passed from the TCP iterator. With kfuncs,
  there don't seem to be any options available to pass BTF type hints to
  the verifier (the equivalent of `ARG_PTR_TO_BTF_ID_SOCK_COMMON`, as
  was the case with the helper). As a result, I changed the argument
  type of the bpf_sock_destroy kfunc to `sock_common`. I discussed this
  from the point of view of the verifier with my colleague (Dylan
  Reimerink): the verifier has a `sock_common` BTF ID for a subset of
  socket types. However, it may not always be safe to cast from
  `sock_common *` to `sock *`, so I added a check for full sock
  availability in the kfunc.
- The `vmlinux.h` import in the selftest prog unexpectedly led to libbpf
  failing to load the program. As it turns out, the libbpf kfunc-related
  code doesn't seem to handle the BTF `FWD` type for structs. I've
  attached debug information about the issue in case the loader logic
  can accommodate such gotchas.
  That said, the error in this case was specific to the test imports.
- We previously discussed the possibility of using sockmap to store
  sockets to be destroyed as an optimization, so that users may not need
  to iterate over all the host-wide sockets. This approach needs more
  discussion on the TCP side, as we may need to extend the logic that
  checks for certain TCP states while inserting sockets in a sockmap. So
  I've left the selftest cases involving sockmap out of this series
  (same as the v1 patch).

Aditi Ghag (3):
  bpf: Implement batching in UDP iterator
  bpf: Add bpf_sock_destroy kfunc
  selftests/bpf: Add tests for bpf_sock_destroy

 net/core/filter.c                           |  55 +++++
 net/ipv4/tcp.c                              |  17 +-
 net/ipv4/udp.c                              | 231 +++++++++++++++++-
 .../selftests/bpf/prog_tests/sock_destroy.c | 125 ++++++++++
 .../selftests/bpf/progs/sock_destroy_prog.c | 110 +++++++++
 5 files changed, 522 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
 create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

-- 
2.34.1
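
For readers unfamiliar with the "full sock" check mentioned in the
reviewer notes, here is a minimal userspace sketch of the idea. It is
not the code from this series. The `model_*` names, the struct layouts,
and the returned error codes are all assumptions made for illustration.
Only the state numbers mirror the kernel's TCP state values. The sketch
shows why the downcast from a common socket header to a full socket is
guarded: time-wait and request minisockets carry the common header but
are not full sockets.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* TCP state numbers as defined in the kernel's tcp_states.h. */
enum {
	MODEL_TCP_ESTABLISHED  = 1,
	MODEL_TCP_TIME_WAIT    = 6,
	MODEL_TCP_NEW_SYN_RECV = 12,
};

/* Hypothetical stand-in for `struct sock_common`. */
struct model_sock_common {
	int skc_state;
};

/* Hypothetical stand-in for `struct sock`; the common part comes first. */
struct model_sock {
	struct model_sock_common common;
	bool destroyed;
};

/* The downcast below is only meaningful if `common` is at offset 0. */
static_assert(offsetof(struct model_sock, common) == 0,
	      "common header must be the first member");

/* Minisockets (time-wait, new-syn-recv) are not full sockets. */
static bool model_fullsock(const struct model_sock_common *skc)
{
	unsigned int mini = (1u << MODEL_TCP_TIME_WAIT) |
			    (1u << MODEL_TCP_NEW_SYN_RECV);

	return ((1u << skc->skc_state) & ~mini) != 0;
}

/* Sketch of a destroy entry point that takes the common type. */
static int model_sock_destroy(struct model_sock_common *skc)
{
	struct model_sock *sk;

	if (!skc)
		return -EINVAL;		/* assumed error code */
	if (!model_fullsock(skc))
		return -EOPNOTSUPP;	/* assumed error code */

	/* Safe only because the full-sock check passed. */
	sk = (struct model_sock *)skc;
	sk->destroyed = true;
	return 0;
}
```

Calling `model_sock_destroy()` on the `common` member of an established
`model_sock` marks it destroyed, while a bare `model_sock_common` in the
time-wait state is rejected before any cast takes place.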