> On Mar 24, 2023, at 2:37 PM, Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > > On 03/23, Aditi Ghag wrote: >> The socket destroy kfunc is used to forcefully terminate sockets from >> certain BPF contexts. We plan to use the capability in Cilium to force >> client sockets to reconnect when their remote load-balancing backends are >> deleted. The other use case is on-the-fly policy enforcement where existing >> socket connections prevented by policies need to be forcefully terminated. >> The helper allows terminating sockets that may or may not be actively >> sending traffic. > >> The helper is currently exposed to certain BPF iterators where users can >> filter, and terminate selected sockets. Additionally, the helper can only >> be called from these BPF contexts that ensure socket locking in order to >> allow synchronous execution of destroy helpers that also acquire socket >> locks. The previous commit that batches UDP sockets during iteration >> facilitated a synchronous invocation of the destroy helper from BPF context >> by skipping taking socket locks in the destroy handler. TCP iterators >> already supported batching. > >> The helper takes `sock_common` type argument, even though it expects, and >> casts them to a `sock` pointer. This enables the verifier to allow the >> sock_destroy kfunc to be called for TCP with `sock_common` and UDP with >> `sock` structs. As a comparison, BPF helpers enable this behavior with the >> `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, there is no such >> option available with the verifier logic that handles kfuncs where BTF >> types are inferred. Furthermore, as `sock_common` only has a subset of >> certain fields of `sock`, casting pointer to the latter type might not >> always be safe. Hence, the BPF kfunc converts the argument to a full sock >> before casting. > >> Signed-off-by: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx> >> --- >> net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++ >> net/ipv4/tcp.c | 10 ++++++--- >> net/ipv4/udp.c | 6 ++++-- >> 3 files changed, 65 insertions(+), 5 deletions(-) > >> diff --git a/net/core/filter.c b/net/core/filter.c >> index 1d6f165923bf..ba3e0dac119c 100644 >> --- a/net/core/filter.c >> +++ b/net/core/filter.c >> @@ -11621,3 +11621,57 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id) > >> return func; >> } >> + >> +/* Disables missing prototype warnings */ >> +__diag_push(); >> +__diag_ignore_all("-Wmissing-prototypes", >> + "Global functions as their definitions will be in vmlinux BTF"); >> + >> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code. >> + * >> + * The helper expects a non-NULL pointer to a socket. It invokes the >> + * protocol specific socket destroy handlers. >> + * >> + * The helper can only be called from BPF contexts that have acquired the socket >> + * locks. >> + * >> + * Parameters: >> + * @sock: Pointer to socket to be destroyed >> + * >> + * Return: >> + * On error, may return EPROTONOSUPPORT, EINVAL. >> + * EPROTONOSUPPORT if protocol specific destroy handler is not implemented. >> + * 0 otherwise >> + */ >> +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock) >> +{ >> + struct sock *sk = (struct sock *)sock; >> + >> + if (!sk) >> + return -EINVAL; >> + >> + /* The locking semantics that allow for synchronous execution of the >> + * destroy handlers are only supported for TCP and UDP. >> + */ >> + if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW) >> + return -EOPNOTSUPP; > > Copy-pasting from v3, let's discuss here. > > Maybe make it more opt-in? (vs current "opt ipproto_raw out") > > if (sk->sk_prot->diag_destroy != udp_abort && > sk->sk_prot->diag_destroy != tcp_abort) > return -EOPNOTSUPP; > > Is it more robust? Or does it look uglier? ) > But maybe fine as is, I'm just thinking out loud.. Do we expect the handler to be extended for more types? Probably not... So I'll leave it as is. > >> + >> + return sk->sk_prot->diag_destroy(sk, ECONNABORTED); >> +} >> + >> +__diag_pop() >> + >> +BTF_SET8_START(sock_destroy_kfunc_set) >> +BTF_ID_FLAGS(func, bpf_sock_destroy) >> +BTF_SET8_END(sock_destroy_kfunc_set) >> + >> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = { >> + .owner = THIS_MODULE, >> + .set = &sock_destroy_kfunc_set, >> +}; >> + >> +static int init_subsystem(void) >> +{ >> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set); >> +} >> +late_initcall(init_subsystem); >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >> index 33f559f491c8..5df6231016e3 100644 >> --- a/net/ipv4/tcp.c >> +++ b/net/ipv4/tcp.c >> @@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err) >> return 0; >> } > >> - /* Don't race with userspace socket closes such as tcp_close. */ >> - lock_sock(sk); >> + /* BPF context ensures sock locking. */ >> + if (!has_current_bpf_ctx()) >> + /* Don't race with userspace socket closes such as tcp_close. */ >> + lock_sock(sk); > >> if (sk->sk_state == TCP_LISTEN) { >> tcp_set_state(sk, TCP_CLOSE); >> @@ -4701,9 +4703,11 @@ int tcp_abort(struct sock *sk, int err) >> } > >> bh_unlock_sock(sk); >> + >> local_bh_enable(); >> tcp_write_queue_purge(sk); >> - release_sock(sk); >> + if (!has_current_bpf_ctx()) >> + release_sock(sk); >> return 0; >> } >> EXPORT_SYMBOL_GPL(tcp_abort); >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c >> index 58c620243e47..408836102e20 100644 >> --- a/net/ipv4/udp.c >> +++ b/net/ipv4/udp.c >> @@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll); > >> int udp_abort(struct sock *sk, int err) >> { >> - lock_sock(sk); >> + if (!has_current_bpf_ctx()) >> + lock_sock(sk); > >> /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing >> * with close() >> @@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err) >> __udp_disconnect(sk, 0); > >> out: >> - release_sock(sk); >> + if (!has_current_bpf_ctx()) >> + release_sock(sk); > >> return 0; >> } >> -- >> 2.34.1