> On Feb 27, 2023, at 6:56 AM, Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx> wrote:
>
>
>
>> On Feb 24, 2023, at 2:35 PM, Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>>
>> On 02/23, Aditi Ghag wrote:
>>> The socket destroy kfunc is used to forcefully terminate sockets from
>>> certain BPF contexts. We plan to use the capability in Cilium to force
>>> client sockets to reconnect when their remote load-balancing backends
>>> are deleted. The other use case is on-the-fly policy enforcement, where
>>> existing socket connections prevented by policies need to be forcefully
>>> terminated. The helper allows terminating sockets that may or may not
>>> be actively sending traffic.
>>
>>> The helper is currently exposed to certain BPF iterators where users
>>> can filter and terminate selected sockets. Additionally, the helper can
>>> only be called from BPF contexts that ensure socket locking, in order
>>> to allow synchronous execution of destroy helpers that also acquire
>>> socket locks. The previous commit, which batches UDP sockets during
>>> iteration, facilitated synchronous invocation of the destroy helper
>>> from BPF context by skipping the socket locks in the destroy handler.
>>> TCP iterators already supported batching.
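For illustration, a tcp iterator program invoking the kfunc could look
roughly like the sketch below. This is not part of the patch; the __ksym
declaration style follows the BPF selftests, and the program name and
port filter are arbitrary examples.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* Kfunc added by this patch, resolved against vmlinux BTF. */
int bpf_sock_destroy(struct sock_common *sk) __ksym;

SEC("iter/tcp")
int iter_tcp_destroy(struct bpf_iter__tcp *ctx)
{
	struct sock_common *sk_common = ctx->sk_common;

	if (!sk_common)
		return 0;

	/* Destroy only sockets matching a user-chosen filter, here an
	 * arbitrary local port (skc_num is in host byte order).
	 */
	if (sk_common->skc_num == 8080)
		bpf_sock_destroy(sk_common);

	return 0;
}

char _license[] SEC("license") = "GPL";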
>>
>>> The helper takes a `sock_common` type argument, even though it
>>> expects, and casts it to, a `sock` pointer. This enables the verifier
>>> to allow the sock_destroy kfunc to be called for TCP with `sock_common`
>>> and for UDP with `sock` structs. As a comparison, BPF helpers enable
>>> this behavior with the `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type.
>>> However, there is no such option available in the verifier logic that
>>> handles kfuncs, where BTF types are inferred. Furthermore, as
>>> `sock_common` only has a subset of the fields of `sock`, casting a
>>> pointer to the latter type might not always be safe. Hence, the BPF
>>> kfunc converts the argument to a full sock before casting.
>>
>>> Signed-off-by: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx>
>>> ---
>>> net/core/filter.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++
>>> net/ipv4/tcp.c    | 17 ++++++++++-----
>>> net/ipv4/udp.c    |  7 ++++--
>>> 3 files changed, 72 insertions(+), 7 deletions(-)
>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 1d6f165923bf..79cd91ba13d0 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -11621,3 +11621,58 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
>>
>>> 	return func;
>>> }
>>> +
>>> +/* Disables missing prototype warnings */
>>> +__diag_push();
>>> +__diag_ignore_all("-Wmissing-prototypes",
>>> +		  "Global functions as their definitions will be in vmlinux BTF");
>>> +
>>> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
>>> + *
>>> + * The helper expects a non-NULL pointer to a full socket. It invokes
>>> + * the protocol specific socket destroy handlers.
>>> + *
>>> + * The helper can only be called from BPF contexts that have acquired
>>> + * the socket locks.
>>> + *
>>> + * Parameters:
>>> + * @sock: Pointer to socket to be destroyed
>>> + *
>>> + * Return:
>>> + * On error, may return EOPNOTSUPP or EINVAL.
>>> + * EOPNOTSUPP if the protocol specific destroy handler is not implemented.
>>> + * 0 otherwise
>>> + */
>>> +int bpf_sock_destroy(struct sock_common *sock)
>>
>> Prefix with __bpf_kfunc (see other kfuncs).
>
> Will do!
>
>>
>>> +{
>>> +	/* Validates that the socket can be cast to a full socket. */
>>> +	struct sock *sk = sk_to_full_sk((struct sock *)sock);
>>> +
>>> +	if (!sk)
>>> +		return -EINVAL;
>>> +
>>> +	/* The locking semantics that allow for synchronous execution of
>>> +	 * the destroy handlers are only supported for TCP and UDP.
>>> +	 */
>>> +	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
>>> +}
>>> +
>>> +__diag_pop()
>>> +
>>> +BTF_SET8_START(sock_destroy_kfunc_set)
>>> +BTF_ID_FLAGS(func, bpf_sock_destroy)
>>> +BTF_SET8_END(sock_destroy_kfunc_set)
>>> +
>>> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
>>> +	.owner = THIS_MODULE,
>>> +	.set = &sock_destroy_kfunc_set,
>>> +};
>>> +
>>> +static int init_subsystem(void)
>>> +{
>>> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
>>
>> Is it safe? Does it mean I can call bpf_sock_destroy from any tracing
>> program from anywhere? What if the socket is not locked?
>> IOW, do we have to constrain it to the iterator programs (at least for
>> now)?
>
> Given that kprobes are not considered part of BPF_PROG_TYPE_TRACING, I'm
> not sure there are other tracing programs with sock/sock_common
> arguments. Regardless, this is a valid point. I had brought up a similar
> topic earlier during the v1 discussion -
> https://lore.kernel.org/bpf/78E434B0-06A9-466F-A061-B9A05DC6DE6D@xxxxxxxxxxxxx/.
> I suppose you would have a similar problem in the case of the
> setsockopt* helpers.
> Is the general topic of limiting kfunc access to a subset of BPF_PROG_*
> program types being discussed? I've submitted this as a potential topic
> for the LSF/MM/BPF agenda. Also, related to this discussion: could we
> have something similar to lockdep_sock_is_held with less overhead?
>
>>
>>> +}
>>> +late_initcall(init_subsystem);
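As an aside, the userspace flow that drives such an iterator might look
like the hypothetical sketch below: the BPF program (and hence
bpf_sock_destroy) only runs when the iterator fd is read.
run_destroy_iter and the buffer size are made up for illustration, and
error handling is minimal.

#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* Attach an "iter/tcp" program and drive it to completion; each read()
 * walks a batch of sockets through the program, which is where the
 * destroy kfunc gets invoked on matching sockets.
 */
static int run_destroy_iter(struct bpf_program *prog)
{
	struct bpf_link *link;
	char buf[64];
	int iter_fd, err = 0;

	link = bpf_program__attach_iter(prog, NULL);
	if (!link)
		return -1;

	iter_fd = bpf_iter_create(bpf_link__fd(link));
	if (iter_fd < 0) {
		err = -1;
		goto out;
	}

	while (read(iter_fd, buf, sizeof(buf)) > 0)
		;

	close(iter_fd);
out:
	bpf_link__destroy(link);
	return err;
}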
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 33f559f491c8..8123c264d8ea 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err)
>>> 		return 0;
>>> 	}
>>
>>> -	/* Don't race with userspace socket closes such as tcp_close. */
>>> -	lock_sock(sk);
>>> +	/* BPF context ensures sock locking. */
>>> +	if (!has_current_bpf_ctx())
>>> +		/* Don't race with userspace socket closes such as tcp_close. */
>>> +		lock_sock(sk);
>>
>>> 	if (sk->sk_state == TCP_LISTEN) {
>>> 		tcp_set_state(sk, TCP_CLOSE);
>>> @@ -4688,7 +4690,9 @@ int tcp_abort(struct sock *sk, int err)
>>
>>> 	/* Don't race with BH socket closes such as inet_csk_listen_stop. */
>>> 	local_bh_disable();
>>> -	bh_lock_sock(sk);
>>> +	if (!has_current_bpf_ctx())
>>> +		bh_lock_sock(sk);
>>> +
>>
>>> 	if (!sock_flag(sk, SOCK_DEAD)) {
>>> 		sk->sk_err = err;
>>> @@ -4700,10 +4704,13 @@ int tcp_abort(struct sock *sk, int err)
>>> 		tcp_done(sk);
>>> 	}
>>
>>> -	bh_unlock_sock(sk);
>>> +	if (!has_current_bpf_ctx())
>>> +		bh_unlock_sock(sk);
>>> +
>>> 	local_bh_enable();
>>> 	tcp_write_queue_purge(sk);
>>> -	release_sock(sk);
>>> +	if (!has_current_bpf_ctx())
>>> +		release_sock(sk);
>>> 	return 0;
>>> }
>>> EXPORT_SYMBOL_GPL(tcp_abort);
>>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>>> index 2f3978de45f2..1bc9ad92c3d4 100644
>>> --- a/net/ipv4/udp.c
>>> +++ b/net/ipv4/udp.c
>>> @@ -2925,7 +2925,9 @@ EXPORT_SYMBOL(udp_poll);
>>
>>> int udp_abort(struct sock *sk, int err)
>>> {
>>> -	lock_sock(sk);
>>> +	/* BPF context ensures sock locking. */
>>> +	if (!has_current_bpf_ctx())
>>> +		lock_sock(sk);
>>
>>> 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
>>> 	 * with close()
>>> @@ -2938,7 +2940,8 @@ int udp_abort(struct sock *sk, int err)
>>> 		__udp_disconnect(sk, 0);
>>
>>> out:
>>> -	release_sock(sk);
>>> +	if (!has_current_bpf_ctx())
>>> +		release_sock(sk);
>>
>>> 	return 0;
>>> }
>>> --
>>> 2.34.1
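Bottom note on the locking question above: the has_current_bpf_ctx()
guards in tcp_abort()/udp_abort() key off whether the current task is
executing a BPF program. If I am reading include/linux/bpf.h right, the
helper is roughly the following, so the guarded paths trust that the BPF
iterator context already holds the socket lock:

/* Simplified sketch of the existing helper in include/linux/bpf.h. */
static inline bool has_current_bpf_ctx(void)
{
	return !!current->bpf_ctx;
}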