On Mon, Nov 25, 2019 at 11:38 PM CET, Martin Lau wrote: > On Sat, Nov 23, 2019 at 12:07:47PM +0100, Jakub Sitnicki wrote: > [ ... ] > >> @@ -370,6 +378,11 @@ static inline void sk_psock_restore_proto(struct sock *sk, >> sk->sk_prot = psock->sk_proto; >> psock->sk_proto = NULL; >> } >> + >> + if (psock->icsk_af_ops) { >> + icsk->icsk_af_ops = psock->icsk_af_ops; >> + psock->icsk_af_ops = NULL; >> + } >> } > > [ ... ] > >> +static struct sock *tcp_bpf_syn_recv_sock(const struct sock *sk, >> + struct sk_buff *skb, >> + struct request_sock *req, >> + struct dst_entry *dst, >> + struct request_sock *req_unhash, >> + bool *own_req) >> +{ >> + const struct inet_connection_sock_af_ops *ops; >> + void (*write_space)(struct sock *sk); >> + struct sk_psock *psock; >> + struct proto *proto; >> + struct sock *child; >> + >> + rcu_read_lock(); >> + psock = sk_psock(sk); >> + if (likely(psock)) { >> + proto = psock->sk_proto; >> + write_space = psock->saved_write_space; >> + ops = psock->icsk_af_ops; > It is not immediately clear to me what ensure > ops is not NULL here. > > It is likely I missed something. A short comment would > be very useful here. I can see the readability problem. Looking at it now, perhaps it should be rewritten, to the same effect, as: static struct sock *tcp_bpf_syn_recv_sock(...) { const struct inet_connection_sock_af_ops *ops = NULL; ... rcu_read_lock(); psock = sk_psock(sk); if (likely(psock)) { proto = psock->sk_proto; write_space = psock->saved_write_space; ops = psock->icsk_af_ops; } rcu_read_unlock(); if (!ops) ops = inet_csk(sk)->icsk_af_ops; child = ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); If psock->icsk_af_ops were NULL, it would mean we haven't initialized it properly. To double check what happens here: In sock_map_link we do a setup dance where we first create the psock and later initialize the socket callbacks (tcp_bpf_init). static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, struct sock *sk) { ... if (psock) { ... } else { psock = sk_psock_init(sk, map->numa_node); if (!psock) { ret = -ENOMEM; goto out_progs; } sk_psock_is_new = true; } ... if (sk_psock_is_new) { ret = tcp_bpf_init(sk); if (ret < 0) goto out_drop; } else { tcp_bpf_reinit(sk); } The "if (sk_psock_new)" branch triggers the call chain that leads to saving & overriding socket callbacks. tcp_bpf_init -> tcp_bpf_update_sk_prot -> sk_psock_update_proto Among them, icsk_af_ops. static inline void sk_psock_update_proto(...) { ... psock->icsk_af_ops = icsk->icsk_af_ops; icsk->icsk_af_ops = af_ops; } Goes without saying that a comment is needed. Thanks for the feedback, Jakub