On Tue, Nov 26, 2019 at 04:54:33PM +0100, Jakub Sitnicki wrote: > On Mon, Nov 25, 2019 at 11:38 PM CET, Martin Lau wrote: > > On Sat, Nov 23, 2019 at 12:07:47PM +0100, Jakub Sitnicki wrote: > > [ ... ] > > > >> @@ -370,6 +378,11 @@ static inline void sk_psock_restore_proto(struct sock *sk, > >> sk->sk_prot = psock->sk_proto; > >> psock->sk_proto = NULL; > >> } > >> + > >> + if (psock->icsk_af_ops) { > >> + icsk->icsk_af_ops = psock->icsk_af_ops; > >> + psock->icsk_af_ops = NULL; > >> + } > >> } > > > > [ ... ] > > > >> +static struct sock *tcp_bpf_syn_recv_sock(const struct sock *sk, > >> + struct sk_buff *skb, > >> + struct request_sock *req, > >> + struct dst_entry *dst, > >> + struct request_sock *req_unhash, > >> + bool *own_req) > >> +{ > >> + const struct inet_connection_sock_af_ops *ops; > >> + void (*write_space)(struct sock *sk); > >> + struct sk_psock *psock; > >> + struct proto *proto; > >> + struct sock *child; > >> + > >> + rcu_read_lock(); > >> + psock = sk_psock(sk); > >> + if (likely(psock)) { > >> + proto = psock->sk_proto; > >> + write_space = psock->saved_write_space; > >> + ops = psock->icsk_af_ops; > > It is not immediately clear to me what ensure > > ops is not NULL here. > > > > It is likely I missed something. A short comment would > > be very useful here. > > I can see the readability problem. Looking at it now, perhaps it should > be rewritten, to the same effect, as: > > static struct sock *tcp_bpf_syn_recv_sock(...) > { > const struct inet_connection_sock_af_ops *ops = NULL; > ... > > rcu_read_lock(); > psock = sk_psock(sk); > if (likely(psock)) { > proto = psock->sk_proto; > write_space = psock->saved_write_space; > ops = psock->icsk_af_ops; > } > rcu_read_unlock(); > > if (!ops) > ops = inet_csk(sk)->icsk_af_ops; > child = ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); > > If psock->icsk_af_ops were NULL, it would mean we haven't initialized it > properly. To double check what happens here: I did not mean the init path. The init path is fine since it init eveything on psock before publishing the sk to the sock_map. I was thinking the delete path (e.g. sock_map_delete_elem). It is not clear to me what prevent the earlier pasted sk_psock_restore_proto() which sets psock->icsk_af_ops to NULL from running in parallel with tcp_bpf_syn_recv_sock()? An explanation would be useful. > > In sock_map_link we do a setup dance where we first create the psock and > later initialize the socket callbacks (tcp_bpf_init). > > static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, > struct sock *sk) > { > ... > if (psock) { > ... > } else { > psock = sk_psock_init(sk, map->numa_node); > if (!psock) { > ret = -ENOMEM; > goto out_progs; > } > sk_psock_is_new = true; > } > ... > if (sk_psock_is_new) { > ret = tcp_bpf_init(sk); > if (ret < 0) > goto out_drop; > } else { > tcp_bpf_reinit(sk); > } > > The "if (sk_psock_new)" branch triggers the call chain that leads to > saving & overriding socket callbacks. > > tcp_bpf_init -> tcp_bpf_update_sk_prot -> sk_psock_update_proto > > Among them, icsk_af_ops. > > static inline void sk_psock_update_proto(...) > { > ... > psock->icsk_af_ops = icsk->icsk_af_ops; > icsk->icsk_af_ops = af_ops; > } > > Goes without saying that a comment is needed. > > Thanks for the feedback, > Jakub