On Tue, Mar 2, 2021 at 8:22 AM Lorenz Bauer <lmb@xxxxxxxxxxxxxx> wrote: > > On Tue, 2 Mar 2021 at 02:37, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote: > > ... > > static inline void sk_psock_restore_proto(struct sock *sk, > > struct sk_psock *psock) > > { > > sk->sk_prot->unhash = psock->saved_unhash; > > Not related to your patch set, but why do an extra restore of > sk_prot->unhash here? At this point sk->sk_prot is one of our tcp_bpf > / udp_bpf protos, so overwriting that seems wrong? Good catch. It seems you are right, but I need a double check. And yes, it is completely unrelated to my patch, as the current code has the same problem. > > > - if (inet_csk_has_ulp(sk)) { > > - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); > > - } else { > > - sk->sk_write_space = psock->saved_write_space; > > - /* Pairs with lockless read in sk_clone_lock() */ > > - WRITE_ONCE(sk->sk_prot, psock->sk_proto); > > - } > > + if (psock->saved_update_proto) > > + psock->saved_update_proto(sk, true); > > } > > > > static inline void sk_psock_set_state(struct sk_psock *psock, > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 636810ddcd9b..0e8577c917e8 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -1184,6 +1184,9 @@ struct proto { > > void (*unhash)(struct sock *sk); > > void (*rehash)(struct sock *sk); > > int (*get_port)(struct sock *sk, unsigned short snum); > > +#ifdef CONFIG_BPF_SYSCALL > > + int (*update_proto)(struct sock *sk, bool restore); > > Kind of a nit, but this name suggests that the callback is a lot more > generic than it really is. The only thing you can use it for is to > prep the socket to be sockmap ready since we hardwire sockmap_unhash, > etc. It's also not at all clear that this only works if sk has an > sk_psock associated with it. Calling it without one would crash the > kernel since the update_proto functions don't check for !sk_psock. > > Might as well call it install_sockmap_hooks or something and have a > valid sk_psock be passed in to the callback. Additionally, I'd prefer For the name, sure, I am always open to better names. Not sure if 'install_sockmap_hooks' is a good name, I also want to express we are overriding sk_prot. How about 'psock_update_sk_prot'? > if the function returned a struct proto * like it does at the moment. > That way we keep sk->sk_prot manipulation confined to the sockmap code > and don't have to copy paste it into every proto. Well, TCP seems too special to do this, as it could call tcp_update_ulp() to update the proto. > > > diff --git a/net/core/sock_map.c b/net/core/sock_map.c > > index 3bddd9dd2da2..13d2af5bb81c 100644 > > --- a/net/core/sock_map.c > > +++ b/net/core/sock_map.c > > @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) > > > > static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) > > { > > - struct proto *prot; > > - > > - switch (sk->sk_type) { > > - case SOCK_STREAM: > > - prot = tcp_bpf_get_proto(sk, psock); > > - break; > > - > > - case SOCK_DGRAM: > > - prot = udp_bpf_get_proto(sk, psock); > > - break; > > - > > - default: > > + if (!sk->sk_prot->update_proto) > > return -EINVAL; > > - } > > - > > - if (IS_ERR(prot)) > > - return PTR_ERR(prot); > > - > > - sk_psock_update_proto(sk, psock, prot); > > - return 0; > > + psock->saved_update_proto = sk->sk_prot->update_proto; > > + return sk->sk_prot->update_proto(sk, false); > > I think reads / writes from sk_prot need READ_ONCE / WRITE_ONCE. We've > not been diligent about this so far, but I think it makes sense to be > careful in new code. Hmm, there are many places not using READ_ONCE/WRITE_ONCE, for a quick example: void sock_map_unhash(struct sock *sk) { void (*saved_unhash)(struct sock *sk); struct sk_psock *psock; rcu_read_lock(); psock = sk_psock(sk); if (unlikely(!psock)) { rcu_read_unlock(); if (sk->sk_prot->unhash) sk->sk_prot->unhash(sk); return; } saved_unhash = psock->saved_unhash; sock_map_remove_links(sk, psock); rcu_read_unlock(); saved_unhash(sk); } Thanks.