On Tue, 2 Mar 2021 at 02:37, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote: ... > @@ -350,25 +351,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock) > } > } > > -static inline void sk_psock_update_proto(struct sock *sk, > - struct sk_psock *psock, > - struct proto *ops) > -{ > - /* Pairs with lockless read in sk_clone_lock() */ > - WRITE_ONCE(sk->sk_prot, ops); > -} > - > static inline void sk_psock_restore_proto(struct sock *sk, > struct sk_psock *psock) > { > sk->sk_prot->unhash = psock->saved_unhash; Not related to your patch set, but why do an extra restore of sk_prot->unhash here? At this point sk->sk_prot is one of our tcp_bpf / udp_bpf protos, so overwriting that seems wrong? > - if (inet_csk_has_ulp(sk)) { > - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); > - } else { > - sk->sk_write_space = psock->saved_write_space; > - /* Pairs with lockless read in sk_clone_lock() */ > - WRITE_ONCE(sk->sk_prot, psock->sk_proto); > - } > + if (psock->saved_update_proto) > + psock->saved_update_proto(sk, true); > } > > static inline void sk_psock_set_state(struct sk_psock *psock, > diff --git a/include/net/sock.h b/include/net/sock.h > index 636810ddcd9b..0e8577c917e8 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1184,6 +1184,9 @@ struct proto { > void (*unhash)(struct sock *sk); > void (*rehash)(struct sock *sk); > int (*get_port)(struct sock *sk, unsigned short snum); > +#ifdef CONFIG_BPF_SYSCALL > + int (*update_proto)(struct sock *sk, bool restore); Kind of a nit, but this name suggests that the callback is a lot more generic than it really is. The only thing you can use it for is to prep the socket to be sockmap ready since we hardwire sockmap_unhash, etc. It's also not at all clear that this only works if sk has an sk_psock associated with it. Calling it without one would crash the kernel since the update_proto functions don't check for !sk_psock. Might as well call it install_sockmap_hooks or something and have a valid sk_psock be passed in to the callback. Additionally, I'd prefer if the function returned a struct proto * like it does at the moment. That way we keep sk->sk_prot manipulation confined to the sockmap code and don't have to copy paste it into every proto. > diff --git a/net/core/sock_map.c b/net/core/sock_map.c > index 3bddd9dd2da2..13d2af5bb81c 100644 > --- a/net/core/sock_map.c > +++ b/net/core/sock_map.c > @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) > > static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) > { > - struct proto *prot; > - > - switch (sk->sk_type) { > - case SOCK_STREAM: > - prot = tcp_bpf_get_proto(sk, psock); > - break; > - > - case SOCK_DGRAM: > - prot = udp_bpf_get_proto(sk, psock); > - break; > - > - default: > + if (!sk->sk_prot->update_proto) > return -EINVAL; > - } > - > - if (IS_ERR(prot)) > - return PTR_ERR(prot); > - > - sk_psock_update_proto(sk, psock, prot); > - return 0; > + psock->saved_update_proto = sk->sk_prot->update_proto; > + return sk->sk_prot->update_proto(sk, false); I think reads / writes from sk_prot need READ_ONCE / WRITE_ONCE. We've not been diligent about this so far, but I think it makes sense to be careful in new code. -- Lorenz Bauer | Systems Engineer 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK www.cloudflare.com