On 12/1/20 3:44 PM, Kuniyuki Iwashima wrote: > This patch renames reuseport_select_sock() to __reuseport_select_sock() and > adds two wrapper function of it to pass the migration type defined in the > previous commit. > > reuseport_select_sock : BPF_SK_REUSEPORT_MIGRATE_NO > reuseport_select_migrated_sock : BPF_SK_REUSEPORT_MIGRATE_REQUEST > > As mentioned before, we have to select a new listener for TCP_NEW_SYN_RECV > requests at receiving the final ACK or sending a SYN+ACK. Therefore, this > patch also changes the code to call reuseport_select_migrated_sock() even > if the listening socket is TCP_CLOSE. If we can pick out a listening socket > from the reuseport group, we rewrite request_sock.rsk_listener and resume > processing the request. > > Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx> > --- > include/net/inet_connection_sock.h | 12 +++++++++++ > include/net/request_sock.h | 13 ++++++++++++ > include/net/sock_reuseport.h | 8 +++---- > net/core/sock_reuseport.c | 34 ++++++++++++++++++++++++------ > net/ipv4/inet_connection_sock.c | 13 ++++++++++-- > net/ipv4/tcp_ipv4.c | 9 ++++++-- > net/ipv6/tcp_ipv6.c | 9 ++++++-- > 7 files changed, 81 insertions(+), 17 deletions(-) > > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h > index 2ea2d743f8fc..1e0958f5eb21 100644 > --- a/include/net/inet_connection_sock.h > +++ b/include/net/inet_connection_sock.h > @@ -272,6 +272,18 @@ static inline void inet_csk_reqsk_queue_added(struct sock *sk) > reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue); > } > > +static inline void inet_csk_reqsk_queue_migrated(struct sock *sk, > + struct sock *nsk, > + struct request_sock *req) > +{ > + reqsk_queue_migrated(&inet_csk(sk)->icsk_accept_queue, > + &inet_csk(nsk)->icsk_accept_queue, > + req); > + sock_put(sk); > + sock_hold(nsk); This looks racy to me. nsk refcount might be zero at this point. If you think it can _not_ be zero, please add a big comment here, because this would mean something has been done before reaching this function, and this sock_hold() would be not needed in the first place. There is a good reason reqsk_alloc() is using refcount_inc_not_zero(). > + req->rsk_listener = nsk; > +} > + Honestly, this patch series looks quite complex, and finding a bug in the very first function I am looking at is not really a good sign...