From: Eric Dumazet <eric.dumazet@xxxxxxxxx> Date: Tue, 1 Dec 2020 16:13:39 +0100 > On 12/1/20 3:44 PM, Kuniyuki Iwashima wrote: > > This patch renames reuseport_select_sock() to __reuseport_select_sock() and > > adds two wrapper function of it to pass the migration type defined in the > > previous commit. > > > > reuseport_select_sock : BPF_SK_REUSEPORT_MIGRATE_NO > > reuseport_select_migrated_sock : BPF_SK_REUSEPORT_MIGRATE_REQUEST > > > > As mentioned before, we have to select a new listener for TCP_NEW_SYN_RECV > > requests at receiving the final ACK or sending a SYN+ACK. Therefore, this > > patch also changes the code to call reuseport_select_migrated_sock() even > > if the listening socket is TCP_CLOSE. If we can pick out a listening socket > > from the reuseport group, we rewrite request_sock.rsk_listener and resume > > processing the request. > > > > Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx> > > --- > > include/net/inet_connection_sock.h | 12 +++++++++++ > > include/net/request_sock.h | 13 ++++++++++++ > > include/net/sock_reuseport.h | 8 +++---- > > net/core/sock_reuseport.c | 34 ++++++++++++++++++++++++------ > > net/ipv4/inet_connection_sock.c | 13 ++++++++++-- > > net/ipv4/tcp_ipv4.c | 9 ++++++-- > > net/ipv6/tcp_ipv6.c | 9 ++++++-- > > 7 files changed, 81 insertions(+), 17 deletions(-) > > > > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h > > index 2ea2d743f8fc..1e0958f5eb21 100644 > > --- a/include/net/inet_connection_sock.h > > +++ b/include/net/inet_connection_sock.h > > @@ -272,6 +272,18 @@ static inline void inet_csk_reqsk_queue_added(struct sock *sk) > > reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue); > > } > > > > +static inline void inet_csk_reqsk_queue_migrated(struct sock *sk, > > + struct sock *nsk, > > + struct request_sock *req) > > +{ > > + reqsk_queue_migrated(&inet_csk(sk)->icsk_accept_queue, > > + &inet_csk(nsk)->icsk_accept_queue, > > + req); > > + sock_put(sk); > > + sock_hold(nsk); > > This looks racy to me. nsk refcount might be zero at this point. > > If you think it can _not_ be zero, please add a big comment here, > because this would mean something has been done before reaching this function, > and this sock_hold() would be not needed in the first place. > > There is a good reason reqsk_alloc() is using refcount_inc_not_zero(). Exactly, I will fix this in the next spin like below. Thank you. ---8<--- diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 1e0958f5eb21..d8c3be31e987 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -280,7 +280,6 @@ static inline void inet_csk_reqsk_queue_migrated(struct sock *sk, &inet_csk(nsk)->icsk_accept_queue, req); sock_put(sk); - sock_hold(nsk); req->rsk_listener = nsk; } diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index 6b475897b496..4d07bddcf678 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -386,7 +386,14 @@ EXPORT_SYMBOL(reuseport_select_sock); struct sock *reuseport_select_migrated_sock(struct sock *sk, u32 hash, struct sk_buff *skb) { - return __reuseport_select_sock(sk, hash, skb, 0, BPF_SK_REUSEPORT_MIGRATE_REQUEST); + struct sock *nsk; + + nsk = __reuseport_select_sock(sk, hash, skb, 0, BPF_SK_REUSEPORT_MIGRATE_REQUEST); + if (IS_ERR_OR_NULL(nsk) || + unlikely(!refcount_inc_not_zero(&nsk->sk_refcnt))) + return NULL; + + return nsk; } EXPORT_SYMBOL(reuseport_select_migrated_sock); ---8<--- > > + req->rsk_listener = nsk; > > +} > > + > > Honestly, this patch series looks quite complex, and finding a bug in the > very first function I am looking at is not really a good sign... I also think this issue is quite complex, but it might be easier to fix than it was disscussed in 2015 [1] thanks to your some refactoring. https://lore.kernel.org/netdev/1443313848-751-1-git-send-email-tolga.ceylan@xxxxxxxxx/