On 5/21/21 8:21 PM, Kuniyuki Iwashima wrote:
> This patch also changes the code to call reuseport_migrate_sock() and
> inet_reqsk_clone(), but unlike the other cases, we do not call
> inet_reqsk_clone() right after reuseport_migrate_sock().
>
> Currently, in the receive path for TCP_NEW_SYN_RECV sockets, its listener
> has three kinds of refcnt:
>
>   (A) for listener itself
>   (B) carried by request_sock
>   (C) sock_hold() in tcp_v[46]_rcv()
>
> While processing the req, (A) may disappear by close(listener). Also, (B)
> can disappear by accept(listener) once we put the req into the accept
> queue. So, we have to hold another refcnt (C) for the listener to prevent
> use-after-free.
>
> For socket migration, we call reuseport_migrate_sock() to select a listener
> with (A) and to increment the new listener's refcnt in tcp_v[46]_rcv().
> This refcnt corresponds to (C) and is cleaned up later in tcp_v[46]_rcv().
> Thus we have to take another refcnt (B) for the newly cloned request_sock.
>
> In inet_csk_complete_hashdance(), we hold the count (B), clone the req, and
> try to put the new req into the accept queue. By migrating req after
> winning the "own_req" race, we can avoid such a worst situation:
>
>   CPU 1 looks up req1
>   CPU 2 looks up req1, unhashes it, then CPU 1 loses the race
>   CPU 3 looks up req2, unhashes it, then CPU 2 loses the race
>   ...
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> Acked-by: Martin KaFai Lau <kafai@xxxxxx>
> ---
>  net/ipv4/inet_connection_sock.c | 34 ++++++++++++++++++++++++++++++---
>  net/ipv4/tcp_ipv4.c             | 20 +++++++++++++------
>  net/ipv4/tcp_minisocks.c        |  4 ++--
>  net/ipv6/tcp_ipv6.c             | 14 +++++++++++---
>  4 files changed, 58 insertions(+), 14 deletions(-)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index c1f068464363..b795198f919a 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1113,12 +1113,40 @@ struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
>  					 struct request_sock *req, bool own_req)
>  {
>  	if (own_req) {
> -		inet_csk_reqsk_queue_drop(sk, req);
> -		reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
> -		if (inet_csk_reqsk_queue_add(sk, req, child))
> +		inet_csk_reqsk_queue_drop(req->rsk_listener, req);
> +		reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
> +
> +		if (sk != req->rsk_listener) {
> +			/* another listening sk has been selected,
> +			 * migrate the req to it.
> +			 */
> +			struct request_sock *nreq;
> +
> +			/* hold a refcnt for the nreq->rsk_listener
> +			 * which is assigned in inet_reqsk_clone()
> +			 */
> +			sock_hold(sk);
> +			nreq = inet_reqsk_clone(req, sk);
> +			if (!nreq) {
> +				inet_child_forget(sk, req, child);

Don't you need a sock_put(sk) here?

> +				goto child_put;
> +			}
> +
> +			refcount_set(&nreq->rsk_refcnt, 1);
> +			if (inet_csk_reqsk_queue_add(sk, nreq, child)) {
> +				reqsk_migrate_reset(req);
> +				reqsk_put(req);
> +				return child;
> +			}
> +
> +			reqsk_migrate_reset(nreq);
> +			__reqsk_free(nreq);
> +		} else if (inet_csk_reqsk_queue_add(sk, req, child)) {
>  			return child;
> +		}
>
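
To make the concern about the !nreq branch concrete, here is a minimal
userspace sketch of the hold/clone/put pattern. It is not kernel code: every
toy_* name is hypothetical, and it assumes the clone helper does NOT drop the
caller's reference when it fails. Whether that holds for inet_reqsk_clone()
is not visible in this hunk, so treat it as an illustration of the question
rather than an answer to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct sock: only the reference count is modelled. */
struct toy_sock {
	int refcnt;
};

static void toy_sock_hold(struct toy_sock *sk) { sk->refcnt++; }
static void toy_sock_put(struct toy_sock *sk)  { sk->refcnt--; }

/* Toy stand-in for a cloned request: it owns one reference on its listener. */
struct toy_req {
	struct toy_sock *listener;
};

/*
 * Hypothetical clone helper. ASSUMPTION: on failure it returns NULL and
 * leaves the reference taken by the caller untouched.
 */
static struct toy_req *toy_reqsk_clone(struct toy_sock *sk, bool fail)
{
	struct toy_req *nreq;

	if (fail)
		return NULL;

	nreq = malloc(sizeof(*nreq));
	if (!nreq)
		return NULL;

	nreq->listener = sk;	/* takes over the reference held by the caller */
	return nreq;
}

/* Freeing the request releases the listener reference it carried. */
static void toy_reqsk_free(struct toy_req *req)
{
	assert(req->listener->refcnt > 0);
	toy_sock_put(req->listener);
	free(req);
}

/*
 * Models the migration branch: hold, clone, and on clone failure optionally
 * undo the hold. Returns the listener's refcnt once the branch is done.
 */
static int toy_migrate(struct toy_sock *sk, bool clone_fails, bool put_on_failure)
{
	struct toy_req *nreq;

	toy_sock_hold(sk);
	nreq = toy_reqsk_clone(sk, clone_fails);
	if (!nreq) {
		if (put_on_failure)
			toy_sock_put(sk);
		return sk->refcnt;
	}

	toy_reqsk_free(nreq);
	return sk->refcnt;
}
```

Starting from a refcnt of 1, toy_migrate(&sk, true, false) leaves the count
at 2, i.e. one reference is leaked, while toy_migrate(&sk, true, true) and
the success path both return it to 1. If the real helper already drops the
reference on failure, the extra put would be a double put instead.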