On Fri, 2024-03-22 at 21:58 +0100, Valentin Schneider wrote: > On 21/03/24 20:03, Paolo Abeni wrote: > > Something alike the following (completely untested!!!): > > > > WDYT? > > Thanks for the suggestion! I've been preempted by other things and haven't > had time to think more about this, so I really appreciate it :) > > > --- > > diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h > > index f28da08a37b4..d696d10dc8ae 100644 > > --- a/include/net/inet_timewait_sock.h > > +++ b/include/net/inet_timewait_sock.h > > @@ -93,8 +93,10 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, > > struct inet_timewait_death_row *dr, > > const int state); > > > > -void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, > > - struct inet_hashinfo *hashinfo); > > +void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw, > > + struct sock *sk, > > + struct inet_hashinfo *hashinfo, > > + int timeo); > > > > void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, > > bool rearm); > > diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c > > index 64d805b27add..8e108a89d8e4 100644 > > --- a/net/dccp/minisocks.c > > +++ b/net/dccp/minisocks.c > > @@ -58,11 +58,10 @@ void dccp_time_wait(struct sock *sk, int state, int timeo) > > * we complete the initialization. > > */ > > local_bh_disable(); > > - inet_twsk_schedule(tw, timeo); > > /* Linkage updates. > > * Note that access to tw after this point is illegal. > > */ > > - inet_twsk_hashdance(tw, sk, &dccp_hashinfo); > > + inet_twsk_hashdance_schedule(tw, sk, &dccp_hashinfo, timeo); > > local_bh_enable(); > > } else { > > /* Sorry, if we're out of memory, just CLOSE this > > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c > > index e8de45d34d56..dd314b06c0cd 100644 > > --- a/net/ipv4/inet_timewait_sock.c > > +++ b/net/ipv4/inet_timewait_sock.c > > @@ -97,8 +97,10 @@ static void inet_twsk_add_node_rcu(struct inet_timewait_sock *tw, > > * Essentially we whip up a timewait bucket, copy the relevant info into it > > * from the SK, and mess with hash chains and list linkage. > > */ > > -void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, > > - struct inet_hashinfo *hashinfo) > > +void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw, > > + struct sock *sk, > > + struct inet_hashinfo *hashinfo, > > + int timeo) > > { > > const struct inet_sock *inet = inet_sk(sk); > > const struct inet_connection_sock *icsk = inet_csk(sk); > > @@ -135,6 +137,8 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, > > if (__sk_nulls_del_node_init_rcu(sk)) > > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); > > > > + inet_twsk_schedule(tw, timeo); > > + > > spin_unlock(lock); > > > > That arms the timer before the refcounts are set up in the tail end of > the hashdance, which is what we have upstream ATM. > > Unfortunately this relies on the timer being TIMER_PINNED and having > softirqs disabled: the timer can only be enqueued on the local CPU, and it > can't fire until softirqs are enabled, so refcounts can safely be updated > after it is armed because it can't fire. > > For dynamic CPU isolation we want to make this timer not-pinned, so that it > can be scheduled on housekeeping CPUs. However that means the > local_bh_disable() won't prevent the timer from firing, and that means the > refcounts need to be written to before it is armed. Ouch, right you are, I underlooked that. > Using the ehash lock is clever though, and the first thing inet_twsk_kill() > does is grabbing it... Maybe something like the below? It (should) prevent > this interleaving race: > > tcp_time_wait() > inet_twsk_hashdance() > inet_twsk_deschedule_put() > del_timer_sync() > inet_twsk_schedule() > > whether it is sane is another thing :-) [...] That looks safe to me but, compared to the current code, will need an additional WMB in tcp_time_wait() and will take the hash lock unconditionally in inet_twsk_deschedule_put(). The latter should not be fast-path, I'm unsure if the whole thing be acceptable from performance perspective??? Eric WDYT? Thanks, Paolo