On Wed, Nov 15, 2023 at 10:05 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote: > > The TCP timewait timer is proving to be problematic for setups where scheduler > CPU isolation is achieved at runtime via cpusets (as opposed to statically via > isolcpus=domains). > > What happens there is a CPU goes through tcp_time_wait(), arming the time_wait > timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer fires, causing > interference for the now-isolated CPU. This is conceptually similar to the issue > described in > e02b93124855 ("workqueue: Unbind kworkers before sending them to exit()") > > Keep softirqs disabled, but make the timer un-pinned and arm it *after* the > hashdance. > > This introduces the following (non-fatal) race: > > CPU0 CPU1 > allocates a tw > insert it in hash table > finds the TW and removes it > (timer cancel does nothing) > arms a TW timer, lasting > > This partially reverts > ed2e92394589 ("tcp/dccp: fix timewait races in timer handling") > and > ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") > > This also reinstores a comment from > ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") > as inet_twsk_hashdance() had a "Step 1" and "Step 3" comment, but the "Step > 2" had gone missing. > > Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/ > Signed-off-by: Valentin Schneider <vschneid@xxxxxxxxxx> > --- > net/dccp/minisocks.c | 16 +++++++--------- > net/ipv4/inet_timewait_sock.c | 20 +++++++++++++++----- > net/ipv4/tcp_minisocks.c | 16 +++++++--------- > 3 files changed, 29 insertions(+), 23 deletions(-) > > diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c > index 64d805b27adde..2f0fad4255e36 100644 > --- a/net/dccp/minisocks.c > +++ b/net/dccp/minisocks.c > @@ -53,16 +53,14 @@ void dccp_time_wait(struct sock *sk, int state, int timeo) > if (state == DCCP_TIME_WAIT) > timeo = DCCP_TIMEWAIT_LEN; > > - /* tw_timer is pinned, so we need to make sure BH are disabled > - * in following section, otherwise timer handler could run before > - * we complete the initialization. > - */ > - local_bh_disable(); > - inet_twsk_schedule(tw, timeo); > - /* Linkage updates. > - * Note that access to tw after this point is illegal. > - */ > + local_bh_disable(); > + > + // Linkage updates > inet_twsk_hashdance(tw, sk, &dccp_hashinfo); > + inet_twsk_schedule(tw, timeo); We could arm a timer there, while another thread/cpu found the TW in the ehash table. > + // Access to tw after this point is illegal. > + inet_twsk_put(tw); This would eventually call inet_twsk_free() while the timer is armed. I think more work is needed. Perhaps make sure that a live timer owns a reference on tw->tw_refcnt (This is not the case atm)