On 16/10/23 17:40, Eric Dumazet wrote: > On Mon, Oct 16, 2023 at 3:00 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote: >> >> The TCP timewait timer is proving to be problematic for setups where scheduler >> CPU isolation is achieved at runtime via cpusets (as opposed to statically via >> isolcpus=domains). >> >> What happens there is a CPU goes through tcp_time_wait(), arming the time_wait >> timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer fires, causing >> interference for the now-isolated CPU. This is conceptually similar to the issue >> described in >> e02b93124855 ("workqueue: Unbind kworkers before sending them to exit()") >> >> Making the timer un-pinned would resolve this, as it would be queued onto >> HK_FLAG_TIMER CPUs. It would Unfortunately go against >> ed2e92394589 ("tcp/dccp: fix timewait races in timer handling") >> as we'd need to arm the timer after the *hashdance() to not have it fire before >> we've finished setting up the timewait_socket. >> >> However, looking into this, I cannot grok what race is fixed by having the timer >> *armed* before the hashdance. > > That was because : > > 1) the timer could expire before we had a chance to set refcnt to > a non zero value. I guess this is fine if we use an extra atomic decrement. > > OR > > 2) another cpu could find the TW and delete it (trying to cancel the > tw_timer) before > we could arm the timer. ( inet_twsk_deschedule_put() is using > del_timer_sync() followed by inet_twsk_kill()) > > Thus the tw timer would be armed for 60 seconds, then we would have to > wait for the timer to really > get rid of the tw structure. > > I think you also need to change inet_twsk_deschedule_put() logic ? > Gotcha, thank you for pointing it out. >> Keep softirqs disabled, but make the timer un-pinned and arm it after the >> hashdance. Remote CPUs may start using the timewait socket before the timer is >> armed, but their execution of __inet_lookup_established() won't prevent the >> arming of the timer. > > OK, I guess we can live with the following race : > > CPU0 > > allocates a tw, insert it in hash table > > CPU1: finds the TW and removes it (timer > cancel does nothing) > > CPU0 > arms a TW timer, lasting > Looks reasonable to me, I'll go write v2. Thanks for the help!