On Thu, Nov 23, 2023 at 3:34 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote: > > Hi Eric, > > On Mon, 20 Nov 2023 at 18:56, Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > > > On Wed, Nov 15, 2023 at 10:05 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote: > > > > > > @@ -53,16 +53,14 @@ void dccp_time_wait(struct sock *sk, int state, int timeo) > > > if (state == DCCP_TIME_WAIT) > > > timeo = DCCP_TIMEWAIT_LEN; > > > > > > - /* tw_timer is pinned, so we need to make sure BH are disabled > > > - * in following section, otherwise timer handler could run before > > > - * we complete the initialization. > > > - */ > > > - local_bh_disable(); > > > - inet_twsk_schedule(tw, timeo); > > > - /* Linkage updates. > > > - * Note that access to tw after this point is illegal. > > > - */ > > > + local_bh_disable(); > > > + > > > + // Linkage updates > > > inet_twsk_hashdance(tw, sk, &dccp_hashinfo); > > > + inet_twsk_schedule(tw, timeo); > > > > We could arm a timer there, while another thread/cpu found the TW in > > the ehash table. > > > > > > > > > + // Access to tw after this point is illegal. > > > + inet_twsk_put(tw); > > > > This would eventually call inet_twsk_free() while the timer is armed. > > > > I think more work is needed. > > > > Perhaps make sure that a live timer owns a reference on tw->tw_refcnt > > (This is not the case atm) > > > > I thought that was already the case, per inet_twsk_hashdance(): > > /* tw_refcnt is set to 3 because we have : > * - one reference for bhash chain. > * - one reference for ehash chain. > * - one reference for timer. > > and > > tw_timer_handler() > `\ > inet_twsk_kill() > `\ > inet_twsk_put() > > So AFAICT, after we go through the hashdance, there's a reference on > tw_refcnt held by the tw_timer. > inet_twsk_deschedule_put() can race with arming the timer, but it only > calls inet_twsk_kill() if the timer > was already armed & has been deleted, so there's no risk of calling it > twice... If I got it right :-) > Again, I think you missed some details. I am OOO for a few days, I do not have time to elaborate. You will need to properly track active timer by elevating tw->tw_refcnt, or I guarantee something wrong will happen.