On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote: > > On 15/04/24 14:35, Eric Dumazet wrote: > > On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@xxxxxxxxxx> wrote: > >> > >> Hi, > >> > >> This is v5 of the series where the tw_timer is un-pinned to get rid of > >> interferences in isolated CPUs setups. > >> > >> The first patch is a new one stemming from Jakub's bug reported. It's there > >> mainly to make the reviewing a bit easier, but as it changes behaviour it should > >> be squashed with the second one. > >> > >> Revisions > >> ========= > >> > >> v4 -> v5 > >> ++++++++ > >> > >> o Rebased against latest Linus' tree > >> o Converted tw_timer into a delayed work following Jakub's bug report on v4 > >> http://lore.kernel.org/r/20240411100536.224fa1e7@xxxxxxxxxx > > > > What was the issue again ? > > > > Please explain precisely why it was fundamentally tied to the use of > > timers (and this was not possible to fix the issue without > > adding work queues and more dependencies to TCP stack) > > In v4 I added the use of the ehash lock to serialize arming the timewait > timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()). > > Unfortunately, holding a lock both in a timer callback and in the context > in which it is destroyed is invalid. AIUI the issue is as follows: > > CPUx CPUy > spin_lock(foo); > <timer fires> > call_timer_fn() > spin_lock(foo) // blocks > timer_shutdown_sync() > __timer_delete_sync() > __try_to_del_timer_sync() // looped as long as timer is running > <deadlock> > > In our case, we had in v4: > > inet_twsk_deschedule_put() > spin_lock(ehash_lock); > tw_timer_handler() > inet_twsk_kill() > spin_lock(ehash_lock); > __inet_twsk_kill(); > timer_shutdown_sync(&tw->tw_timer); > > The fix here is to move the timer deletion to a non-timer > context. Workqueues fit the bill, and as the tw_timer_handler() would just queue > a work item, I converted it to a delayed_work. I do not like this delayed work approach. Adding more dependencies to the TCP stack is not very nice from a maintainer point of view. Why couldn't you call timer_shutdown_sync() before grabbing the lock ?