On Wed, 2021-04-28 at 08:35 +0200, Hannes Reinecke wrote: > On 4/27/21 9:54 PM, Martin Wilck wrote: > > On Tue, 2021-04-27 at 20:05 +0200, Hannes Reinecke wrote: > > > On 4/27/21 6:25 PM, Christoph Hellwig wrote: > > > > On Tue, Apr 27, 2021 at 11:33:04AM +0200, Hannes Reinecke > > > > wrote: > > > > > As indicated in my previous mail, please change the > > > > > description. > > > > > We have > > > > > since established a actual reason (duplicate calls to > > > > > add_timer()), so > > > > > please list it here. > > > > > > > > So what happens if the offending add_timer is changed to > > > > mod_timer? > > > > > > > I guess that should be fine, as the boilerplate said it can act > > > as a safe version of add_timer. > > > > > > But that would just solve the crash upon add_timer(). > > > > The code doesn't use add_timer(), only mod_timer() and > > del_timer_sync(). And we didn't observe a crash upon add_timer(). > > What > > we observed was that a timer had been enqueued multiple times, and > > the > > kernel crashes in expire_timers()->detach_timer(), when it > > encounters > > an already detached entry in the timer list. > > > nvme_mpath_init() doesn't use add_timer, but it uses timer_setup(). Yes. The notion that this is wrong was the idea behind my patch. > And > calling that on an already pending timer is even worse :-) > > And my point is that the anatt timer is not stopped at the end of > nvme_init_identify() if any of the calls to > > nvme_configure_apst() > nvme_configure_timestamp() > nvme_configure_directives() > nvme_configure_acre() > > returns with an error. If they do the controller is reset, causing > eg nvme_tcp_configure_admin_queue() to be called, which will be > calling timer_setup() with the original timer still running. > If the (original) timer triggers _after_ that time we have the crash. You are right. But afaics the problem doesn't have to originate in these 4 function calls. The same applies even after nvme_init_identify() has returned. Any error that would trigger the error recovery work after the anatt timer has started would have this effect. Regards, Martin