On 4/27/21 9:54 PM, Martin Wilck wrote:
> On Tue, 2021-04-27 at 20:05 +0200, Hannes Reinecke wrote:
>> On 4/27/21 6:25 PM, Christoph Hellwig wrote:
>>> On Tue, Apr 27, 2021 at 11:33:04AM +0200, Hannes Reinecke wrote:
>>>> As indicated in my previous mail, please change the description.
>>>> We have since established an actual reason (duplicate calls to
>>>> add_timer()), so please list it here.
>>>
>>> So what happens if the offending add_timer is changed to mod_timer?
>>>
>> I guess that should be fine, as the boilerplate said it can act
>> as a safe version of add_timer.
>>
>> But that would just solve the crash upon add_timer().
>
> The code doesn't use add_timer(), only mod_timer() and
> del_timer_sync(). And we didn't observe a crash upon add_timer(). What
> we observed was that a timer had been enqueued multiple times, and the
> kernel crashes in expire_timers()->detach_timer(), when it encounters
> an already detached entry in the timer list.
>
nvme_mpath_init() doesn't use add_timer(), but it does use timer_setup().
And calling that on an already pending timer is even worse :-)

And my point is that the anatt timer is not stopped at the end of
nvme_init_identify() if any of the calls to

nvme_configure_apst()
nvme_configure_timestamp()
nvme_configure_directives()
nvme_configure_acre()

returns with an error. If they do, the controller is reset, causing e.g.
nvme_tcp_configure_admin_queue() to be called, which will call
timer_setup() with the original timer still running.
If the (original) timer triggers _after_ that time, we have the crash.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)