On Fri, Apr 16, 2021 at 12:21:46PM +0300, Dmitry Bogdanov wrote:
> @@ -719,8 +726,16 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
>  	if (!lun)
>  		return;
>
> +	target_remove_from_state_list(cmd);
> +	target_remove_from_tmr_list(cmd);
> +
>  	if (cmpxchg(&cmd->lun_ref_active, true, false))
>  		percpu_ref_put(&lun->lun_ref);
> +
> +	/*
> +	 * Clear struct se_cmd->se_lun before the handoff to FE.
> +	 */
> +	cmd->se_lun = NULL;
>  }

Sadly we just found out that this code is racing with
core_tmr_drain_tmr_list().

If LUN RESET comes in while there are still some outstanding ABORT TASK
functions left, the following sequence is possible:

 1. During LUN RESET processing core_tmr_drain_tmr_list() is called
 2. During ABORT TASK processing transport_lun_remove_cmd() is called at
    the same time
 3. core_tmr_drain_tmr_list() acquires &dev->se_tmr_lock and moves
    TMRs to the on-stack drain_tmr_list
 4. core_tmr_drain_tmr_list() releases &dev->se_tmr_lock and starts
    working on the drain_tmr_list
 5. At the same moment target_remove_from_tmr_list() is called
 6. It acquires &dev->se_tmr_lock, removes the TMR from the list with
    list_del_init() and releases &dev->se_tmr_lock

What happens next is this:

[  391.438244] LUN_RESET: releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11
[  391.438246] LUN_RESET: releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11

The same TMR is pulled off the drain_tmr_list twice. This happens
because no lock prevents the list traversal in core_tmr_drain_tmr_list()
and the list element removal in target_remove_from_tmr_list() from
being executed concurrently: &dev->se_tmr_lock is no longer held by the
time the on-stack list is walked.

So list_del_init() in target_remove_from_tmr_list() calls
INIT_LIST_HEAD() and tmr_p->next now points to tmr_p. Hence the
following warnings:

[  391.438300] WARNING: CPU: 12 PID: 20064 at ../drivers/target/target_core_transport.c:2785 ...
[  391.438448] WARNING: CPU: 12 PID: 20064 at ../lib/refcount.c:28 refcount_warn_saturate+0x224/0x240

This issue also prevents other TMRs from being released, resulting in a
stuck session. It does not happen every time, since drain_tmr_list
sometimes contains only one element, but it is still possible.
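
To make the failure mode concrete, here is a minimal userspace sketch of
the sequence above. It is not the target core code: struct fake_tmr,
simulate_drain() and MAX_TMRS are hypothetical names, the list helpers
are simplified re-implementations of the kernel ones, and the racing
target_remove_from_tmr_list() call is simulated inline in a single
thread instead of running concurrently. The max_visits guard exists only
so that the sketch terminates; what the real kernel does after the
second release is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Unlink the entry, then make it point to itself (as the kernel does). */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical stand-in for a TMR request; not the real se_tmr_req. */
struct fake_tmr {
	int id;
	struct list_head tmr_list;
};

#define MAX_TMRS 8
#define tmr_of(ptr) \
	((struct fake_tmr *)((char *)(ptr) - offsetof(struct fake_tmr, tmr_list)))

/*
 * Walk an on-stack drain list of n_tmrs entries without any lock held.
 * During the first iteration a "racing thread" (simulated inline) calls
 * list_del_init() on the entry cached as the next one, or on the only
 * entry if there is just one. released[i] counts releases of TMR i.
 * Returns the number of loop iterations, capped at max_visits.
 */
static int simulate_drain(int n_tmrs, int max_visits, int released[MAX_TMRS])
{
	struct fake_tmr tmrs[MAX_TMRS];
	struct list_head drain_tmr_list;
	struct list_head *pos, *n;
	int i, visits = 0;

	assert(n_tmrs >= 1 && n_tmrs <= MAX_TMRS);

	INIT_LIST_HEAD(&drain_tmr_list);
	for (i = 0; i < n_tmrs; i++) {
		tmrs[i].id = i;
		released[i] = 0;
		list_add_tail(&tmrs[i].tmr_list, &drain_tmr_list);
	}

	/* Open-coded equivalent of a list_for_each_entry_safe() walk. */
	for (pos = drain_tmr_list.next, n = pos->next;
	     pos != &drain_tmr_list && visits < max_visits;
	     pos = n, n = pos->next) {
		struct fake_tmr *tmr = tmr_of(pos);

		list_del_init(pos);

		/* Simulated concurrent removal by the ABORT TASK path. */
		if (visits == 0)
			list_del_init(&tmrs[n_tmrs > 1 ? 1 : 0].tmr_list);

		released[tmr->id]++;	/* "LUN_RESET: releasing TMR ..." */
		visits++;
	}

	return visits;
}
```

With three TMRs and max_visits = 5, the first TMR is released once, the
second one four times and the third one never: once the cached next
entry points to itself, the walk can no longer reach the list head. With
a single TMR the loop ends normally after one release, which is
consistent with the problem not showing up when drain_tmr_list holds
only one element.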