On 9/20/21 11:40 AM, Dmitriy Bogdanov wrote: > Hi Mike, > >>> @@ -234,6 +225,7 @@ static void core_tmr_drain_tmr_list( >>> } >>> >>> list_move_tail(&tmr_p->tmr_list, &drain_tmr_list); >>> + tmr_p->tmr_dev = NULL; >> >> Is this patch now adding a way to hit: >> >> if (!tmr->tmr_dev) >> WARN_ON_ONCE(transport_lookup_tmr_lun(tmr->task_cmd) < 0); >> >> in core_tmr_abort_task? >> >> You have the abort and lun reset works running on different CPUs. >> The lun reset hits the above code first and clears tmr_dev. >> The abort then hits the tmr->tmr_dev check and tries to do >> transport_lookup_tmr_lun. >> >> For the case where the lun is not removed, it looks like >> transport_lookup_tmr_lun will add the tmr to the dev_tmr_list >> but it would also be on the drain_tmr_list above so we would >> hit list corruption. > > Yes, there is a such race. I think, I can solve it by changing the order of > draining the tmr_list and state_list at LUN Reset to make the raced lines > be under the same lock. > > Especially SAM-5 describes(but does not require) aborting commands > before tmfs: > | When responding to a logical unit reset condition, the logical unit shall: > | a) abort all commands as described in 5.6; > | b) abort all copy operations (see SPC-4); > | c) terminate all task management functions; > > >> For the case where the lun is getting removed, percpu_ref_tryget_live >> would fail in transport_lookup_tmr_lun and we hit the WARN_ON_ONCE. >> I think though with your patch, we would be ok and don't want >> the WARN_ON_ONCE, right? The lun reset would just wait for the >> abort. When it completes the abort and reset complete as expected. > > I don’t understand the meaning of that transport_lookup_tmr_lun there. > Every TMF Abort has already executed transport_lookup_tmr_lun at the very > beginning of its handling. Yeah, I think it's not needed. It came in with: commit 2c9fa49e100f962af988f1c0529231bf14905cda Author: Bart Van Assche <bvanassche@xxxxxxx> Date: Tue Nov 27 15:52:03 2018 -0800 scsi: target/core: Make ABORT and LUN RESET handling synchronous gree. It looks like it was added in: and in that patch I can't see it ever happening. We have 2 ways to submit an abort tmr: 1. target_submit_tmr - Calls transport_lookup_tmr_lun then transport_generic_handle_tmr. 2. iscsit_handle_task_mgt_cmd - Will call transport_lookup_tmr_lun for every tmr except the iscsi specific TASK REASSIGN. TASK REASSING is not passed to transport_generic_handle_tmr. I don't see any places where tmr_dev is NULL after transport_lookup_tmr_lun has set it and added it to the list. So I think you can just kill it.