On 4/7/21 9:21 AM, Dmitry Bogdanov wrote: > Currently TMF commands are removed from de_device.dev_tmf_list at > the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd > up on a command status (response) is queued in transport layer. > It means that LUN and backend device can be deleted meantime and at > the moment of repsonse completion a panic is occured: > > target_tmr_work() > cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire > transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun > — // — // — // — > <<<--- lun remove > <<<--- core backend device remove > — // — // — // — > qlt_handle_abts_completion() > tfo->free_mcmd() > transport_generic_free_cmd() > target_put_sess_cmd() > core_tmr_release_req() { > if (dev) { // backend device, can not be null > spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH > > Call Trace: > NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0 > LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod] > Call Trace: > (unreliable) > 0x0 > target_put_sess_cmd+0x2a0/0x370 [target_core_mod] > transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] > tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx] > process_one_work+0x2c4/0x5c0 > worker_thread+0x88/0x690 > > For FC protocol it is a race condition, but for iSCSI protocol it is > easyly reproduced by manual sending iSCSI commands: > - Send some SCSI sommand > - Send Abort of that command over iSCSI > - Remove LUN on target > - Send next iSCSI command to acknowledge the Abort_Response > - target panics > > There is no sense to keep the command in tmr_list until response > completion, so move the removal from tmr_list from the response > completion to the response queueing when lun is unlinked. > Move the removal from state list too as it is a subject to the same > race condition. > > Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") > Reviewed-by: Roman Bolshakov <r.bolshakov@xxxxxxxxx> > Signed-off-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > > --- > The issue exists from the very begining. > I uploaded a script that helps to reproduce the issue at > https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!KN-N7Ult7Itn2AzY6LdP2vm0D81UIjpbCyOAH3uf2OoLGUykpGP3dJTQlLm9m71MjsxY$ > --- > drivers/target/target_core_tmr.c | 9 --------- > drivers/target/target_core_transport.c | 19 +++++++++++++++++-- > 2 files changed, 17 insertions(+), 11 deletions(-) > > diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c > index 7347285471fa..323a173010c1 100644 > --- a/drivers/target/target_core_tmr.c > +++ b/drivers/target/target_core_tmr.c > @@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req); > > void core_tmr_release_req(struct se_tmr_req *tmr) > { > - struct se_device *dev = tmr->tmr_dev; > - unsigned long flags; > - > - if (dev) { > - spin_lock_irqsave(&dev->se_tmr_lock, flags); > - list_del_init(&tmr->tmr_list); > - spin_unlock_irqrestore(&dev->se_tmr_lock, flags); > - } > - > kfree(tmr); > } > > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 5ecb9f18a53d..4d9968f43cf1 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -667,6 +667,20 @@ static void target_remove_from_state_list(struct se_cmd *cmd) > spin_unlock_irqrestore(&dev->queues[cmd->cpuid].lock, flags); > } > > +static void target_remove_from_tmr_list(struct se_cmd *cmd) > +{ > + struct se_device *dev = NULL; > + unsigned long flags; > + This is just a nit. Maybe just do: struct se_device *dev = NULL; unsigned long flags; if (!(cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)) return; dev = cmd->se_tmr_req->tmr_dev; spin_lock_irqsave(&dev->se_tmr_lock, flags); list_del_init(&cmd->se_tmr_req->tmr_list); spin_unlock_irqrestore(&dev->se_tmr_lock, flags); It will look like target_remove_from_state_list. Below I was expecting some sort of twist ending with the extra if in there. But we just wanted to run the function for tmrs. > + if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) > + dev = cmd->se_tmr_req->tmr_dev; > + > + if (dev) { > + spin_lock_irqsave(&dev->se_tmr_lock, flags); > + list_del_init(&cmd->se_tmr_req->tmr_list); > + spin_unlock_irqrestore(&dev->se_tmr_lock, flags); > + } > +} > /* > * This function is called by the target core after the target core has > * finished processing a SCSI command or SCSI TMF. Both the regular command > @@ -678,8 +692,6 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd) > { > unsigned long flags; > > - target_remove_from_state_list(cmd); > - > /* > * Clear struct se_cmd->se_lun before the handoff to FE. > */ Right below this line we clear se_cmd->selun. I think that should go in transport_lun_remove_cmd since it's the function that does the put and now with your changes we handle all the last refs there. > @@ -719,6 +731,9 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd) > if (!lun) > return; > > + target_remove_from_state_list(cmd); > + target_remove_from_tmr_list(cmd); > + > if (cmpxchg(&cmd->lun_ref_active, true, false)) > percpu_ref_put(&lun->lun_ref); > } >