On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: > Currently TMF commands are removed from de_device.dev_tmf_list at > the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd > up on a command status (response) is queued in transport layer. > It means that LUN and backend device can be deleted meantime and at > the moment of repsonse completion a panic is occured: > > target_tmr_work() > cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire > transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun > - // - // - // - > <<<--- lun remove > <<<--- core backend device remove > - // - // - // - > qlt_handle_abts_completion() > tfo->free_mcmd() > transport_generic_free_cmd() > target_put_sess_cmd() > core_tmr_release_req() { > if (dev) { // backend device, can not be null > spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH > > Call Trace: > NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0 > LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod] > Call Trace: > (unreliable) > 0x0 > target_put_sess_cmd+0x2a0/0x370 [target_core_mod] > transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] > tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx] > process_one_work+0x2c4/0x5c0 > worker_thread+0x88/0x690 > > For FC protocol it is a race condition, but for iSCSI protocol it is > easyly reproduced by manual sending iSCSI commands: > - Send some SCSI sommand > - Send Abort of that command over iSCSI > - Remove LUN on target > - Send next iSCSI command to acknowledge the Abort_Response > - target panics > > There is no sense to keep the command in tmr_list until response > completion, so move the removal from tmr_list from the response > completion to the response queueing when lun is unlinked. > Move the removal from state list too as it is a subject to the same > race condition. > > Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") > Reviewed-by: Roman Bolshakov <r.bolshakov@xxxxxxxxx> > Signed-off-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > > --- > v2: > fix cmd stuck in tmr list in error case in iscsi > move clearing cmd->se_lun to transport_lun_remove_cmd > > The issue exists from the very begining. > I uploaded a scapy script that helps to reproduce the issue at > https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ > --- > drivers/target/iscsi/iscsi_target.c | 2 +- > drivers/target/target_core_tmr.c | 9 -------- > drivers/target/target_core_transport.c | 29 +++++++++++++++++++------- > 3 files changed, 23 insertions(+), 17 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c > index d0e7ed8f28cc..3311b031a812 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, > * TMR TASK_REASSIGN. > */ > iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); > - target_put_sess_cmd(&cmd->se_cmd); > + transport_generic_free_cmd(&cmd->se_cmd, false); > return 0; > } Doh. I see how I got all confused. I guess this path leaks the lun_ref taken by transport_lookup_tmr_lun. It looks like an old issue and nothing to do with your patch. I'm not sure if we are supposed to be calling transport_generic_free_cmd twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL" in transport_lun_remove_cmd, so we won't do a double list deletion. It feels dirty though. I can feel Bart saying, "Mike did you see the comment at the top of the function". :) Maybe it's best to more cleanly unwind what was setup in the rror path. I think we can fix lun_ref leak too. So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd?