On 4/16/21 5:39 PM, Mike Christie wrote: > On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: >> Currently TMF commands are removed from de_device.dev_tmf_list at >> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd >> up on a command status (response) is queued in transport layer. >> It means that LUN and backend device can be deleted meantime and at >> the moment of repsonse completion a panic is occured: >> >> target_tmr_work() >> cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire >> transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun >> - // - // - // - >> <<<--- lun remove >> <<<--- core backend device remove >> - // - // - // - >> qlt_handle_abts_completion() >> tfo->free_mcmd() >> transport_generic_free_cmd() >> target_put_sess_cmd() >> core_tmr_release_req() { >> if (dev) { // backend device, can not be null >> spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH >> >> Call Trace: >> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0 >> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod] >> Call Trace: >> (unreliable) >> 0x0 >> target_put_sess_cmd+0x2a0/0x370 [target_core_mod] >> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] >> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx] >> process_one_work+0x2c4/0x5c0 >> worker_thread+0x88/0x690 >> >> For FC protocol it is a race condition, but for iSCSI protocol it is >> easyly reproduced by manual sending iSCSI commands: >> - Send some SCSI sommand >> - Send Abort of that command over iSCSI >> - Remove LUN on target >> - Send next iSCSI command to acknowledge the Abort_Response >> - target panics >> >> There is no sense to keep the command in tmr_list until response >> completion, so move the removal from tmr_list from the response >> completion to the response queueing when lun is unlinked. >> Move the removal from state list too as it is a subject to the same >> race condition. >> >> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") >> Reviewed-by: Roman Bolshakov <r.bolshakov@xxxxxxxxx> >> Signed-off-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> >> >> --- >> v2: >> fix cmd stuck in tmr list in error case in iscsi >> move clearing cmd->se_lun to transport_lun_remove_cmd >> >> The issue exists from the very begining. >> I uploaded a scapy script that helps to reproduce the issue at >> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ >> --- >> drivers/target/iscsi/iscsi_target.c | 2 +- >> drivers/target/target_core_tmr.c | 9 -------- >> drivers/target/target_core_transport.c | 29 +++++++++++++++++++------- >> 3 files changed, 23 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c >> index d0e7ed8f28cc..3311b031a812 100644 >> --- a/drivers/target/iscsi/iscsi_target.c >> +++ b/drivers/target/iscsi/iscsi_target.c >> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, >> * TMR TASK_REASSIGN. >> */ >> iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); >> - target_put_sess_cmd(&cmd->se_cmd); >> + transport_generic_free_cmd(&cmd->se_cmd, false); >> return 0; >> } > > Doh. I see how I got all confused. I guess this path leaks the lun_ref > taken by transport_lookup_tmr_lun. It looks like an old issue and nothing > to do with your patch. > > I'm not sure if we are supposed to be calling transport_generic_free_cmd > twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL" > in transport_lun_remove_cmd, so we won't do a double list deletion. > It feels dirty though. I can feel Bart saying, "Mike did you see the comment > at the top of the function". :) > > Maybe it's best to more cleanly unwind what was setup in the rror path. I think we > can fix lun_ref leak too. > > So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd > to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd? Shoot. I'm all over the place. I think the root issue is my original comment on the v1 patch was wrong. On a failure we would still do: iscsit_free_cmd -> transport_generic_free_cmd -> transport_lun_remove_cmd right? So we don't need any change in the iscsi target. It should all just work.