On 10/18/21 8:57 AM, Dmitry Bogdanov wrote: > Currently TMF commands are removed from de_device.dev_tmf_list at > the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd > up on a command status (response) is queued in transport layer. > It means that LUN and backend device can be deleted meantime and at > the moment of repsonse completion a panic is occured: > > target_tmr_work() > cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire > transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun > - // - // - // - > <<<--- lun remove > <<<--- core backend device remove > - // - // - // - > qlt_handle_abts_completion() > tfo->free_mcmd() > transport_generic_free_cmd() > target_put_sess_cmd() > core_tmr_release_req() { > if (dev) { // backend device, can not be null > spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH > > Call Trace: > NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0 > LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod] > Call Trace: > (unreliable) > 0x0 > target_put_sess_cmd+0x2a0/0x370 [target_core_mod] > transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] > tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx] > process_one_work+0x2c4/0x5c0 > worker_thread+0x88/0x690 > > For FC protocol it is a race condition, but for iSCSI protocol it is > easyly reproduced by manual sending iSCSI commands: > - Send some SCSI sommand > - Send Abort of that command over iSCSI > - Remove LUN on target > - Send next iSCSI command to acknowledge the Abort_Response > - target panics > > There is no sense to keep the command in tmr_list until response > completion, so move the removal from tmr_list from the response > completion to the response queueing when lun is unlinked. > Move the removal from state list too as it is a subject to the same > race condition. > > Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") > Reviewed-by: Roman Bolshakov <r.bolshakov@xxxxxxxxx> > Signed-off-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > > --- > v4: > remove superfluous transport_lookup_tmr_lun that could race with lun > reset > v3: > remove iscsi fix as not related to the issue > avoid double removal from tmr_list > v2: > fix stuck in tmr list in error case > > The issue exists from the very begining. > I uploaded a scapy script that helps to reproduce the issue at > https://gist.github.com/logost/cb93df41dd2432454324449b390403c4 Reviewed-by: Mike Christie <michael.christie@xxxxxxxxxx>