Re: [PATCH] target: core: remove from tmr_list at lun unlink

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/9/21 12:18 PM, Mike Christie wrote:
> On 4/7/21 9:21 AM, Dmitry Bogdanov wrote:
>> Currently TMF commands are removed from de_device.dev_tmf_list at
>> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
>> up on a command status (response) is queued in transport layer.
>> It means that LUN and backend device can be deleted meantime and at
>> the moment of repsonse completion a panic is occured:
>>
>> target_tmr_work()
>> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
>> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
>> — // — // — // —
>> <<<--- lun remove
>> <<<--- core backend device remove
>> — // — // — // —
>> qlt_handle_abts_completion()
>>   tfo->free_mcmd()
>>     transport_generic_free_cmd()
>>       target_put_sess_cmd()
>>         core_tmr_release_req() {
>>           if (dev) { // backend device, can not be null
>>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
>>
>> Call Trace:
>> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
>> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
>> Call Trace:
>> (unreliable)
>> 0x0
>> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
>> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
>> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
>> process_one_work+0x2c4/0x5c0
>> worker_thread+0x88/0x690
>>
>> For FC protocol it is a race condition, but for iSCSI protocol it is
>> easyly reproduced by manual sending iSCSI commands:
>> - Send some SCSI sommand
>> - Send Abort of that command over iSCSI
>> - Remove LUN on target
>> - Send next iSCSI command to acknowledge the Abort_Response
>> - target panics
>>
>> There is no sense to keep the command in tmr_list until response
>> completion, so move the removal from tmr_list from the response
>> completion to the response queueing when lun is unlinked.
>> Move the removal from state list too as it is a subject to the same
>> race condition.
>>
>> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
>> Reviewed-by: Roman Bolshakov <r.bolshakov@xxxxxxxxx>
>> Signed-off-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx>
>>
>> ---
>> The issue exists from the very begining.
>> I uploaded a script that helps to reproduce the issue at
>> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!KN-N7Ult7Itn2AzY6LdP2vm0D81UIjpbCyOAH3uf2OoLGUykpGP3dJTQlLm9m71MjsxY$ 
>> ---
>>  drivers/target/target_core_tmr.c       |  9 ---------
>>  drivers/target/target_core_transport.c | 19 +++++++++++++++++--
>>  2 files changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
>> index 7347285471fa..323a173010c1 100644
>> --- a/drivers/target/target_core_tmr.c
>> +++ b/drivers/target/target_core_tmr.c
>> @@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req);
>>  
>>  void core_tmr_release_req(struct se_tmr_req *tmr)
>>  {
>> -	struct se_device *dev = tmr->tmr_dev;
>> -	unsigned long flags;
>> -
>> -	if (dev) {
>> -		spin_lock_irqsave(&dev->se_tmr_lock, flags);
>> -		list_del_init(&tmr->tmr_list);
>> -		spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
>> -	}
>> -
>>  	kfree(tmr);
>>  }
>>  
>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
>> index 5ecb9f18a53d..4d9968f43cf1 100644
>> --- a/drivers/target/target_core_transport.c
>> +++ b/drivers/target/target_core_transport.c
>> @@ -667,6 +667,20 @@ static void target_remove_from_state_list(struct se_cmd *cmd)
>>  	spin_unlock_irqrestore(&dev->queues[cmd->cpuid].lock, flags);
>>  }
>>  
>> +static void target_remove_from_tmr_list(struct se_cmd *cmd)
>> +{
>> +	struct se_device *dev = NULL;
>> +	unsigned long flags;
>> +
> 
> This is just a nit. Maybe just do:
> 
> struct se_device *dev = NULL;
> unsigned long flags;
> 
> if (!(cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
> 	return;
> 
> dev = cmd->se_tmr_req->tmr_dev;
> spin_lock_irqsave(&dev->se_tmr_lock, flags);
> list_del_init(&cmd->se_tmr_req->tmr_list);
> spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> 

This might be wrong. I thought when you moved the deletion to
transport_cmd_check_stop_to_fabric, we would always have a dev set.
But in core_tmr_abort_task there is that transport_lookup_tmr_lun
check. If that is a valid check in there and it could be NULL in
the path:

transport_generic_handle_tmr -> target_handle_abort -> 
ransport_cmd_check_stop_to_fabric

then we would hit a NULL pointer. I'm not sure how we would get a NULL
dev there though. The driver would have to not use the standard
target_submit_tmr or be iscsi and not call transport_lookup_tmr_lun.


One issue with the patch though is if iscsit_tmr_abort_task fails then we don't
call transport_cmd_check_stop_to_fabric, so the tmr will be stuck on the list.




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux