Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

Hannes Reinecke <hare@xxxxxxx> · Wed, 15 Mar 2017 14:54:16 +0100

On 03/14/2017 06:56 PM, Benjamin Block wrote:
> Hello Hannes,
> 
> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
>> When a command is sent as part of the error handling there
>> is no point whatsoever to start EH escalation when that
>> command fails; we are _already_ in the error handler,
>> and the escalation is about to commence anyway.
>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
>> commands and let the main EH routine handle the rest.
>>
>> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
>> Reviewed-by: Johannes Thumshirn <jthumshirn@xxxxxxx>
>> Reviewed-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
>> ---
>>  drivers/scsi/scsi_error.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index e1ca3b8..4613aa1 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
>>  	return hostt->eh_abort_handler(scmd);
>>  }
>>
>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
>> -{
>> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
>> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
>> -			if (scsi_try_target_reset(scmd) != SUCCESS)
>> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
>> -					scsi_try_host_reset(scmd);
>> -}
>> -
>>  /**
>>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>>   * @scmd:       SCSI command structure to hijack
>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>  			break;
>>  		}
>>  	} else if (rtn != FAILED) {
>> -		scsi_abort_eh_cmnd(scmd);
>> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
>>  		rtn = FAILED;
>>  	}
> 
> The idea is sound, but this implementation would cause "use-after-free"s.
> 
> I only know our own LLD well enough to judge, but with zFCP there will
> always be a chance that an abort fails - be it memory pressure,
> hardware/firmware behavior or internal EH in zFCP.
> 
> Calling queuecommand() will mean for us in the LLD, that we allocate a
> unique internal request struct for the scsi_cmnd (struct
> zfcp_fsf_request) and add that to our internal hash-table with
> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
> complete it via scsi_done or yield it via successful EH-actions.
> 
> In case the abort fails, you fail to take back the ownership over the
> scsi command. Which in turn means possible "use-after-free"s when we
> still think the scsi command is ours, but EH has already overwritten
> the scsi-command with the original one. When we still get an answer or
> otherwise use the scsi_cmnd-pointer we would access an invalid one.
> 
That is actually not true.
As soon as we're calling 'scsi_try_to_abort_cmd()' ownership is
assumed to reside in the SCSI midlayer; also, the command used for
recovery here is actually using the same structure as the failed
command, so if the command abort fails the command is already on the
list of failed commands, and will be recovered after SCSI EH returns.

So no use-after-free here.
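
To illustrate the argument, here is a minimal userspace sketch, not kernel
code. Every sketch_* name is a hypothetical stand-in invented for this
example, and the model assumes only what is stated above: the EH command
reuses the structure of the failed command, and that command is already on
the list of failed commands when the abort is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All sketch_* names are hypothetical stand-ins, not kernel APIs. */
enum sketch_rtn { SKETCH_SUCCESS, SKETCH_FAILED };

struct sketch_cmd {
	bool on_failed_list;	/* true once the original command has entered EH */
	int abort_attempts;	/* how often the LLD abort handler was invoked */
};

typedef enum sketch_rtn (*sketch_abort_fn)(struct sketch_cmd *cmd);

/* Abort handler that always fails, modelling the case Benjamin describes. */
static enum sketch_rtn sketch_abort_fails(struct sketch_cmd *cmd)
{
	cmd->abort_attempts++;
	return SKETCH_FAILED;
}

/*
 * Models the patched timeout branch of scsi_send_eh_cmnd(): one abort
 * attempt, no further escalation, and FAILED reported to the caller.
 */
static enum sketch_rtn sketch_eh_cmd_timed_out(struct sketch_cmd *cmd,
					       sketch_abort_fn try_abort)
{
	assert(cmd != NULL && try_abort != NULL);
	(void)try_abort(cmd);	/* result ignored; the main EH routine escalates */
	return SKETCH_FAILED;
}

/* The main EH routine keeps recovering whatever is still on the failed list. */
static bool sketch_main_eh_still_recovers(const struct sketch_cmd *cmd)
{
	return cmd->on_failed_list;
}
```

In this model a failed abort changes nothing about list membership: the
command stays on the failed list, so the main EH routine still gets to
escalate. Whether the LLD side agrees on who owns the command at that point
is the part the sketch does not model.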

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)