Re: [PATCH v3 09/18] scsi: sd: handle read/write CDL timeout failures

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/28/23 00:34, Hannes Reinecke wrote:
> On 1/24/23 20:02, Niklas Cassel wrote:
>> Commands using a duration limit descriptor that has limit policies set
>> to a value other than 0x0 may be failed by the device if one of the
>> limits are exceeded. For such commands, since the failure is the result
>> of the user duration limit configuration and workload, the commands
>> should not be retried and terminated immediately. Furthermore, to allow
>> the user to differentiate these "soft" failures from hard errors due to
>> hardware problem, a different error code than EIO should be returned.
>>
>> There are 2 cases to consider:
>> (1) The failure is due to a limit policy failing the command with a
>> check condition sense key, that is, any limit policy other than 0xD.
>> For this case, scsi_check_sense() is modified to detect failures with
>> the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING
>> or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING
>> PROCESSING DUE TO ERROR RECOVERY additional sense code. For these
>> failures, a SUCCESS disposition is returned so that
>> scsi_finish_command() is called to terminate the command.
>>
>> (2) The failure is due to a limit policy set to 0xD, which result in the
>> command being terminated with a GOOD status, COMPLETED sense key, and
>> DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case,
>> the scsi_check_sense() is modified to return a SUCCESS disposition so
>> that scsi_finish_command() is called to terminate the command.
>> In addition, scsi_decide_disposition() has to be modified to see if a
>> command being terminated with GOOD status has sense data.
>> This is as defined in SCSI Primary Commands - 6 (SPC-6), so all
>> according to spec, even if GOOD status commands were not checked before.
>>
>> If scsi_check_sense() detects sense data representing a duration limit,
>> scsi_check_sense() will set the newly introduced SCSI ML byte
>> SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in
>> scsi_noretry_cmd(), so that a command that failed because of a CDL
>> timeout cannot be retried. The SCSI ML byte is also checked in
>> scsi_result_to_blk_status() to complete the command request with the
>> BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME
>> errors for the failed commands.
>>
>> Co-developed-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>> ---
>>   drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++
>>   drivers/scsi/scsi_lib.c   |  4 ++++
>>   drivers/scsi/scsi_priv.h  |  1 +
>>   3 files changed, 51 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index cf5ec5f5f4f6..9988539bc348 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status)
>>    */
>>   enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   {
>> +	struct request *req = scsi_cmd_to_rq(scmd);
>>   	struct scsi_device *sdev = scmd->device;
>>   	struct scsi_sense_hdr sshdr;
>>   
>> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   		if (sshdr.asc == 0x10) /* DIF */
>>   			return SUCCESS;
>>   
>> +		/*
>> +		 * Check aborts due to command duration limit policy:
>> +		 * ABORTED COMMAND additional sense code with the
>> +		 * COMMAND TIMEOUT BEFORE PROCESSING or
>> +		 * COMMAND TIMEOUT DURING PROCESSING or
>> +		 * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY
>> +		 * additional sense code qualifiers.
>> +		 */
>> +		if (sshdr.asc == 0x2e &&
>> +		    sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) {
>> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
>> +			req->cmd_flags |= REQ_FAILFAST_DEV;
>> +			req->rq_flags |= RQF_QUIET;
>> +			return SUCCESS;
>> +		}
>> +
>>   		if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
>>   			return ADD_TO_MLQUEUE;
>>   		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
>> @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   		}
>>   		return SUCCESS;
>>   
>> +	case COMPLETED:
>> +		if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) {
>> +			set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT);
>> +			req->cmd_flags |= REQ_FAILFAST_DEV;
>> +			req->rq_flags |= RQF_QUIET;
>> +			return SUCCESS;
> 
> You can kill this line, will be done anyway.
> 
>> +		}
>> +		return SUCCESS;
>> +
>>   	default:
>>   		return SUCCESS;
>>   	}
>> @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd)
>>   	switch (get_status_byte(scmd)) {
>>   	case SAM_STAT_GOOD:
>>   		scsi_handle_queue_ramp_up(scmd->device);
>> +		if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd))
>> +			/*
>> +			 * If we have sense data, call scsi_check_sense() in
>> +			 * order to set the correct SCSI ML byte (if any).
>> +			 * No point in checking the return value, since the
>> +			 * command has already completed successfully.
>> +			 */
>> +			scsi_check_sense(scmd);
> 
> I am every so slightly nervous here.
> We never checked the sense code for GOOD status, so heaven knows if 
> there are devices out there which return something here.
> And you have checked that we've cleared the sense code before submitting 
> (or retrying, even), right?

We'll double check that.


-- 
Damien Le Moal
Western Digital Research




[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux