On 1/28/23 00:34, Hannes Reinecke wrote: > On 1/24/23 20:02, Niklas Cassel wrote: >> Commands using a duration limit descriptor that has limit policies set >> to a value other than 0x0 may be failed by the device if one of the >> limits are exceeded. For such commands, since the failure is the result >> of the user duration limit configuration and workload, the commands >> should not be retried and terminated immediately. Furthermore, to allow >> the user to differentiate these "soft" failures from hard errors due to >> hardware problem, a different error code than EIO should be returned. >> >> There are 2 cases to consider: >> (1) The failure is due to a limit policy failing the command with a >> check condition sense key, that is, any limit policy other than 0xD. >> For this case, scsi_check_sense() is modified to detect failures with >> the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING >> or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING >> PROCESSING DUE TO ERROR RECOVERY additional sense code. For these >> failures, a SUCCESS disposition is returned so that >> scsi_finish_command() is called to terminate the command. >> >> (2) The failure is due to a limit policy set to 0xD, which result in the >> command being terminated with a GOOD status, COMPLETED sense key, and >> DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case, >> the scsi_check_sense() is modified to return a SUCCESS disposition so >> that scsi_finish_command() is called to terminate the command. >> In addition, scsi_decide_disposition() has to be modified to see if a >> command being terminated with GOOD status has sense data. >> This is as defined in SCSI Primary Commands - 6 (SPC-6), so all >> according to spec, even if GOOD status commands were not checked before. >> >> If scsi_check_sense() detects sense data representing a duration limit, >> scsi_check_sense() will set the newly introduced SCSI ML byte >> SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in >> scsi_noretry_cmd(), so that a command that failed because of a CDL >> timeout cannot be retried. The SCSI ML byte is also checked in >> scsi_result_to_blk_status() to complete the command request with the >> BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME >> errors for the failed commands. >> >> Co-developed-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> >> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> >> Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx> >> --- >> drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++ >> drivers/scsi/scsi_lib.c | 4 ++++ >> drivers/scsi/scsi_priv.h | 1 + >> 3 files changed, 51 insertions(+) >> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index cf5ec5f5f4f6..9988539bc348 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status) >> */ >> enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) >> { >> + struct request *req = scsi_cmd_to_rq(scmd); >> struct scsi_device *sdev = scmd->device; >> struct scsi_sense_hdr sshdr; >> >> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) >> if (sshdr.asc == 0x10) /* DIF */ >> return SUCCESS; >> >> + /* >> + * Check aborts due to command duration limit policy: >> + * ABORTED COMMAND additional sense code with the >> + * COMMAND TIMEOUT BEFORE PROCESSING or >> + * COMMAND TIMEOUT DURING PROCESSING or >> + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY >> + * additional sense code qualifiers. >> + */ >> + if (sshdr.asc == 0x2e && >> + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) { >> + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); >> + req->cmd_flags |= REQ_FAILFAST_DEV; >> + req->rq_flags |= RQF_QUIET; >> + return SUCCESS; >> + } >> + >> if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF) >> return ADD_TO_MLQUEUE; >> if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 && >> @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) >> } >> return SUCCESS; >> >> + case COMPLETED: >> + if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) { >> + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); >> + req->cmd_flags |= REQ_FAILFAST_DEV; >> + req->rq_flags |= RQF_QUIET; >> + return SUCCESS; > > You can kill this line, will be done anyway. > >> + } >> + return SUCCESS; >> + >> default: >> return SUCCESS; >> } >> @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd) >> switch (get_status_byte(scmd)) { >> case SAM_STAT_GOOD: >> scsi_handle_queue_ramp_up(scmd->device); >> + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) >> + /* >> + * If we have sense data, call scsi_check_sense() in >> + * order to set the correct SCSI ML byte (if any). >> + * No point in checking the return value, since the >> + * command has already completed successfully. >> + */ >> + scsi_check_sense(scmd); > > I am every so slightly nervous here. > We never checked the sense code for GOOD status, so heaven knows if > there are devices out there which return something here. > And you have checked that we've cleared the sense code before submitting > (or retrying, even), right? We'll double check that. -- Damien Le Moal Western Digital Research