On Wed, 2022-09-28 at 21:53 -0500, Mike Christie wrote: > This simplifies sd_spinup_disk so scsi-ml retries errors for it. Note > that > we retried specifically on a UA and also if scsi_status_is_good > returned > failed which could happen for all check conditions, so in this patch > we > don't check for only UAs. > > We do not handle the outside loop's retries because we want to sleep > between tried and we don't support that yet. > > Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> > --- > drivers/scsi/sd.c | 76 ++++++++++++++++++++++++++++----------------- > -- > 1 file changed, 45 insertions(+), 31 deletions(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index a35c089c3097..716e0c8ffa57 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -2064,50 +2064,64 @@ sd_spinup_disk(struct scsi_disk *sdkp) > { > unsigned char cmd[10]; > unsigned long spintime_expire = 0; > - int retries, spintime; > + int spintime; > unsigned int the_result; > struct scsi_sense_hdr sshdr; > int sense_valid = 0; > + struct scsi_failure failures[] = { > + { > + .sense = SCMD_FAILURE_SENSE_ANY, > + .asc = SCMD_FAILURE_ASC_ANY, > + .ascq = SCMD_FAILURE_ASCQ_ANY, > + .allowed = 3, > + .result = SAM_STAT_CHECK_CONDITION, > + }, > + { > + /* > + * Retry scsi status and host errors that > return > + * failure in scsi_status_is_good. > + */ > + .result = SAM_STAT_BUSY | > + SAM_STAT_RESERVATION_CONFLICT | > + SAM_STAT_TASK_SET_FULL | > + SAM_STAT_ACA_ACTIVE | > + SAM_STAT_TASK_ABORTED | I fail to understand how bitwise-or would work here. IIUC, this tries to replicate the logic to retry if (!scsi_status_is_good()). The result of this bitwise-or operation is 0x78, which matches all SAM_ codes except SAM_STAT_GOOD, SAM_STAT_CHECK_CONDITION and SAM_STAT_CONDITION_MET. SAM_STAT_CHECK_CONDITION is covered by the first failure. But unless I'm mistaken, we'd now retry on SAM_STAT_INTERMEDIATE, SAM_STAT_INTERMEDIATE_CONDITION_MET, and SAM_STAT_COMMAND_TERMINATED, on which the old code did not retry. Am I overlooking something? At least this deserves an in-depth comment; in general, as noted for patch 02/35, I find using bitwise or for SAM status counter- intuitive. > + DID_NO_CONNECT << 16, Shouldn't .allowed be set to 3 here? OTOH that would cause the number of retries to add up to 6, see my reply to 22/35. But don't see what effect a failure with allowed = 0 would have. Regards Martin