Re: RFC: Retrying SCSI pass-through commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/08/03 13:42, Mike Christie wrote:
> On 7/31/24 3:22 PM, Bart Van Assche wrote:
>>
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index da7dac77f8cd..e21becc5bcf9 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -1816,6 +1816,12 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd)
>>          * assume caller has checked sense and determined
>>          * the check condition was retryable.
>>          */
>> -       if (req->cmd_flags & REQ_FAILFAST_DEV || blk_rq_is_passthrough(req))
>> -               return true;
>> +       if (req->cmd_flags & REQ_FAILFAST_DEV)
>> +               return true;
>> +       if (/* submitted by the SCSI core */)
>> +               return false;
>> +       if (blk_rq_is_passthrough(req))
>> +               return true;
> 
> Do you just want to retry UAs or all internal passthrough command
> errors that go through here?

I think that we definitely do *not* want to retry all internal commands. E.g. I
do not see the point in retrying INQUIRY or VPD page accesses failing on device
scan.

> 
> For just the specific UA or NOT_READY/0x04/0x01 case for an example.
> Does every scsi core passthrough caller want that retried? If so, then
> I can see where you are coming from where you feel it's a bug. I would
> agree we would normally want to retry that in general. Maybe others know
> about some specific old case though.

I think the command Bart had a problem with is START_STOP_UNIT. If that is the
only case causing problems, then we should probably improve
sd_start_stop_device() to allow retries. But if there are more commands that
needs the same, then the patch that Bart is proposing makes sense I think, but
it will require an audit of all REQ_OP_DRV_XXX internal users to make sure that
they all use that command with -> allowed set to 0. But hopefully, most cases go
through scsi_execute_cmnd(), which should simplify this audit.

> 
> However, I'm not sure for MEDIUM_ERROR or ABORTED_COMMAND.
> I think MEDIUM_ERROR probably would not come up for the cases we are
> talking about though.
> 
> I don't think we want to always retry DID_TIME_OUT though. The funny
> thing is that I think Martin just wanted to retry one specific case
> for that error. We had to do the scsi_check_passthrough patches so
> we could retry just for scsi_probe_lun.

-- 
Damien Le Moal
Western Digital Research





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux