Re: [PATCH RESEND] scsi: scan: retry INQUIRY after timeout


 



On 8/8/22 3:20 PM, mwilck@xxxxxxxx wrote:
> From: Martin Wilck <mwilck@xxxxxxxx>
> 
> The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
> scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts
> during SCSI device probing, causing device probing to fail. This has been
> observed in FCoE uplink failover tests, for example.

What about the other scan/probe related commands, and other transient transport
errors like this (for example, once we get to the point where
DID_TRANSPORT_DISRUPTED is returned)? I think if you changed your test a little
so that the FC port state changed, we could still hit the same end problem. We
can hit similar errors with iSCSI and plain old FC.

For REPORT_LUNS it looks like we retry almost all errors 3 times. For the
probe/setup commands, at least for disks, it looks like we also are more
forgiving and will retry DID_TIME_OUT/DID_TRANSPORT_DISRUPTED 3 times for
commands like SAI_READ_CAPACITY_16 (I didn't check every sd operation and
other upper level drivers).

However, the other probe/setup operations that rely on scsi_attach_vpd
succeeding, like sd_read_block_limits, will still hit issues where the device
is only partially set up. Should scsi_vpd_inquiry be retrying 3 times as well?
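
To make the question concrete, here is a rough user-space sketch of what a
bounded retry around a probe command could look like. It is not kernel code:
issue_fn, issue_with_retries, struct fake_dev and fake_vpd_inquiry are
hypothetical names made up for illustration. Only the host byte values and the
host_byte() layout follow the kernel headers, and "3" is read here as three
attempts in total, which is an assumption.

```c
/* Host byte values as in the kernel's include/scsi/scsi_status.h. */
#define DID_OK                  0x00
#define DID_TIME_OUT            0x03
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Same layout as the kernel helper: the host byte is bits 16..23. */
static int host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/* Hypothetical callback: issues one command and returns a SCSI result. */
typedef int (*issue_fn)(void *ctx);

/* The two transient transport errors discussed in this thread. */
static int is_transient_transport_error(int result)
{
	int hb = host_byte(result);

	return hb == DID_TIME_OUT || hb == DID_TRANSPORT_DISRUPTED;
}

/*
 * Issue the command up to max_attempts times, stopping at the first result
 * that is not a transient transport error. If max_attempts <= 0, nothing is
 * issued and a timeout result is reported.
 */
static int issue_with_retries(issue_fn issue, void *ctx, int max_attempts)
{
	int result = DID_TIME_OUT << 16;

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		result = issue(ctx);
		if (!is_transient_transport_error(result))
			break;
	}
	return result;
}

/* Fake device for demonstration: times out a few times, then succeeds. */
struct fake_dev {
	int timeouts_left;
	int calls;
};

static int fake_vpd_inquiry(void *ctx)
{
	struct fake_dev *dev = ctx;

	dev->calls++;
	if (dev->timeouts_left > 0) {
		dev->timeouts_left--;
		return DID_TIME_OUT << 16;
	}
	return DID_OK << 16;
}
```

With a fake device that times out twice, issue_with_retries(fake_vpd_inquiry,
&dev, 3) succeeds on the third call. A device that keeps timing out still
fails, but only after three attempts.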

An alternative to changing all the callers would be to make scsi_noretry_cmd
detect when it's an internal passthrough command and just retry these types of
errors. For SG IO type passthrough we still want to fail right away.
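
As a toy model of that decision (user-space C, not the real scsi_noretry_cmd):
noretry_model and its two boolean parameters are hypothetical, and how the
kernel would actually tell an internal passthrough command from an SG IO one
is left open here. Failfast flags, sense data handling and all other host
bytes are deliberately omitted.

```c
#include <stdbool.h>

/* Host byte values as in the kernel's include/scsi/scsi_status.h. */
#define DID_OK                  0x00
#define DID_TIME_OUT            0x03
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Same layout as the kernel helper: the host byte is bits 16..23. */
static int host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/*
 * Returns true if the command should NOT be retried.
 *
 * is_passthrough: the command did not come from the filesystem path.
 * is_internal:    assumption for this sketch: the passthrough command was
 *                 issued by the kernel itself (scan/probe), not via SG IO.
 */
static bool noretry_model(int result, bool is_passthrough, bool is_internal)
{
	switch (host_byte(result)) {
	case DID_TIME_OUT:
	case DID_TRANSPORT_DISRUPTED:
		/* Userspace passthrough fails right away; internal ones retry. */
		return is_passthrough && !is_internal;
	default:
		/* Everything else is outside the scope of this model. */
		return false;
	}
}
```

So noretry_model(DID_TIME_OUT << 16, true, false) says "do not retry" for the
SG IO case, while the same result with is_internal set is retried.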

> 
> This patch fixes the issue by retrying the INQUIRY up to 3 times (in practice,
> we never observed more than a single retry),
> 
> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
> Tested-by: Dave Prizer <dave.prizer@xxxxxxx>
> 
> ---
> This patch was previously part of the series "Fixes for device probing
> on flaky connections", submitted on 2022/06/15. The first patch of the
> series has been dropped as discussed in the review process. Testing
> verified that just this patch was sufficient to solve the observed
> issues.
> 
> ---
>  drivers/scsi/scsi_scan.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 91ac901a66826..e859a648033f9 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -697,6 +697,11 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>  				    (sshdr.ascq == 0))
>  					continue;
>  			}
> +			if (host_byte(result) == DID_TIME_OUT) {
> +				SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
> +						"scsi scan: retry inquiry after timeout\n"));
> +				continue;
> +			}
>  		} else if (result == 0) {
>  			/*
>  			 * if nothing was transferred, we try

Should there 



