[PATCH RESEND] scsi: scan: retry INQUIRY after timeout

mwilck@xxxxxxxx · Mon, 8 Aug 2022 22:20:18 +0200

From: Martin Wilck <mwilck@xxxxxxxx>

The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts
during SCSI device probing. Because the timed-out INQUIRY is not retried,
probing of the device then fails. This has been observed in FCoE uplink
failover tests, for example.
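
To make the failure mode above concrete, here is a minimal userspace sketch
of how a command result carries the host byte and why a timed-out INQUIRY
is given up on. It assumes the kernel's layout (host byte in bits 16..23)
and the DID_TIME_OUT value 0x03. make_result() and noretry_after_timeout()
are hypothetical helpers: the latter models only the DID_TIME_OUT case of
scsi_noretry_cmd(), and the real function checks more conditions.

```c
#include <stdbool.h>

/* Host byte value; matches the kernel's DID_TIME_OUT definition. */
#define DID_TIME_OUT 0x03

/* Same bit layout as the kernel's host_byte(): bits 16..23 of result. */
static unsigned char host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/* Hypothetical helper: build a result from a host byte and a status byte. */
static int make_result(unsigned char host, unsigned char status)
{
	return (host << 16) | status;
}

/*
 * Hypothetical helper modeling only the DID_TIME_OUT case of
 * scsi_noretry_cmd(): a command that timed out is not retried.
 * Any other host byte is treated as "not a timeout" here.
 */
static bool noretry_after_timeout(int result)
{
	return host_byte(result) == DID_TIME_OUT;
}
```

For example, make_result(DID_TIME_OUT, 0) yields 0x30000. Its host byte is
DID_TIME_OUT, so the mid layer gives up on that command and the scan sees
the failure.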

This patch fixes the issue by retrying the INQUIRY in scsi_probe_lun() after
a timeout. The retry uses the existing loop, which allows up to 3 attempts
(in practice, we never observed more than a single retry).
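
The control flow of the patched loop can be sketched in userspace as below.
This is a simplified model under stated assumptions, not the kernel code.
fake_inquiry() is a hypothetical transport that times out a given number of
times before succeeding, and probe_inquiry() is a hypothetical stand-in for
the INQUIRY loop in scsi_probe_lun(). The loop bound of 3 mirrors the
existing "count < 3" loop. The unit-attention and zero-length-transfer
branches are omitted.

```c
/* Host byte value; matches the kernel's DID_TIME_OUT definition. */
#define DID_TIME_OUT 0x03

/* Attempt budget, mirroring the existing "for (count = 0; count < 3; ++count)". */
#define INQUIRY_ATTEMPTS 3

/* Same bit layout as the kernel's host_byte(): bits 16..23 of result. */
static unsigned char host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/*
 * Hypothetical fake transport: while *timeouts_left is positive, each
 * call consumes one and reports DID_TIME_OUT; afterwards it succeeds.
 */
static int fake_inquiry(int *timeouts_left)
{
	if (*timeouts_left > 0) {
		(*timeouts_left)--;
		return DID_TIME_OUT << 16;
	}
	return 0;
}

/*
 * Hypothetical model of the patched loop: a timeout uses up one attempt
 * and loops again ("continue"); any other result ends the loop ("break").
 * Returns the last result and stores the number of attempts made.
 */
static int probe_inquiry(int timeouts, int *attempts)
{
	int result = 0;
	int count;

	*attempts = 0;
	for (count = 0; count < INQUIRY_ATTEMPTS; ++count) {
		++*attempts;
		result = fake_inquiry(&timeouts);
		if (result > 0 && host_byte(result) == DID_TIME_OUT)
			continue;	/* the case added by this patch */
		break;
	}
	return result;
}
```

With a single spurious timeout, probe_inquiry(1, &attempts) succeeds on the
second attempt. Only three consecutive timeouts exhaust the budget and leave
a DID_TIME_OUT result for the caller.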

Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
Tested-by: Dave Prizer <dave.prizer@xxxxxxx>

---
This patch was previously part of the series "Fixes for device probing
on flaky connections", submitted on 2022/06/15. The first patch of the
series has been dropped, as discussed during review. Testing verified
that this patch alone is sufficient to solve the observed issues.

---
 drivers/scsi/scsi_scan.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 91ac901a66826..e859a648033f9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -697,6 +697,11 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				    (sshdr.ascq == 0))
 					continue;
 			}
+			if (host_byte(result) == DID_TIME_OUT) {
+				SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
+						"scsi scan: retry inquiry after timeout\n"));
+				continue;
+			}
 		} else if (result == 0) {
 			/*
 			 * if nothing was transferred, we try
-- 
2.37.1