[RFC 8/9] zfcp: fix waiting for rport(s) unblock in eh_host_reset_handler

Steffen Maier <maier@xxxxxxxxxxxxxxxxxx> · Tue, 25 Jul 2017 16:14:26 +0200

v2.6.30 commit 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh
handlers in zfcp") added calls to zfcp_erp_wait() within
eh_abort_handler(), eh_device_reset_handler(), eh_target_reset_handler()
in order to synchronize with zfcp recovery completion before returning
from a scsi_eh callback (e.g. with SUCCESS) to prevent eh escalation.

v2.6.33 commit af4de36d911a ("[SCSI] zfcp: Block scsi_eh thread for rport
state BLOCKED") introduced the use of fc_block_scsi_eh() for
eh_abort_handler(), eh_device_reset_handler(), eh_target_reset_handler(),
and eh_host_reset_handler(), because zfcp_erp_wait() from the above
commit is not sufficient.
The use in zfcp_task_mgmt_function() is correct even for a LUN reset,
as described in commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race
with LUN recovery").
However, the one call in zfcp_scsi_eh_host_reset_handler() waits for
just the one port of an arbitrary scsi_cmnd. This seems insufficient
because the preceding adapter recovery could have recovered multiple
ports, and we should wait for all of them to unblock (or to run into
FAST_IO_FAIL).

Therefore, we now wait for all ports of the adapter with this fix.

NB: We cannot easily wait for an event because there is a time window
between the return of zfcp_erp_wait() and the point where
zfcp_erp_try_rport_unblock(), as part of zfcp_erp_action_cleanup(),
actually schedules rport_work, which then unblocks an rport
asynchronously in zfcp_scsi_rport_work(). Hence a flush_work() could
come too early, before queue_work() was even called.

v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from
fc_block_scsi_eh to scsi eh") fixed the above v2.6.33 commit for the
FAST_IO_FAIL case.

Signed-off-by: Steffen Maier <maier@xxxxxxxxxxxxxxxxxx>
Fixes: af4de36d911a ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
---
 drivers/s390/scsi/zfcp_scsi.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)
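
Note for reviewers: the return code aggregation described above can be
modelled in user space as follows. This is only a sketch under stated
assumptions, not driver code: struct model_port, MODEL_SUCCESS,
MODEL_FAST_IO_FAIL and model_host_reset_result() are hypothetical names,
the constant values are placeholders rather than the kernel's
definitions, and block_result merely stands in for whatever
fc_block_rport() would return for that port (0 or FAST_IO_FAIL).

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; NOT the definitions from include/scsi/scsi.h. */
#define MODEL_SUCCESS      1
#define MODEL_FAST_IO_FAIL 2

/* Hypothetical stand-in for a zfcp port, not the kernel structure. */
struct model_port {
	int has_rport;    /* models port->rport != NULL */
	int block_result; /* models the fc_block_rport() result:
			   * 0 or MODEL_FAST_IO_FAIL
			   */
};

/*
 * Visit every port, skip those without an rport, and report
 * MODEL_FAST_IO_FAIL as soon as any visited rport reported it.
 * Otherwise the result stays MODEL_SUCCESS.
 */
static int model_host_reset_result(const struct model_port *ports, size_t n)
{
	int ret = MODEL_SUCCESS;
	size_t i;

	assert(ports != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!ports[i].has_rport)
			continue;
		if (ports[i].block_result)
			ret = ports[i].block_result;
	}
	return ret;
}
```

A single rport that ran into its fast_io_fail_tmo is enough to turn the
overall result into the fast-fail code, which matches the comment in
the hunk below about letting pending requests bubble up. Ports without
an rport never influence the result.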

diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 8e96196fa877..11cf33ea8c14 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -338,16 +338,29 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
 	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
 	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
 	struct zfcp_port *port;
-	int ret;
+	int ret = SUCCESS;
 
 	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
 	zfcp_erp_wait(adapter);
-	port = zfcp_sdev->port;
-	ret = port->rport ? fc_block_rport(port->rport) : 0;
-	if (ret)
-		return ret;
+	/* after internal recovery, wait for async unblock of rport(s) */
+	read_lock(&adapter->port_list_lock);
+	list_for_each_entry(port, &adapter->port_list, list) {
+		int fc_ret;
+
+		if (!port->rport)
+			continue;
+
+		fc_ret = fc_block_rport(port->rport);
+		/* Any rport ran into fast_io_fail_tmo: FAST_IO_FAIL.
+		 * To let pending requests bubble up, even if too many
+		 * because of other rports without this timeout.
+		 */
+		if (fc_ret)
+			ret = fc_ret;
+	}
+	read_unlock(&adapter->port_list_lock);
 
-	return SUCCESS;
+	return ret;
 }
 
 struct scsi_transport_template *zfcp_scsi_transport_template;
-- 
2.11.2