This is a note to let you know that I've just added the patch titled

    IB/srp: Fail I/O requests if the transport is offline

to the 3.8-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     ib-srp-fail-i-o-requests-if-the-transport-is-offline.patch
and it can be found in the queue-3.8 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


From 2ce19e72f4d570c87e025ee6fca4eae699a8b712 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bvanassche@xxxxxxx>
Date: Thu, 21 Feb 2013 17:20:00 +0000
Subject: IB/srp: Fail I/O requests if the transport is offline

From: Bart Van Assche <bvanassche@xxxxxxx>

commit 2ce19e72f4d570c87e025ee6fca4eae699a8b712 upstream.

If an SRP target is no longer reachable and srp_reset_host() fails
to reconnect then ib_srp will invoke scsi_remove_host(). That
function will invoke __scsi_remove_device() for each LUN. And that
last function will change the device state from SDEV_TRANSPORT_OFFLINE
into SDEV_CANCEL. Certain user space software, e.g. older versions
of multipathd, continue queueing I/O to SCSI devices that are in
the SDEV_CANCEL state. If these I/O requests are submitted as SG_IO
that means that the REQ_PREEMPT flag will be set and hence that
these requests will be passed to srp_queuecommand(). These requests
will time out. If new requests are queued fast enough from user
space these active requests will prevent __scsi_remove_device() to
finish. Avoid this by failing I/O requests in the SDEV_CANCEL
state if the transport is offline. Introduce a new variable to
keep track of the transport state instead of failing requests if
(!target->connected || target->qp_in_error), so that the SCSI error
handler has a chance to retry commands after a transport layer
failure occurred.

Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
Signed-off-by: Roland Dreier <roland@xxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
 drivers/infiniband/ulp/srp/ib_srp.c |    7 +++++++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 2 files changed, 8 insertions(+)

--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -734,6 +734,7 @@ static int srp_reconnect_target(struct s
 
 	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
 			    SDEV_TRANSPORT_OFFLINE);
+	target->transport_offline = !!ret;
 
 	if (ret)
 		goto err;
@@ -1353,6 +1354,12 @@ static int srp_queuecommand(struct Scsi_
 	unsigned long flags;
 	int len;
 
+	if (unlikely(target->transport_offline)) {
+		scmnd->result = DID_NO_CONNECT << 16;
+		scmnd->scsi_done(scmnd);
+		return 0;
+	}
+
 	spin_lock_irqsave(&target->lock, flags);
 	iu = __srp_get_tx_iu(target, SRP_IU_CMD);
 	if (!iu)
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -140,6 +140,7 @@ struct srp_target_port {
 	unsigned int		cmd_sg_cnt;
 	unsigned int		indirect_size;
 	bool			allow_ext_sg;
+	bool			transport_offline;
 
 	/* Everything above this point is used in the hot path of
 	 * command processing. Try to keep them packed into cachelines.


Patches currently in stable-queue which might be from bvanassche@xxxxxxx are

queue-3.8/ib-srp-avoid-sending-a-task-management-function-needlessly.patch
queue-3.8/ib-srp-avoid-endless-scsi-error-handling-loop.patch
queue-3.8/ib-srp-fail-i-o-requests-if-the-transport-is-offline.patch
queue-3.8/ib-srp-track-connection-state-properly.patch
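
For readers who want to see the fast-fail behaviour in isolation, here is a
minimal user-space sketch of the logic this patch adds. It is not the driver
code: the model_* types and functions are hypothetical stand-ins for
struct srp_target_port, struct scsi_cmnd, srp_reconnect_target() and
srp_queuecommand(), and DID_NO_CONNECT is redefined locally with the value it
has in the kernel's SCSI headers. Locking, IU allocation and the real
completion callback are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* Host byte value used by the SCSI midlayer; redefined here for the model. */
#define DID_NO_CONNECT 0x01

/* Hypothetical stand-in for struct scsi_cmnd. */
struct model_cmd {
	int result;	/* host byte lives in bits 16..23, as in the kernel */
	bool done;	/* set when the command has been completed */
};

/* Hypothetical stand-in for struct srp_target_port. */
struct model_target {
	bool transport_offline;	/* the flag introduced by the patch */
	int queued;		/* commands handed on to the transport */
};

/* Models the end of a reconnect attempt: nonzero ret means it failed. */
void model_reconnect_done(struct model_target *t, int ret)
{
	assert(t);
	t->transport_offline = !!ret;
}

/*
 * Models the check at the top of the queuecommand path: if the transport
 * is offline, complete the command immediately with DID_NO_CONNECT instead
 * of queueing it and waiting for a timeout.
 */
int model_queuecommand(struct model_target *t, struct model_cmd *cmd)
{
	assert(t && cmd);
	if (t->transport_offline) {
		cmd->result = DID_NO_CONNECT << 16;
		cmd->done = true;
		return 0;
	}
	t->queued++;	/* normal path: the command goes to the transport */
	return 0;
}
```

In this model, a failed reconnect followed by model_queuecommand() completes
the command at once with DID_NO_CONNECT in the host byte and leaves queued
untouched; after a successful reconnect the same call takes the normal path.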