[PATCH] make error handling robust in the face of reservations

James Bottomley <James.Bottomley@xxxxxxx> · Fri, 06 Aug 2010 15:17:24 -0500

There's a curious case where devices in clusters are taken offline
whenever they go into error handling.  The reason is that in this
particular cluster, Test Unit Ready gets a RESERVATION CONFLICT return
when another node owns the storage.  This means that all TURs that error
handling uses are marked failed, so we always assume the device is
unrecoverable and take it offline.

Fix this in the error handling code that processes command returns:
check whether the command was a TUR and, if so, translate the EH return
to SUCCESS (after all, if the target managed to return RESERVATION
CONFLICT, we've successfully made contact with it).  Any other command
that gets RESERVATION CONFLICT during error handling is treated as
FAILED.
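
For illustration only, the decision can be modelled in a small userspace
sketch.  This is not kernel code: the enum, the OP_* macros and
eh_reservation_conflict() are hypothetical names made up for this
example, and only the opcode test mirrors what the patch does.

```c
#include <assert.h>

/* Stand-in result codes for this sketch; the kernel defines its own. */
enum eh_result { EH_SUCCESS, EH_FAILED };

#define OP_TEST_UNIT_READY 0x00	/* SCSI opcode for TEST UNIT READY */
#define OP_READ_10         0x28	/* SCSI opcode for READ(10) */

/*
 * Hypothetical helper: decide the EH result for a command that came
 * back with RESERVATION CONFLICT.  cdb points at the command bytes,
 * with cdb[0] holding the opcode.
 */
static enum eh_result eh_reservation_conflict(const unsigned char *cdb)
{
	/* A TUR that drew any answer at all means the target is reachable. */
	if (cdb[0] == OP_TEST_UNIT_READY)
		return EH_SUCCESS;

	/* Any other command did not get executed by the target. */
	return EH_FAILED;
}

/* Exercise both branches; returns 0 if the asserts hold. */
static int run_examples(void)
{
	const unsigned char tur[6] = { OP_TEST_UNIT_READY };
	const unsigned char read10[10] = { OP_READ_10 };

	assert(eh_reservation_conflict(tur) == EH_SUCCESS);
	assert(eh_reservation_conflict(read10) == EH_FAILED);
	return 0;
}
```

The point of keying on the opcode is that a TUR issued by error handling
is purely a reachability probe, so a conflict status still answers the
question being asked.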

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2bf9846..5e2d36f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -473,10 +473,12 @@ static int scsi_eh_completed_normally(struct scsi_cmnd *scmd)
 		 */
 		return SUCCESS;
 	case RESERVATION_CONFLICT:
-		/*
-		 * let issuer deal with this, it could be just fine
-		 */
-		return SUCCESS;
+		if (scmd->cmnd[0] == TEST_UNIT_READY)
+			/* it is a success, we probed the device and
+			 * found it */
+			return SUCCESS;
+		/* otherwise, we failed to send the command */
+		return FAILED;
 	case QUEUE_FULL:
 		scsi_handle_queue_full(scmd->device);
 		/* fall through */

