From: Mike Christie <michaelc@xxxxxxxxxxx>

Small guide to differences in iscsi and fc:

I_T nexus or port we are connected to at some other end point
	iscsi: iscsi_session
	fc: rport

fast_io_fail_tmo
	iscsi: session recovery_tmo
	fc: rport fast_io_fail_tmo
	The difference is that when the timer fires, for iscsi we unblock
	the queue and fail commands in the blocked queue. FC just fails IO
	running in the driver/fw/hw. The IO in the blocked queue sits there
	until dev_loss_tmo.

dev_loss_tmo
	iscsi: none yet (we are working on it :))
	fc: dev_loss_tmo

Currently, if there is a transport problem, the iscsi drivers will return
outstanding commands (commands being executed by the driver/fw/hw) with
DID_BUS_BUSY and block the session so no new commands can be queued.
Commands that are caught between the failure handling and blocking are
failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
When the recovery_timeout fires, the iscsi drivers then fail IO with
DID_NO_CONNECT.

For fcp, some drivers will fail some outstanding IO (disk but possibly not
tape) with DID_BUS_BUSY or some other value that causes a retry and hits
the scsi_error.c failfast check, block the rport, and fail commands caught
in the race with DID_IMM_RETRY. Other drivers will hold onto all IO and
wait for terminate_rport_io or dev_loss_tmo_callbk to be called. In this
case lpfc could return the IO with DID_ERROR.

The following patches attempt to unify what upper layers will see, so that
drivers like multipath can make a good guess. This relies on drivers being
hooked into their transport class and implementing the terminate_rport_io
callback.

This first patch just defines two new host byte errors so drivers can
return the same value for when a rport/session is blocked and for when the
fast_io_fail_tmo fires.
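
For readers less familiar with the result layout, here is a minimal
userspace sketch of how a host byte travels in a command result. It assumes
the two values added by the include/scsi/scsi.h hunk of this patch and the
usual layout where the host byte sits in bits 16-23 of the result.
sketch_make_result() is a hypothetical helper for illustration, not a
kernel function.

```c
/* Values taken from the include/scsi/scsi.h hunk of this patch. */
#define DID_TRANSPORT_BLOCKED	0x0e
#define DID_TRANSPORT_FAILFAST	0x0f

/* Same shape as the kernel's host_byte() macro: bits 16-23 of result. */
#define host_byte(result)	(((result) >> 16) & 0xff)

/*
 * Hypothetical helper (not a kernel API): pack a host byte and a SCSI
 * status byte into one result word. The status byte uses bits 0-7.
 */
static int sketch_make_result(int host, int status)
{
	return (host << 16) | (status & 0xff);
}
```

With this layout, a driver that wants to report a blocked rport/session
would end up with host_byte(result) == DID_TRANSPORT_BLOCKED, which is the
value the scsi_error.c change in this patch switches on.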

The idea is that if the LLD/class detects a problem and is going to block
a rport/session, and the LLD wants or must return the command to scsi-ml,
it can return it with DID_TRANSPORT_BLOCKED. This will requeue the IO into
the same scsi queue it came from.

When using multipath, if the fast_io_fail_tmo fires, the class can fail
commands with DID_TRANSPORT_FAILFAST, or drivers can use
DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or the
equivalent in iscsi if we ever implement more advanced recovery methods.
An LLD, like lpfc, could continue to return DID_ERROR and then it will hit
the normal failfast path.

The point of the patches is that upper layers will not see a failure that
could be recovered from while the rport/session is blocked until
fast_io_fail_tmo/recovery_timeout fires.

Signed-off-by: Mike Christie <michaelc@xxxxxxxxxxx>
---
 drivers/scsi/constants.c  |    3 ++-
 drivers/scsi/scsi_error.c |   15 ++++++++++++++-
 include/scsi/scsi.h       |    2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
index 2a458d6..9f0b284 100644
--- a/drivers/scsi/constants.c
+++ b/drivers/scsi/constants.c
@@ -1355,7 +1355,8 @@ #ifdef CONFIG_SCSI_CONSTANTS
 static const char * const hostbyte_table[]={
 "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT", "DID_BAD_TARGET",
 "DID_ABORT", "DID_PARITY", "DID_ERROR", "DID_RESET", "DID_BAD_INTR",
-"DID_PASSTHROUGH", "DID_SOFT_ERROR", "DID_IMM_RETRY"};
+"DID_PASSTHROUGH", "DID_SOFT_ERROR", "DID_IMM_RETRY", "DID_TRANSPORT_BLOCKED",
+"DID_TRANSPORT_FAILFAST" };
 #define NUM_HOSTBYTE_STRS ARRAY_SIZE(hostbyte_table)
 
 static const char * const driverbyte_table[]={
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b8edcf5..7dbe0f4 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1231,7 +1231,20 @@ int scsi_decide_disposition(struct scsi_
 
 	case DID_REQUEUE:
 		return ADD_TO_MLQUEUE;
-
+	case DID_TRANSPORT_BLOCKED:
+		/*
+		 * LLD/transport was disrupted during processing the IO.
+		 * The transport is now blocked and attempting to recover,
+		 * and the transport will decide what to do with the IO
+		 * based on its timers and recovery capabilities.
+		 */
+		return NEEDS_RETRY;
+	case DID_TRANSPORT_FAILFAST:
+		/*
+		 * The transport decided to failfast the IO (most likely
+		 * the fast io fail tmo fired), so send IO directly upwards.
+		 */
+		return SUCCESS;
 	case DID_ERROR:
 		if (msg_byte(scmd->result) == COMMAND_COMPLETE &&
 		    status_byte(scmd->result) == RESERVATION_CONFLICT)
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 5c0e979..fa94d65 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -309,6 +309,8 @@ #define DID_SOFT_ERROR      0x0b /* The low
 #define DID_IMM_RETRY   0x0c /* Retry without decrementing retry count  */
 #define DID_REQUEUE     0x0d /* Requeue command (no immediate retry) also
                               * without decrementing the retry count     */
+#define DID_TRANSPORT_BLOCKED 0x0e /* Transport class will handle io */
+#define DID_TRANSPORT_FAILFAST 0x0f /* Transport class fastfailed the io */
 #define DRIVER_OK       0x00 /* Driver status                           */
 
 /*
-- 
1.4.1.1
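
To make the intended flow easier to follow, here is a small userspace model
of the two new cases. It is only a sketch under stated assumptions: struct
sketch_port, sketch_pick_host_byte() and the SKETCH_* dispositions are
hypothetical stand-ins, not kernel structures or the kernel's numeric
values. Only the two DID_TRANSPORT_* values and the mapping to a retry or
to completing the IO upwards come from the patch above.

```c
/* Host byte values: DID_OK is the existing 0x00, the rest are from this patch. */
#define DID_OK			0x00
#define DID_TRANSPORT_BLOCKED	0x0e
#define DID_TRANSPORT_FAILFAST	0x0f

/* Stand-ins for the midlayer dispositions; values are illustrative only. */
enum sketch_disposition {
	SKETCH_NEEDS_RETRY,	/* models NEEDS_RETRY: requeue the command */
	SKETCH_SUCCESS,		/* models SUCCESS: done, send IO upwards */
	SKETCH_OTHER		/* any host byte this sketch does not model */
};

/* Hypothetical view of a rport/session as seen by an LLD or class. */
struct sketch_port {
	int blocked;		/* transport problem seen, port is blocked */
	int fast_io_fail_fired;	/* fast_io_fail_tmo/recovery timer expired */
};

/* Host byte an LLD/class might choose when it must return a command. */
static int sketch_pick_host_byte(const struct sketch_port *p)
{
	if (!p->blocked)
		return DID_OK;
	return p->fast_io_fail_fired ? DID_TRANSPORT_FAILFAST
				     : DID_TRANSPORT_BLOCKED;
}

/* Mirrors only the two cases this patch adds to scsi_decide_disposition(). */
static enum sketch_disposition sketch_decide(int result)
{
	switch ((result >> 16) & 0xff) {
	case DID_TRANSPORT_BLOCKED:
		/* Transport is recovering, so keep the IO in the scsi queue. */
		return SKETCH_NEEDS_RETRY;
	case DID_TRANSPORT_FAILFAST:
		/* Timer fired, so upper layers such as multipath see the failure. */
		return SKETCH_SUCCESS;
	default:
		return SKETCH_OTHER;
	}
}
```

In this model a command returned while the port is blocked is retried, and
the same command returned after the timer fires is completed upwards with
its error result, which is the point where multipath can fail the path.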