Re: [PATCH] scsi: libiscsi: Allow sd_shutdown on bad transport

Lee Duncan <lduncan@xxxxxxxx> · Tue, 2 Jan 2018 20:27:52 -0800



On 12/07/2017 01:59 PM, Rafael David Tinoco wrote:
> If, for any reason, userland shuts down iscsi transport interfaces
> before proper logouts - like when logging in to LUNs manually,
> without logging out on server shutdown, or when automated scripts
> can't umount/logout from logged LUNs - kernel will hang forever on
> its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd
> to all still existent paths.
> 
> PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
>  #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
>  #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
>  #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
>  #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
>  #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
>  #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
>  #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
>  #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
>  #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
>  #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
> 
> This happens because iscsi_eh_cmd_timed_out(), the transport layer
> timeout helper, would tell the queue timeout function (scsi_times_out)
> to reset the request timer over and over, until the session state is
> back to logged in state. Unfortunately, during server shutdown, this
> might never happen again.
> 
> Other option would be "not to handle" the issue in the transport
> layer. That would trigger the error handler logic, which would also
> need the session state to be logged in again.
> 
> Best option, for such case, is to tell upper layers that the command
> was handled during the transport layer error handler helper, marking
> it as DID_NO_CONNECT, which will allow completion and inform about
> the problem.
> 
> After the session was marked as ISCSI_STATE_FAILED, due to the first
> timeout during the server shutdown phase, all subsequent cmds will
> fail to be queued, allowing upper logic to fail faster.
> 
> Signed-off-by: Rafael David Tinoco <rafael.tinoco@xxxxxxxxxxxxx>
> ---
>  drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 9c50d2d9f27c..785d1c55d152 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
>  		 */
>  		switch (session->state) {
>  		case ISCSI_STATE_FAILED:
> +			/*
> +			 * cmds should fail during shutdown, if the session
> +			 * state is bad, allowing completion to happen
> +			 */
> +			if (unlikely(system_state != SYSTEM_RUNNING)) {
> +				reason = FAILURE_SESSION_FAILED;
> +				sc->result = DID_NO_CONNECT << 16;
> +				break;
> +			}
>  		case ISCSI_STATE_IN_RECOVERY:
>  			reason = FAILURE_SESSION_IN_RECOVERY;
>  			sc->result = DID_IMM_RETRY << 16;
> @@ -1978,6 +1987,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>  	}
>  
>  	if (session->state != ISCSI_STATE_LOGGED_IN) {
> +		/*
> +		 * During shutdown, if session is prematurely disconnected,
> +		 * recovery won't happen and there will be hung cmds. Not
> +		 * handling cmds would trigger EH, also bad in this case.
> +		 * Instead, handle cmd, allow completion to happen and let
> +		 * upper layer to deal with the result.
> +		 */
> +		if (unlikely(system_state != SYSTEM_RUNNING)) {
> +			sc->result = DID_NO_CONNECT << 16;
> +			ISCSI_DBG_EH(session, "sc on shutdown, handled\n");
> +			rc = BLK_EH_HANDLED;
> +			goto done;
> +		}
>  		/*
>  		 * We are probably in the middle of iscsi recovery so let
>  		 * that complete and handle the error.
> @@ -2082,7 +2104,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>  		task->last_timeout = jiffies;
>  	spin_unlock(&session->frwd_lock);
>  	ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
> -		     "timer reset" : "nh");
> +		     "timer reset" : "shutdown or nh");
>  	return rc;
>  }
>  EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);
> 

Reviewed-by: Lee Duncan <lduncan@xxxxxxxx>

-- 
Lee