Re: [PATCH 1/4] scsi: scsi_dh_alua: allow I/O in the target port unavailable state

On Mon, 2017-04-10 at 22:17 -0300, Mauricio Faria de Oliveira wrote:
> According to SPC-4 (5.15.2.4.5 Unavailable state), the unavailable
> state may (or may not) transition to other states (e.g., microcode
> downloading or hardware error, which may be temporary or permanent
> conditions, respectively).
> 
> However, scsi_dh_alua currently fails I/O requests early once that
> state is established (in alua_prep_fn()), so path checkers that go
> through that function path get no chance to check whether the path
> still fails I/O requests or has recovered to an active state.
> 
> This might cause device-mapper multipath to fail all paths to a
> storage system that moves its controllers to the unavailable state
> for firmware upgrades, and never recover them, even though the
> storage system upgrades one controller at a time and brings each
> one back online.
> 
> I/O requests are then blocked indefinitely due to queue_if_no_path,
> even though the underlying individual paths are fully operational,
> which can be verified through other function paths (e.g., SG_IO):
> 
>     # multipath -l
>     mpatha (360050764008100dac000000000000100) dm-0 IBM,2145
>     size=40G features='2 queue_if_no_path retain_attached_hw_handler'
>     hwhandler='1 alua' wp=rw
>     |-+- policy='service-time 0' prio=0 status=enabled
>     | |- 1:0:1:0 sdf 8:80  failed undef running
>     | `- 2:0:1:0 sdn 8:208 failed undef running
>     `-+- policy='service-time 0' prio=0 status=enabled
>       |- 1:0:0:0 sdb 8:16  failed undef running
>       `- 2:0:0:0 sdj 8:144 failed undef running
> 
>     # strace -e read \
>         sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
>         2>&1 | grep 512
>     read(3, 0x3fff7ba80000, 512) = -1 EIO (Input/output error)
> 
>     # strace -e ioctl \
>         sg_dd if=/dev/sdj of=/dev/null bs=512 count=1 iflag=direct \
>         blk_sgio=1 \
>         2>&1 | grep 512
>     ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[10]=[28, 00, 00, 00,
>     00, 00, 00, 00, 01, 00], <...>) = 0
> 
> So, allow I/O to target port (groups) in the unavailable state, so the
> path checkers can actually check them, and schedule a recheck whenever
> the unavailable state is detected so pg->state can be updated properly
> (and further SCSI IO error messages then silenced through alua_prep_fn()).
> 
> Once a path checker eventually detects an active state again, the port
> group state will be updated by the path activation call, alua_activate(),
> as it schedules an alua_rtpg() check.
> 
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@xxxxxxxxxxxxxxxxxx>
> Reported-by: Naresh Bannoth <nbannoth@xxxxxxxxxx>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index c01b47e5b55a..5e5a33cac951 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -431,6 +431,20 @@ static int alua_check_sense(struct scsi_device *sdev,
>  			alua_check(sdev, false);
>  			return NEEDS_RETRY;
>  		}
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0c) {
> +			/*
> +			 * LUN Not Accessible - target port in unavailable state.
> +			 *
> +			 * It may (not) be possible to transition to other states;
> +			 * the transition might take a while or not happen at all,
> +			 * depending on the storage system model, error type, etc.
> +			 *
> +			 * Do not retry, so failover to another target port occurs.
> +			 * Schedule a recheck to update state for other functions.
> +			 */
> +			alua_check(sdev, true);
> +			return SUCCESS;
> +		}
>  		break;
>  	case UNIT_ATTENTION:
>  		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
> @@ -1057,6 +1071,8 @@ static void alua_check(struct scsi_device *sdev, bool force)
>   *
>   * Fail I/O to all paths not in state
>   * active/optimized or active/non-optimized.
> + * Allow I/O to all paths in state unavailable
> + * so path checkers can actually check them.
>   */
>  static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  {
> @@ -1072,6 +1088,8 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  	rcu_read_unlock();
>  	if (state == SCSI_ACCESS_STATE_TRANSITIONING)
>  		ret = BLKPREP_DEFER;
> +	else if (state == SCSI_ACCESS_STATE_UNAVAILABLE)
> +		req->rq_flags |= RQF_QUIET;
>  	else if (state != SCSI_ACCESS_STATE_OPTIMAL &&
>  		 state != SCSI_ACCESS_STATE_ACTIVE &&
>  		 state != SCSI_ACCESS_STATE_LBA) {

Hello Mauricio,

Please also add support for the "standby" state to both alua_check_sense()
and alua_prep_fn() while you are modifying these functions.
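
To make the request concrete, here is a rough userspace sketch of the
decision logic, not kernel code and not part of the patch above. It assumes
that "standby" would simply be handled the same way as "unavailable", which
is only one possible reading. The names model_prep(), prep_decision and
sense_requests_failover() are hypothetical; PREP_* stands in for the
BLKPREP_* return values and the quiet flag stands in for RQF_QUIET. The
state values follow the SPC asymmetric access state codes, and ASC/ASCQ
04h/0Bh is the SPC code for "target port in standby state".

```c
#include <assert.h>	/* only needed if the assertions in the note below are used */
#include <stdbool.h>

/* ALUA access states, numbered as in the SPC asymmetric access state codes. */
enum access_state {
	STATE_OPTIMAL       = 0x0,	/* active/optimized */
	STATE_ACTIVE        = 0x1,	/* active/non-optimized */
	STATE_STANDBY       = 0x2,
	STATE_UNAVAILABLE   = 0x3,
	STATE_LBA           = 0x4,	/* LBA-dependent */
	STATE_OFFLINE       = 0xe,
	STATE_TRANSITIONING = 0xf,
};

/* Hypothetical stand-ins for the BLKPREP_* values used by alua_prep_fn(). */
enum prep_result { PREP_OK, PREP_DEFER, PREP_KILL };

struct prep_decision {
	enum prep_result result;
	bool quiet;	/* models setting RQF_QUIET on the request */
};

/* Model of the per-request decision made from the cached port group state. */
static struct prep_decision model_prep(enum access_state state)
{
	struct prep_decision d = { PREP_OK, false };

	switch (state) {
	case STATE_TRANSITIONING:
		d.result = PREP_DEFER;	/* try again later */
		break;
	case STATE_UNAVAILABLE:
	case STATE_STANDBY:	/* assumption: handled like unavailable */
		d.quiet = true;	/* let the I/O through, silence error logs */
		break;
	case STATE_OPTIMAL:
	case STATE_ACTIVE:
	case STATE_LBA:
		break;		/* normal I/O */
	default:
		d.result = PREP_KILL;	/* fail early, e.g. offline */
		break;
	}
	return d;
}

/*
 * Model of the NOT READY sense branch: true means "do not retry on this
 * path, let failover happen, and schedule a forced recheck".
 */
static bool sense_requests_failover(unsigned char asc, unsigned char ascq)
{
	if (asc != 0x04)
		return false;
	/* 0x0b: target port in standby state (assumed to be added),
	 * 0x0c: target port in unavailable state (from the patch). */
	return ascq == 0x0b || ascq == 0x0c;
}
```

With this model, model_prep(STATE_STANDBY) lets the request through with the
quiet flag set, and sense_requests_failover(0x04, 0x0b) returns true, so a
path checker could still probe a standby port while a recheck is scheduled.
Whether standby should really share the unavailable handling is a design
question for the actual patch.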

Thanks,

Bart.


