Re: ALUA for HA failover and multipathd

On 06/10/2014 11:11 AM, Chris Boot wrote:
On 06/06/14 07:34, Nicholas A. Bellinger wrote:
Hi Chris & Phillip,

On Thu, 2014-06-05 at 15:31 +0100, Chris Boot wrote:
Hi folks,

Background: I'm working on creating a Highly Available iSCSI system
using Pacemaker with some collaborators. We looked at the existing
iSCSILogicalUnit and iSCSITarget resource scripts, and they didn't seem
to do quite what we wanted so we have started down the route of writing
our own. Our new scripts are GPL and their current incarnations are
available at https://github.com/tigercomputing/ocf-lio

In general terms the setup is reasonably simple: we have a DRBD volume
running in dual-primary mode, which is then used to create an iblock
device, which itself is exported over iSCSI. We have been attempting to
use ALUA multipathing in implicit mode only to manage target failover.

We create two ALUA TPGs on each node, call them east/west, and mark one
as Active/Optimised and the other as Active/NonOptimised. When we create
the iSCSI TPGs on both nodes, one node's TPG is placed in the west ALUA
TPG and the other node's is placed into the east ALUA TPG.

When simulating a fail-over, the ALUA states on both east and west are
changed on both nodes and kept synchronised.
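
The arrangement above can be sketched as a small C model. This is only
an illustration, not code from the ocf-lio scripts: the type and
function names (alua_view, alua_failover, alua_views_in_sync) are
hypothetical, and the numeric values are the SPC asymmetric access
state codes. Each node holds its own copy of the east/west states, and
a fail-over swaps the two states on every node so the copies stay
identical.

```c
#include <stddef.h>

/* Asymmetric access state codes as defined by SPC. */
enum alua_state {
    ALUA_ACTIVE_OPTIMIZED     = 0x0,
    ALUA_ACTIVE_NON_OPTIMIZED = 0x1,
    ALUA_STANDBY              = 0x2,
    ALUA_UNAVAILABLE          = 0x3
};

/* Hypothetical per-node view of the two ALUA target port groups. */
struct alua_view {
    enum alua_state east;
    enum alua_state west;
};

/* Simulated fail-over: swap the east/west states on every node. */
void alua_failover(struct alua_view *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        enum alua_state tmp = nodes[i].east;
        nodes[i].east = nodes[i].west;
        nodes[i].west = tmp;
    }
}

/* Returns 1 if every node reports the same states as the first node. */
int alua_views_in_sync(const struct alua_view *nodes, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (nodes[i].east != nodes[0].east ||
            nodes[i].west != nodes[0].west)
            return 0;
    }
    return 1;
}
```

With Standby in place of Active/NonOptimised the same swap applies;
only the initial state of the passive group differs.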

With multipathd on Linux as the initiator, everything appears to work
well until we switch roles on the target. multipathd then sticks to the
old path, even though it is now NonOptimised and running slowly due to
the 100ms nonop_delay_msecs.

If, instead, we set the standby path to Standby mode rather than
Active/NonOptimised, multipathd correctly notices the path is
unavailable and sends IO over the Active/Optimised path. However, if the
initiator originally logs in to the target while the path is in Standby
mode, it fails to probe the device correctly. When the path becomes
Active/Optimised during failover, multipathd is unable to use it and
fails the path. The TUR checker nevertheless reports the path as active
and reinstates it, only for it to be failed again, and so on. The only
way to bring it back to life is to "echo 1 >
/sys/block/$DEV/device/rescan" and re-run multipath by hand.

I haven't been able to test this myself, but Philip (CCed) reports that
similar behaviour is seen using VMware as the initiator rather than Linux.

Has anyone managed to set up an ALUA multipath HA SAN with two nodes and
LIO? What are we missing? Am I going to have to throw in the towel on
ALUA and just use virtual IP failover instead?

After testing this evening with a similar config on a single target
instance, I can reproduce the issue: the initial LUN probe fails when
the ALUA group is implicitly set to Standby state.

The failure occurs during the initial READ_CAPACITY, which the opcode
check in core_alua_state_standby() currently disallows.  I thought at
one point READ_CAPACITY could fail during the initial LUN probe and
still bring up a struct scsi_device with zero sectors, but I could be
wrong. (Hannes CC'ed)

In any event, the following patch to permit READ_CAPACITY addresses the
initial LUN probe failure and works on my end, and should allow implicit
ALUA state changes between Active/* and Standby to work in both
directions.

Please confirm with your setup.

Hi Nab,

Ack, this fixes the issue completely for me under Linux with multipathd.
The standby path is correctly probed now when you log in to the target,
and when you fail-over to it everything carries on. Thanks very much!

Note that I tested on 3.14 so had to replace set_ascq() with *alua_ascq
as you did for the stable patches.

Tested-by: Chris Boot <crb@xxxxxxxxxxxxxxxxxxxxx>

FWIW, the kernel messages we obtain when probing the disk look like:

[  388.929254] scsi12 : iSCSI Initiator over TCP/IP
[  389.184537] scsi 12:0:0:0: Direct-Access     LIO-ORG  test1   4.0  PQ: 0 ANSI: 5
[  389.184632] scsi 12:0:0:0: alua: supports implicit TPGS
[  389.185229] scsi 12:0:0:0: alua: port group 11 rel port 01
[  389.185390] scsi 12:0:0:0: alua: port group 11 state S non-preferred supports TOlUSNA
[  389.185393] scsi 12:0:0:0: alua: Attached
[  389.185791] sd 12:0:0:0: Attached scsi generic sg5 type 0
[  389.186499] sd 12:0:0:0: [sde] 2147418040 512-byte logical blocks: (1.09 TB/1023 GiB)
[  389.187254] sd 12:0:0:0: [sde] Write Protect is off
[  389.187258] sd 12:0:0:0: [sde] Mode Sense: 43 00 10 08
[  389.188301] sd 12:0:0:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  389.190677] ldm_validate_partition_table(): Disk read failed.
[  389.190705] Dev sde: unable to read RDB block 0
[  389.190734]  sde: unable to read partition table
[  389.192784] sd 12:0:0:0: [sde] Attached SCSI disk
[  389.246318] sd 10:0:0:0: alua: port group 10 state A preferred supports TOlUSNA
[  389.325327] sd 10:0:0:0: alua: port group 10 state A preferred supports TOlUSNA
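
For anyone decoding the "alua:" lines above, here is a small C sketch
of how the single-letter codes can be read. It assumes the letter
scheme used by the Linux scsi_dh_alua device handler as I understand
it (one letter per access state, and in the "supports" string an
upper-case letter means the state is supported, lower case that it is
not). The function names are hypothetical and this is not the kernel's
own code.

```c
#include <stddef.h>
#include <string.h>

/* Map the state letter from an "alua: ... state X" line to a name. */
const char *alua_state_name(char c)
{
    switch (c) {
    case 'A': return "active/optimized";
    case 'N': return "active/non-optimized";
    case 'S': return "standby";
    case 'U': return "unavailable";
    case 'L': return "LBA-dependent";
    case 'O': return "offline";
    case 'T': return "transitioning";
    default:  return "unknown";
    }
}

/*
 * Check a "supports" string such as "TOlUSNA". Pass the upper-case
 * state letter; it only matches if the target advertised that state
 * as supported (a lower-case letter in the string does not match).
 */
int alua_supports(const char *flags, char state)
{
    return state != '\0' && strchr(flags, state) != NULL;
}
```

Read this way, "state S ... supports TOlUSNA" describes a port group
in standby on a target that advertises every state except
LBA-dependent.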

Thanks for getting a patch to us so quickly, and sorry it took so long
to get it tested.

Thanks!

--nab

diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index fcbe612..63512cc 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -576,7 +576,16 @@ static inline int core_alua_state_standby(
  	case REPORT_LUNS:
  	case RECEIVE_DIAGNOSTIC:
  	case SEND_DIAGNOSTIC:
+	case READ_CAPACITY:
  		return 0;
+	case SERVICE_ACTION_IN:
+		switch (cdb[1] & 0x1f) {
+		case SAI_READ_CAPACITY_16:
+			return 0;
+		default:
+			set_ascq(cmd, ASCQ_04H_ALUA_TG_PT_STANDBY);
+			return 1;
+		}
  	case MAINTENANCE_IN:
  		switch (cdb[1] & 0x1f) {
  		case MI_REPORT_TARGET_PGS:
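
To make the effect of the hunk easier to see outside the kernel tree,
here is a standalone C model of the check. It is a simplified sketch,
not the actual core_alua_state_standby(): it covers only the opcodes
visible in the hunk (the real function has further cases above it),
the function name standby_allows and the "patched" flag are
hypothetical, and the assumption that other MAINTENANCE_IN service
actions are rejected is inferred from the pattern of the new code. The
opcode and service action values are the standard SCSI ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard SCSI opcodes and service actions used by this model. */
#define OP_RECEIVE_DIAGNOSTIC    0x1c
#define OP_SEND_DIAGNOSTIC       0x1d
#define OP_READ_CAPACITY_10      0x25
#define OP_SERVICE_ACTION_IN_16  0x9e
#define OP_REPORT_LUNS           0xa0
#define OP_MAINTENANCE_IN        0xa3
#define SA_READ_CAPACITY_16      0x10
#define SA_REPORT_TARGET_PGS     0x0a

/*
 * Returns true if a path in Standby would service the CDB in this
 * model. With patched == false the READ CAPACITY cases are rejected,
 * which corresponds to the probe failure described in this thread.
 */
bool standby_allows(const uint8_t *cdb, bool patched)
{
    switch (cdb[0]) {
    case OP_REPORT_LUNS:
    case OP_RECEIVE_DIAGNOSTIC:
    case OP_SEND_DIAGNOSTIC:
        return true;
    case OP_READ_CAPACITY_10:
        return patched;
    case OP_SERVICE_ACTION_IN_16:
        /* The service action lives in the low 5 bits of byte 1. */
        return patched && (cdb[1] & 0x1f) == SA_READ_CAPACITY_16;
    case OP_MAINTENANCE_IN:
        return (cdb[1] & 0x1f) == SA_REPORT_TARGET_PGS;
    default:
        return false;
    }
}
```

Media access commands such as READ(10) are rejected in both variants;
only the capacity queries change behaviour.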

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Hmm. While I agree with the patch (and confirm that it's required to get
multipath working on LIO-target), it really seems that multipath is
making incorrect assumptions here. Looking at the spec, READ_CAPACITY is
indeed not required to be supported on STANDBY paths, so multipath will
fail with any ALUA implementation that follows the spec more closely.

Guess we need to discuss this on dm-devel ...

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)