Hi Chris & Phillip,

On Thu, 2014-06-05 at 15:31 +0100, Chris Boot wrote:
> Hi folks,
>
> Background: I'm working on creating a Highly Available iSCSI system
> using Pacemaker with some collaborators. We looked at the existing
> iSCSILogicalUnit and iSCSITarget resource scripts, and they didn't seem
> to do quite what we wanted so we have started down the route of writing
> our own. Our new scripts are GPL and their current incarnations are
> available at https://github.com/tigercomputing/ocf-lio
>
> In general terms the setup is reasonably simple: we have a DRBD volume
> running in dual-primary mode, which is then used to create an iblock
> device, which itself is exported over iSCSI. We have been attempting to
> use ALUA multipathing in implicit mode only to manage target failover.
>
> We create two ALUA TPGs on each node, call them east/west, and mark one
> as Active/Optimised and the other as Active/NonOptimised. When we create
> the iSCSI TPGs on both nodes, one node's TPG is placed in the west ALUA
> TPG and the other node's is placed into the east ALUA TPG.
>
> When simulating a fail-over, the ALUA states on both east and west are
> changed on both nodes and kept synchronised.
>
> What we see when using multipathd on Linux as the initiator all appears
> to work well until we switch roles on the target. multipathd seems to
> stick to the old path, even though it is now NonOptimised and running
> slowly due to the 100ms nonop_delay_msecs.
>
> If, instead, we set the standby path to Standby mode rather than
> Active/NonOptimised, multipathd correctly notices the path is
> unavailable and sends IO over the Active/Optimised path. However, if the
> initiator originally logs-in to the target while the path is in Standby
> mode, it fails to correctly probe the device. When it becomes
> Active/Optimised during failover, multipathd is unable to use it and
> fails the path. The TUR checker returns that the path is active, though,
> and makes the path active again, only to be failed again etc... The only
> way to bring it back to life is to "echo 1 >
> /sys/block/$DEV/device/rescan" and re-run multipath by hand.
>
> I haven't been able to test this myself, but Philip (CCed) reports that
> similar behaviour is seen using VMware as the initiator rather than Linux.
>
> Has anyone managed to set up an ALUA multipath HA SAN with two nodes and
> LIO? What are we missing? Am I going to have to throw in the towel on
> ALUA and just use virtual IP failover instead?

After testing this evening with a similar config on a single target
instance, the issue where the initial LUN probe fails against an ALUA
group implicitly set to Standby state is reproducible. The failure
occurs during the initial READ_CAPACITY, which is currently disallowed
by the opcode checking in core_alua_state_standby(). I thought at one
point READ_CAPACITY could fail during the initial LUN probe and still
bring up a struct scsi_device with zero sectors, but I could be
wrong..? (Hannes CC'ed)

In any event, the following patch permits READ_CAPACITY (and READ
CAPACITY(16) via SERVICE_ACTION_IN) in Standby state. It addresses the
initial LUN probe failure on my end, and should allow implicit ALUA
Active/* <-> Standby state changes in both directions to function now.
Please confirm with your setup.

Thanks!
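Btw, if it helps with debugging, the implicit east/west flip can also be
driven by hand through configfs. A rough sketch follows; the backstore
location (iblock_0/drbd0), the group names and the numeric state values
are assumptions based on the description above, not taken from Chris'
actual resource scripts:

  # Assumed backstore: an iblock device named drbd0 under hba iblock_0
  CORE=/sys/kernel/config/target/core/iblock_0/drbd0

  # Create the two target port groups, assign IDs, implicit-only ALUA
  mkdir $CORE/alua/east $CORE/alua/west
  echo 1 > $CORE/alua/east/tg_pt_gp_id
  echo 2 > $CORE/alua/west/tg_pt_gp_id
  echo 1 > $CORE/alua/east/alua_access_type   # 1 == implicit only
  echo 1 > $CORE/alua/west/alua_access_type

  # Attach each node's iSCSI TPG LUN to its group ($IQN is a placeholder)
  echo east > /sys/kernel/config/target/iscsi/$IQN/tpgt_1/lun/lun_0/alua_tg_pt_gp

  # Fail-over: swap Active/Optimized (0) and Standby (2) on both nodes
  echo 2 > $CORE/alua/east/alua_access_state
  echo 0 > $CORE/alua/west/alua_access_state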
--nab

diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index fcbe612..63512cc 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -576,7 +576,16 @@ static inline int core_alua_state_standby(
 	case REPORT_LUNS:
 	case RECEIVE_DIAGNOSTIC:
 	case SEND_DIAGNOSTIC:
+	case READ_CAPACITY:
 		return 0;
+	case SERVICE_ACTION_IN:
+		switch (cdb[1] & 0x1f) {
+		case SAI_READ_CAPACITY_16:
+			return 0;
+		default:
+			set_ascq(cmd, ASCQ_04H_ALUA_TG_PT_STANDBY);
+			return 1;
+		}
 	case MAINTENANCE_IN:
 		switch (cdb[1] & 0x1f) {
 		case MI_REPORT_TARGET_PGS:
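To verify from the Linux initiator side: log in while the group is
still in Standby, and the initial probe should now find the capacity.
E.g. with sg3_utils (/dev/sdX is a placeholder for the path device, not
something from your setup):

  sg_readcap /dev/sdX           # READ CAPACITY(10), now allowed in Standby
  sg_readcap --16 /dev/sdX      # READ CAPACITY(16) via SERVICE_ACTION_IN
  sg_rtpg --decode /dev/sdX     # report the ALUA state of each port group

After a subsequent Standby -> Active/Optimized transition, 'multipath
-ll' should show the path coming back without the manual rescan +
re-run workaround.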