Hi Chris & Phillip,

On Thu, 2014-06-05 at 15:31 +0100, Chris Boot wrote:
> Hi folks,
>
> Background: I'm working on creating a Highly Available iSCSI system
> using Pacemaker with some collaborators. We looked at the existing
> iSCSILogicalUnit and iSCSITarget resource scripts, and they didn't seem
> to do quite what we wanted so we have started down the route of writing
> our own. Our new scripts are GPL and their current incarnations are
> available at https://github.com/tigercomputing/ocf-lio
>
> In general terms the setup is reasonably simple: we have a DRBD volume
> running in dual-primary mode, which is then used to create an iblock
> device, which itself is exported over iSCSI. We have been attempting to
> use ALUA multipathing in implicit mode only to manage target failover.
>
> We create two ALUA TPGs on each node, call them east/west, and mark one
> as Active/Optimised and the other as Active/NonOptimised. When we create
> the iSCSI TPGs on both nodes, one node's TPG is placed in the west ALUA
> TPG and the other node's is placed into the east ALUA TPG.
>
> When simulating a fail-over, the ALUA states on both east and west are
> changed on both nodes and kept synchronised.
>
> What we see when using multipathd on Linux as the initiator all appears
> to work well until we switch roles on the target. multipathd seems to
> stick to the old path, even though it is now NonOptimised and running
> slowly due to the 100ms nonop_delay_msecs.
>
> If, instead, we set the standby path to Standby mode rather than
> Active/NonOptimised, multipathd correctly notices the path is
> unavailable and sends IO over the Active/Optimised path. However, if the
> initiator originally logs-in to the target while the path is in Standby
> mode, it fails to correctly probe the device. When it becomes
> Active/Optimised during failover, multipathd is unable to use it and
> fails the path. The TUR checker returns that the path is active, though,
> and makes the path active again, only to be failed again etc... The only
> way to bring it back to life is to "echo 1 >
> /sys/block/$DEV/device/rescan" and re-run multipath by hand.
>
> I haven't been able to test this myself, but Philip (CCed) reports that
> similar behaviour is seen using VMware as the initiator rather than Linux.
>
> Has anyone managed to set up an ALUA multipath HA SAN with two nodes and
> LIO? What are we missing? Am I going to have to throw in the towel on
> ALUA and just use virtual IP failover instead?

After testing this evening with a similar config on a single target
instance, the issue where the initial LUN probe fails against an ALUA
group implicitly set to Standby state is reproducible. The failure
occurs during the initial READ_CAPACITY, which is currently disallowed
by the opcode checking in core_alua_state_standby(). I thought at one
point READ_CAPACITY could fail during the initial LUN probe and still
bring up a struct scsi_device with zero sectors, but I could be
wrong..? (Hannes CC'ed)

In any event, the following patch permits READ_CAPACITY (and READ
CAPACITY(16) via SERVICE_ACTION_IN) in Standby state. It addresses the
initial LUN probe failure on my end, and should allow implicit ALUA
Active/* <-> Standby state changes in both directions to function now.
Please confirm with your setup.

Thanks!
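Btw, if it helps with debugging, the implicit east/west flip can also be
driven by hand through configfs. A rough sketch follows; the backstore
location (iblock_0/drbd0), the group names and the numeric state values
are assumptions based on the description above, not taken from Chris'
actual resource scripts:

  # Assumed backstore: an iblock device named drbd0 under hba iblock_0
  CORE=/sys/kernel/config/target/core/iblock_0/drbd0

  # Create the two target port groups, assign IDs, implicit-only ALUA
  mkdir $CORE/alua/east $CORE/alua/west
  echo 1 > $CORE/alua/east/tg_pt_gp_id
  echo 2 > $CORE/alua/west/tg_pt_gp_id
  echo 1 > $CORE/alua/east/alua_access_type   # 1 == implicit only
  echo 1 > $CORE/alua/west/alua_access_type

  # Attach each node's iSCSI TPG LUN to its group ($IQN is a placeholder)
  echo east > /sys/kernel/config/target/iscsi/$IQN/tpgt_1/lun/lun_0/alua_tg_pt_gp

  # Fail-over: swap Active/Optimized (0) and Standby (2) on both nodes
  echo 2 > $CORE/alua/east/alua_access_state
  echo 0 > $CORE/alua/west/alua_access_state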
--nab

diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index fcbe612..63512cc 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -576,7 +576,16 @@ static inline int core_alua_state_standby(
 	case REPORT_LUNS:
 	case RECEIVE_DIAGNOSTIC:
 	case SEND_DIAGNOSTIC:
+	case READ_CAPACITY:
 		return 0;
+	case SERVICE_ACTION_IN:
+		switch (cdb[1] & 0x1f) {
+		case SAI_READ_CAPACITY_16:
+			return 0;
+		default:
+			set_ascq(cmd, ASCQ_04H_ALUA_TG_PT_STANDBY);
+			return 1;
+		}
 	case MAINTENANCE_IN:
 		switch (cdb[1] & 0x1f) {
 		case MI_REPORT_TARGET_PGS:
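To verify from the Linux initiator side: log in while the group is
still in Standby, and the initial probe should now find the capacity.
E.g. with sg3_utils (/dev/sdX is a placeholder for the path device, not
something from your setup):

  sg_readcap /dev/sdX           # READ CAPACITY(10), now allowed in Standby
  sg_readcap --16 /dev/sdX      # READ CAPACITY(16) via SERVICE_ACTION_IN
  sg_rtpg --decode /dev/sdX     # report the ALUA state of each port group

After a subsequent Standby -> Active/Optimized transition, 'multipath
-ll' should show the path coming back without the manual rescan +
re-run workaround.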