Hi folks,

Background: I'm working on creating a highly available iSCSI system
using Pacemaker with some collaborators. We looked at the existing
iSCSILogicalUnit and iSCSITarget resource scripts, and they didn't seem
to do quite what we wanted, so we have started down the route of writing
our own. Our new scripts are GPL and their current incarnations are
available at:

https://github.com/tigercomputing/ocf-lio

In general terms the setup is reasonably simple: we have a DRBD volume
running in dual-primary mode, which is then used to create an iblock
device, which itself is exported over iSCSI. We have been attempting to
use ALUA multipathing, in implicit mode only, to manage target failover.

We create two ALUA TPGs on each node, call them east/west, and mark one
as Active/Optimised and the other as Active/NonOptimised. When we create
the iSCSI TPGs on both nodes, one node's TPG is placed in the west ALUA
TPG and the other node's is placed into the east ALUA TPG. When
simulating a failover, the ALUA states of both east and west are changed
on both nodes and kept synchronised.

With multipathd on Linux as the initiator, everything appears to work
well until we switch roles on the target. multipathd then seems to stick
to the old path, even though it is now NonOptimised and running slowly
due to the 100ms nonop_delay_msecs.

If, instead, we set the standby path to Standby mode rather than
Active/NonOptimised, multipathd correctly notices the path is
unavailable and sends IO over the Active/Optimised path. However, if the
initiator originally logs in to the target while the path is in Standby
mode, it fails to probe the device correctly. When the path becomes
Active/Optimised during failover, multipathd is unable to use it and
fails the path. The TUR checker reports that the path is active, though,
and makes the path active again, only for it to be failed again, and so
on. The only way to bring it back to life is to
"echo 1 > /sys/block/$DEV/device/rescan" and re-run multipath by hand.
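For reference, the manual recovery described above could be scripted
roughly as follows. This is only a sketch under stated assumptions: the
rescan_paths function name and its SYSFS_ROOT argument are hypothetical
(the argument exists only so the function can be exercised against a
scratch directory), and the caller is assumed to already know which sd
devices back the multipath map.

```shell
#!/bin/sh
# Sketch only: rescan_paths and its SYSFS_ROOT argument are hypothetical
# names used for illustration; they are not part of multipath-tools.
#
# usage: rescan_paths SYSFS_ROOT DEV [DEV...]
rescan_paths() {
    sysfs_root=$1
    shift
    rc=0
    for dev in "$@"; do
        rescan_file="$sysfs_root/block/$dev/device/rescan"
        if [ -w "$rescan_file" ]; then
            # Same as: echo 1 > /sys/block/$DEV/device/rescan
            echo 1 > "$rescan_file"
            echo "rescanned $dev"
        else
            echo "skipped $dev: $rescan_file is not writable" >&2
            rc=1
        fi
    done
    # Non-zero if any device could not be rescanned.
    return "$rc"
}

# Example (as root), followed by re-running multipath by hand:
#   rescan_paths /sys sdc sde && multipath
```

This only automates the workaround; it does not address why the path is
not probed correctly after a login in Standby mode.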

I haven't been able to test this myself, but Philip (CCed) reports that
similar behaviour is seen using VMware as the initiator rather than
Linux.

Has anyone managed to set up an ALUA multipath HA SAN with two nodes and
LIO? What are we missing? Am I going to have to throw in the towel on
ALUA and just use virtual IP failover instead?

We'd really appreciate some input on this.

To set up the target on *both* nodes:

tcm_node --establishdev iblock_0/drbd1 /dev/drbd1
tcm_node --setunitserialwithmd iblock_0/drbd1 f88e7c31-77cb-46fd-90bb-dfc8a701406e
tcm_node --addaluatpgwithmd iblock_0/drbd1 lio_alua_west 100
tcm_node --addaluatpgwithmd iblock_0/drbd1 lio_alua_east 101
tcm_node --setaluatype=iblock_0/drbd1 lio_alua_west implict
tcm_node --setaluatype=iblock_0/drbd1 lio_alua_east implict
tcm_node --clearaluapref=iblock_0/drbd1 lio_alua_west
tcm_node --clearaluapref=iblock_0/drbd1 lio_alua_east
echo 100 > /sys/kernel/config/target/core/iblock_0/drbd1/alua/lio_alua_west/nonop_delay_msecs
echo 0 > /sys/kernel/config/target/core/iblock_0/drbd1/alua/lio_alua_west/trans_delay_msecs
echo 100 > /sys/kernel/config/target/core/iblock_0/drbd1/alua/lio_alua_east/nonop_delay_msecs
echo 0 > /sys/kernel/config/target/core/iblock_0/drbd1/alua/lio_alua_east/trans_delay_msecs
tcm_node --setaluastate=iblock_0/drbd1 lio_alua_west a
tcm_node --setaluastate=iblock_0/drbd1 lio_alua_east o
tcm_node --setaluapref=iblock_0/drbd1 lio_alua_east
lio_node --addnp iqn.2014-04.com.example:drbd1 1 0.0.0.0:3260
lio_node --addnodeacl iqn.2014-04.com.example:drbd1 1 iqn.1993-08.org.debian:01:feb992244813
lio_node --addlun iqn.2014-04.com.example:drbd1 1 0 lun0 iblock_0/drbd1
lio_node --addlunacl=iqn.2014-04.com.example:drbd1 1 iqn.2014-04.com.example 0 0
lio_node --enabletpg=iqn.2014-04.com.example:drbd1 1

On the west node only:

lio_node --setaluatpg=iqn.2014-04.com.example:drbd1 1 0 lio_alua_west

On the east node only:

lio_node --setaluatpg=iqn.2014-04.com.example:drbd1 1 0 lio_alua_east

To change from
east to west, on *both* nodes:

tcm_node --clearaluapref=iblock_0/drbd1 lio_alua_east
tcm_node --setaluastate=iblock_0/drbd1 lio_alua_east a
tcm_node --setaluastate=iblock_0/drbd1 lio_alua_west o
tcm_node --setaluapref=iblock_0/drbd1 lio_alua_west

Our multipath.conf looks like:

devices {
        device {
                vendor "LIO-ORG"
                path_grouping_policy group_by_prio
                path_checker tur
                prio alua
                hardware_handler "1 alua"
                failback immediate
                rr_weight uniform
                no_path_retry 12
                rr_min_io 100
        }
}

A sample 'multipath -l' output looks like:

test1 (36001405571059945e344331baecb97b1) dm-3 LIO-ORG,test1
size=1024G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 63:0:0:0 sde 8:64 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 61:0:0:0 sdc 8:32 active undef running

And 'multipath -ll':

test1 (36001405571059945e344331baecb97b1) dm-3 LIO-ORG,test1
size=1024G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 63:0:0:0 sde 8:64 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 61:0:0:0 sdc 8:32 active ready running

Thanks,
Chris

-- 
Chris Boot
Tiger Computing Ltd
"Linux for Business"

Tel: 01600 483 484
Web: http://www.tiger-computing.co.uk
Follow us on Facebook: http://www.facebook.com/TigerComputing

Registered in England. Company number: 3389961
Registered address: Wyastone Business Park,
Wyastone Leys, Monmouth, NP25 3SR