Thank you for your prompt response. You pointed at my problem.

Just to conclude: there is a bug in Red Hat's iSCSILogicalUnit resource
agent script for the 'PCS' HA cluster (the latest generation of 'Red Hat
Cluster Suite') which incorrectly sets the serial number (SN).
Now that I have worked around the bug, I will see whom I need to notify
about my solution, so that Red Hat merges it into their suite.

Thanks!
Etzion

On 23 August 2015 at 01:00, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> On Sat, 2015-08-22 at 01:33 +0300, Etzion Bar-Noy wrote:
>> Hi.
>> I have been looking for a solution for a while now, and found none, so
>> this post here is my last attempt to solve the LIO issue I am
>> encountering, before I give up and move to some other solution...
>>
>> Description:
>> OS: CentOS 7.1, latest updates (current as of August 2015).
>> Targetcli version: rpm -qa | grep targetcli
>> targetcli-2.1.fb37-3.el7.noarch
>> Kernel: uname -a
>> Linux controller1 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>>
>> If there's anything missing, let me know.
>>
>> Problem summary: In a PCS-based HA cluster, when failing over the LUN,
>> the LUN serial changes, and this causes multipath clients to misbehave
>> (especially after an iSCSI client reboot).
>>
>> Some more of the setup: the setup makes use of two nodes with a
>> PCS-based cluster. The cluster setup was a modified follow-up of this
>> site:
>> https://bm-stor.com/index.php/blog/Linux-cluster-with-ZFS-on-Cluster-in-a-Box/
>> except that I use multipathing and not network teaming.
>>
>> iSCSI layout:
>> targetcli ls
>> o- / ........................................ [...]
>>   o- backstores ............................. [...]
>>   | o- block ................................ [Storage Objects: 2]
>>   | | o- lun2-tier2 ......................... [/dev/mapper/T2-lun2 (6.6TiB) write-thru activated]
>>   | | o- lun3-tier3 ......................... [/dev/mapper/T3-lun1 (1024.0GiB) write-thru activated]
>>   | o- fileio ............................... [Storage Objects: 0]
>>   | o- pscsi ................................ [Storage Objects: 0]
>>   | o- ramdisk .............................. [Storage Objects: 0]
>>   o- iscsi .................................. [Targets: 2]
>>   | o- iqn.2005-05.com.poliva:cib.tier2 ..... [TPGs: 1]
>>   | | o- tpg1 ............................... [gen-acls, no-auth]
>>   | |   o- acls ............................. [ACLs: 0]
>>   | |   o- luns ............................. [LUNs: 1]
>>   | |   | o- lun2 ........................... [block/lun2-tier2 (/dev/mapper/T2-lun2)]
>>   | |   o- portals .......................... [Portals: 2]
>>   | |     o- 10.254.254.4:3260 .............. [OK]
>>   | |     o- 10.254.255.4:3260 .............. [OK]
>>   | o- iqn.2005-05.com.poliva:cib.tier3 ..... [TPGs: 1]
>>   |   o- tpg1 ............................... [gen-acls, no-auth]
>>   |     o- acls ............................. [ACLs: 0]
>>   |     o- luns ............................. [LUNs: 1]
>>   |     | o- lun3 ........................... [block/lun3-tier3 (/dev/mapper/T3-lun1)]
>>   |     o- portals .......................... [Portals: 2]
>>   |       o- 10.254.254.5:3260 .............. [OK]
>>   |       o- 10.254.255.5:3260 .............. [OK]
>>   o- loopback ............................... [Targets: 0]
>>
>> I do not use (unless required to) ACLs for the time being.
>> After a LUN takeover/takeback (aka relocation to another host), the
>> IP address is back up (within about 10-20 seconds), the iSCSI target
>> is up and available, and all cluster resources show as healthy.
>> However, the client host, especially if rebooted, will not see the
>> same identifier for this LUN (serial, word 83, name it as you like).
>> It is 100% reproducible if two conditions happen (the order does not
>> matter):
>> 1. The LUN is migrated to the other node
>> 2. The client machine is rebooted
>>
>> # multipath -ll
>> mpathc (36001405e3de7a9800000000000000000) dm-2 LIO-ORG,lun3-tier3
>> size=1024G features='1 queue_if_no_path' hwhandler='0' wp=rw
>> `-+- policy='round-robin 0' prio=1 status=active
>>   |- 0:0:0:3 sdb 8:16 active ready running
>>   `- 1:0:0:3 sda 8:0  active ready running
>> [root@temp-iSCSI ~]# multipath -F
>> [root@temp-iSCSI ~]# iscsiadm -m node -U all
>> Logging out of session [sid: 1, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260]
>> Logging out of session [sid: 2, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260]
>> Logout of [sid: 1, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] successful.
>> Logout of [sid: 2, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] successful.
>> [root@temp-iSCSI ~]# iscsiadm -m node -L all
>> Logging in to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] (multiple)
>> Logging in to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] (multiple)
>> Login to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] successful.
>> Login to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] successful.
>> [root@temp-iSCSI ~]# multipath
>> [root@temp-iSCSI ~]# multipath -ll
>> mpathd (3600140599e044f8681345d3aa4824abc) dm-2 LIO-ORG,lun3-tier3
>> size=1024G features='1 queue_if_no_path' hwhandler='0' wp=rw
>> `-+- policy='round-robin 0' prio=1 status=active
>>   |- 2:0:0:3 sdb 8:16 active ready running
>>   `- 3:0:0:3 sda 8:0  active ready running
>>
>> I am not using ACLs for the time being. I will integrate ACLs later on.
>>
>> Thanks for any insight, or even a simple tip on how I can maintain a
>> dedicated HA solution.
>
> The backend device UUID (and EVPD=0x83 that uses it) is set in
>
> /sys/kernel/config/target/core/$HBA/$DEV/wwn/vpd_unit_serial
>
> From the looks of it, your H/A scripts are resetting it to something new
> each time export fail-over occurs.
>
> You'll need to make sure it's using the same value on both nodes, to
> ensure a consistent view to active initiators.
>
> --nab
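
As a rough illustration of the advice above (same vpd_unit_serial on both nodes), here is a minimal Python sketch. It is not what the iSCSILogicalUnit agent does and not the workaround mentioned at the top of the thread; all function names are hypothetical. It assumes the serial can be derived from names both nodes already share (target IQN plus backstore name, via uuid5), and it assumes, based on my understanding of LIO and consistent with the two WWIDs shown in the thread, that the multipath WWID is "36001405" followed by the first 25 hex digits of the serial, zero-padded. The configfs path is passed in by the caller, since $HBA and $DEV depend on the setup. As far as I know the kernel only accepts a write to vpd_unit_serial while the backstore has no active exports, so this would have to run before the LUN is mapped.

```python
import re
import uuid
from pathlib import Path

# Assumption: "3" (NAA designator type as shown by multipath), then NAA type 6
# and the OUI 001405 seen in both WWIDs in this thread.
LIO_NAA_PREFIX = "36001405"


def stable_serial(target_iqn: str, lun_name: str) -> str:
    """Derive a serial that is identical on every node.

    Hypothetical scheme: uuid5 over names both cluster nodes already share.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{target_iqn}/{lun_name}"))


def expected_wwid(serial: str) -> str:
    """Predict the multipath WWID for a given unit serial.

    Assumption: only hex digits of the serial are used, the first 25 of
    them, padded with zeros when the serial is shorter.
    """
    hex_digits = re.sub(r"[^0-9a-fA-F]", "", serial).lower()
    return LIO_NAA_PREFIX + hex_digits[:25].ljust(25, "0")


def write_serial(serial_attr: Path, serial: str) -> None:
    """Write the serial to the given vpd_unit_serial attribute file.

    The caller supplies the full path, e.g.
    /sys/kernel/config/target/core/$HBA/$DEV/wwn/vpd_unit_serial
    """
    serial_attr.write_text(serial + "\n")


if __name__ == "__main__":
    # Names taken from the targetcli listing in the thread; nothing is written.
    serial = stable_serial("iqn.2005-05.com.poliva:cib.tier3", "lun3-tier3")
    print("serial:", serial)
    print("expected WWID:", expected_wwid(serial))
```

Running the same script on both nodes should print the same serial and WWID, which can then be compared with what multipath -ll reports on the initiator before and after a failover.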