On Sat, 2015-08-22 at 01:33 +0300, Etzion Bar-Noy wrote:
> Hi. I have been looking for a solution for a while now and found none,
> so this post is my last attempt to solve the LIO issue I am
> encountering before I give up and move to some other solution.
>
> Description:
> OS: CentOS 7.1, latest updates (as of Aug 2015).
> Targetcli version: rpm -qa | grep targetcli
> targetcli-2.1.fb37-3.el7.noarch
> Kernel: uname -a
> Linux controller1 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> If there's anything missing, let me know.
>
> Problem summary: in a PCS-based HA cluster, when the LUN fails over,
> the LUN serial changes, and this causes multipath clients to misbehave
> (especially after an iSCSI client reboot).
>
> Some more on the setup: it uses two nodes in a PCS-based cluster. The
> cluster setup is a modified follow-up of this site:
> https://bm-stor.com/index.php/blog/Linux-cluster-with-ZFS-on-Cluster-in-a-Box/
> except that I use multipathing rather than network teaming.
>
> iSCSI layout:
> targetcli ls
> o- / ...................................................................... [...]
>   o- backstores ........................................................... [...]
>   | o- block ................................................ [Storage Objects: 2]
>   | | o- lun2-tier2 .......... [/dev/mapper/T2-lun2 (6.6TiB) write-thru activated]
>   | | o- lun3-tier3 ....... [/dev/mapper/T3-lun1 (1024.0GiB) write-thru activated]
>   | o- fileio ............................................... [Storage Objects: 0]
>   | o- pscsi ................................................ [Storage Objects: 0]
>   | o- ramdisk .............................................. [Storage Objects: 0]
>   o- iscsi .......................................................... [Targets: 2]
>   | o- iqn.2005-05.com.poliva:cib.tier2 ................................ [TPGs: 1]
>   | | o- tpg1 ................................................. [gen-acls, no-auth]
>   | |   o- acls ...................................................... [ACLs: 0]
>   | |   o- luns ...................................................... [LUNs: 1]
>   | |   | o- lun2 ..................... [block/lun2-tier2 (/dev/mapper/T2-lun2)]
>   | |   o- portals ................................................ [Portals: 2]
>   | |     o- 10.254.254.4:3260 ............................................ [OK]
>   | |     o- 10.254.255.4:3260 ............................................ [OK]
>   | o- iqn.2005-05.com.poliva:cib.tier3 ................................ [TPGs: 1]
>   |   o- tpg1 ................................................. [gen-acls, no-auth]
>   |     o- acls ...................................................... [ACLs: 0]
>   |     o- luns ...................................................... [LUNs: 1]
>   |     | o- lun3 ..................... [block/lun3-tier3 (/dev/mapper/T3-lun1)]
>   |     o- portals ................................................ [Portals: 2]
>   |       o- 10.254.254.5:3260 ............................................ [OK]
>   |       o- 10.254.255.5:3260 ............................................ [OK]
>   o- loopback ....................................................... [Targets: 0]
>
> I do not use ACLs for the time being (unless required to).
>
> After a LUN takeover/takeback (i.e. relocation to the other host), the
> IP address is back up within about 10-20 seconds, the iSCSI target is
> up and available, and all cluster resources show as healthy. However,
> the client host, especially if rebooted, will no longer see the same
> identifier for this LUN (serial, VPD page 0x83, call it what you
> like). It is 100% reproducible if two conditions are met:
> 1. The LUN is migrated to the other node.
> 2. The client machine is rebooted (in either order).
> # multipath -ll
> mpathc (36001405e3de7a9800000000000000000) dm-2 LIO-ORG,lun3-tier3
> size=1024G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
>   |- 0:0:0:3 sdb 8:16 active ready running
>   `- 1:0:0:3 sda 8:0  active ready running
> [root@temp-iSCSI ~]# multipath -F
> [root@temp-iSCSI ~]# iscsiadm -m node -U all
> Logging out of session [sid: 1, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260]
> Logging out of session [sid: 2, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260]
> Logout of [sid: 1, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] successful.
> Logout of [sid: 2, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] successful.
> [root@temp-iSCSI ~]# iscsiadm -m node -L all
> Logging in to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] (multiple)
> Logging in to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] (multiple)
> Login to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] successful.
> Login to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] successful.
> [root@temp-iSCSI ~]# multipath
> [root@temp-iSCSI ~]# multipath -ll
> mpathd (3600140599e044f8681345d3aa4824abc) dm-2 LIO-ORG,lun3-tier3
> size=1024G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
>   |- 2:0:0:3 sdb 8:16 active ready running
>   `- 3:0:0:3 sda 8:0  active ready running
>
> I am not using ACLs for the time being; I will integrate ACLs later on.
>
> Thanks for any insight, or even a simple tip on how I can maintain a
> dedicated HA solution.
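The two WWIDs in the transcript above point at the backstore serial itself: LIO appears to build the NAA identifier from the 6001405 IEEE OUI followed by the first 25 hex digits of the backstore's vpd_unit_serial (dashes removed), and dm-multipath prepends "3" for the NAA designator type. A minimal sketch under that assumption — the serial below is hypothetical (only its leading 25 hex digits can be read back out of the mpathd WWID; the tail is made up):

```shell
# Hypothetical vpd_unit_serial; only the first 25 hex digits are
# recoverable from the mpathd WWID above, the rest is illustrative.
serial="99e044f8-6813-45d3-aa48-24abc0000001"

# Assumed scheme: "3" (NAA designator type, added by multipath-tools)
# + "6001405" (LIO's IEEE OUI) + first 25 hex digits of the serial
# with dashes stripped.
wwid="36001405$(printf '%s' "$serial" | tr -d '-' | cut -c1-25)"
echo "$wwid"
```

If that scheme holds, any node that regenerates vpd_unit_serial on takeover will necessarily present a new WWID, which is exactly what the mpathc-to-mpathd change shows.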
The backend device UUID (and the EVPD=0x83 designator that uses it) is set in:

  /sys/kernel/config/target/core/$HBA/$DEV/wwn/vpd_unit_serial

From the looks of it, your H/A scripts are resetting it to something new
each time export fail-over occurs. You'll need to make sure the same
value is used on both nodes, to ensure a consistent view to active
initiators.

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html