Re: Non persistent SCSI serial (word 83)

On Sat, 2015-08-22 at 01:33 +0300, Etzion Bar-Noy wrote:
> Hi. I have been looking for a solution for a while now and have found
> none, so this post is my last attempt to solve the LIO issue I am
> encountering before I give up and move to some other solution...
> Description:
> OS: CentOS 7.1, latest updates (as of August 2015).
> Targetcli version: rpm -qa | grep targetcli
> targetcli-2.1.fb37-3.el7.noarch
> Kernel: uname -a
> Linux controller1 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> If there's anything missing, let me know.
> 
> Problem summary: In a PCS-based HA cluster, when failing over the LUN,
> the LUN serial changes, and this causes multipath clients to misbehave
> (especially after an iSCSI client reboot).
> Some more of the setup: it uses two nodes in a PCS-based cluster. The
> cluster setup follows, with modifications, this guide:
> https://bm-stor.com/index.php/blog/Linux-cluster-with-ZFS-on-Cluster-in-a-Box/
> except that I use multipathing and not network teaming.
> iSCSI layout:
> targetcli ls
> o- / ........ [...]
>   o- backstores ........ [...]
>   | o- block ........ [Storage Objects: 2]
>   | | o- lun2-tier2 ........ [/dev/mapper/T2-lun2 (6.6TiB) write-thru activated]
>   | | o- lun3-tier3 ........ [/dev/mapper/T3-lun1 (1024.0GiB) write-thru activated]
>   | o- fileio ........ [Storage Objects: 0]
>   | o- pscsi ........ [Storage Objects: 0]
>   | o- ramdisk ........ [Storage Objects: 0]
>   o- iscsi ........ [Targets: 2]
>   | o- iqn.2005-05.com.poliva:cib.tier2 ........ [TPGs: 1]
>   | | o- tpg1 ........ [gen-acls, no-auth]
>   | |   o- acls ........ [ACLs: 0]
>   | |   o- luns ........ [LUNs: 1]
>   | |   | o- lun2 ........ [block/lun2-tier2 (/dev/mapper/T2-lun2)]
>   | |   o- portals ........ [Portals: 2]
>   | |     o- 10.254.254.4:3260 ........ [OK]
>   | |     o- 10.254.255.4:3260 ........ [OK]
>   | o- iqn.2005-05.com.poliva:cib.tier3 ........ [TPGs: 1]
>   |   o- tpg1 ........ [gen-acls, no-auth]
>   |     o- acls ........ [ACLs: 0]
>   |     o- luns ........ [LUNs: 1]
>   |     | o- lun3 ........ [block/lun3-tier3 (/dev/mapper/T3-lun1)]
>   |     o- portals ........ [Portals: 2]
>   |       o- 10.254.254.5:3260 ........ [OK]
>   |       o- 10.254.255.5:3260 ........ [OK]
>   o- loopback ........ [Targets: 0]
> 
> I do not use ACLs for the time being (unless required to).
> After a LUN takeover/takeback (i.e. relocation to the other host), the
> IP address is back up (within about 10-20 seconds), the iSCSI target
> is up and available, and all cluster resources show as healthy.
> However, the client host, especially if rebooted, will not see the
> same identifier for this LUN (serial, word 83, name it as you like).
> It is 100% reproducible if two conditions are met:
> 1. The LUN is migrated to the other node
> 2. The client machine is rebooted (in either order).
> # multipath -ll
> mpathc (36001405e3de7a9800000000000000000) dm-2 LIO-ORG,lun3-tier3
> size=1024G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
>   |- 0:0:0:3 sdb  8:16  active ready running
>   `- 1:0:0:3 sda  8:0   active ready running
> [root@temp-iSCSI ~]# multipath -F
> [root@temp-iSCSI ~]# iscsiadm -m node -U all
> Logging out of session [sid: 1, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260]
> Logging out of session [sid: 2, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260]
> Logout of [sid: 1, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] successful.
> Logout of [sid: 2, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] successful.
> [root@temp-iSCSI ~]# iscsiadm -m node -L all
> Logging in to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] (multiple)
> Logging in to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] (multiple)
> Login to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.255.5,3260] successful.
> Login to [iface: default, target: iqn.2005-05.com.poliva:cib.tier3, portal: 10.254.254.5,3260] successful.
> [root@temp-iSCSI ~]# multipath
> [root@temp-iSCSI ~]# multipath -ll
> mpathd (3600140599e044f8681345d3aa4824abc) dm-2 LIO-ORG,lun3-tier3
> size=1024G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
>   |- 2:0:0:3 sdb  8:16  active ready running
>   `- 3:0:0:3 sda  8:0   active ready running
> 
> 
> I am not using ACLs for the time being. I will integrate ACLs later on.
> 
> Thanks for any insight, or even a simple tip on how I can maintain
> a dedicated HA solution.

The backend device UUID (and EVPD=0x83 that uses it) is set in

   /sys/kernel/config/target/core/$HBA/$DEV/wwn/vpd_unit_serial

From the looks of it, your H/A scripts are resetting it to something new
each time an export fail-over occurs.
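
The two WWIDs in the multipath output above fit that reading. As a rough
sketch only: assuming the EVPD 0x83 NAA identifier is built as NAA type 6,
the OUI 001405, and then up to 25 hex digits taken from vpd_unit_serial
(zero-padded when the serial is short), and assuming multipath prepends "3"
to NAA identifiers, the WWID a given serial should produce can be estimated.
The function name and the sample serials are illustrative, not values read
from the reporter's nodes.

```python
import string

NAA_PREFIX = "6001405"   # NAA type 6 + OUI 00:14:05 (assumed layout)
VENDOR_NIBBLES = 25      # hex digits left over in a 16-byte NAA identifier


def expected_wwid(unit_serial: str) -> str:
    """Estimate the multipath WWID that a vpd_unit_serial would produce."""
    # Keep only hex digits; this drops the dashes of a UUID-style serial.
    digits = "".join(c.lower() for c in unit_serial if c in string.hexdigits)
    # Truncate to the available space and zero-pad short serials.
    vendor = digits[:VENDOR_NIBBLES].ljust(VENDOR_NIBBLES, "0")
    # The leading "3" is the prefix used for NAA-type identifiers (assumed).
    return "3" + NAA_PREFIX + vendor


# Hypothetical serials: a UUID-style one and a short one.
print(expected_wwid("99e044f8-6813-45d3-aa48-24abcdef0123"))
print(expected_wwid("e3de7a98"))
```

Under those assumptions, a UUID-style serial beginning 99e044f8-6813-45d3-aa48-24abc
would give the mpathd WWID, and a short serial such as e3de7a98 would give a
zero-padded WWID of the mpathc form, which would mean the two nodes export
the same LUN with different serials.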

You'll need to make sure it's using the same value on both nodes, to
ensure a consistent view to active initiators.
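
One way to do that, sketched here in Python, is to have the fail-over hook
write a fixed serial right after the backstore is created on the node taking
over. The configfs root is the path given above; the HBA directory name
(iblock_0), the helper names and the serial value are assumptions for
illustration. The kernel may refuse the write once the device has active
exports, so this would likely need to run before the LUN is mapped.

```python
from pathlib import Path

CONFIGFS_CORE = Path("/sys/kernel/config/target/core")


def serial_path(hba: str, dev: str, root: Path = CONFIGFS_CORE) -> Path:
    """Build the vpd_unit_serial attribute path for one backstore."""
    return root / hba / dev / "wwn" / "vpd_unit_serial"


def read_serial(path: Path) -> str:
    """Return the current serial, with or without a descriptive prefix."""
    # The attribute typically reads back as "<label>: <serial>".
    text = path.read_text().strip()
    return text.split(":", 1)[-1].strip()


def pin_serial(path: Path, wanted: str) -> bool:
    """Write `wanted` if the current serial differs; True if a write happened."""
    if read_serial(path) == wanted:
        return False
    path.write_text(wanted + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical HBA name and serial; use the same serial on both nodes.
    p = serial_path("iblock_0", "lun3-tier3")
    if p.exists():
        pin_serial(p, "99e044f8-6813-45d3-aa48-24abcdef0123")
```

Keeping the wanted serial in the cluster resource definition, rather than on
either node, means both nodes write the identical value.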

--nab
