Dual target node ALUA multipathing for VMware

Good afternoon,

We are testing a dual-node, Pacemaker-based cluster to deliver Fibre
Channel LUNs to VMware (ESXi 5.5). The OS is Ubuntu Trusty 14.04, and
the LIO version is:

Target Engine Core ConfigFS Infrastructure v4.1.0 on Linux/x86_64 on
3.14.8-031408-generic

Each of the two LIO nodes (e1, kio1) has a dual-port QLA2462 Fibre
Channel HBA, and the initiator node also has a QLA2462. These are
connected via a Brocade 4Gb Silkworm switch.

We set a different TPG ID on each node while keeping the LUN serial
number the same. The back-end storage is a Ceph image presented
identically to both LIO nodes.

One node sets its ALUA state to s (Standby), while the other sets it
to o (Active/Optimized). However, when both nodes present the same
LUN to VMware, the host reports the TPG ID of only one node on all
paths to storage. The ALUA state is Standby for all LUNs, and there
are many errors in the VMware host's vmkernel.log, such as:

2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP:
nmpDeviceAttemptFailover:603: Retry world failover device
"naa.6001405afcc298020000000000000000" - issuing command
0x41364041a100
2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP:
nmp_SelectPathAndIssueCommand:3174: PSP selected path
"vmhba2:C0:T0:L1" in a bad state (standby) on device
"naa.6001405afcc298020000000000000000".
2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP:
nmpCompleteRetryForPath:352: Retry cmd 0x1a (0x41364041a100) to dev
"naa.6001405afcc298020000000000000000" failed on path
"vmhba2:C0:T0:L1" H:0x1 D:0x0 P:0x0 Possible sense data: 0x2 0x3a
2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP:
nmpCompleteRetryForPath:382: Logical device
"naa.6001405afcc298020000000000000000": awaiting fast path state
update before retrying failed command again...
2014-09-06T03:38:59.748Z cpu17:37198)WARNING: VMW_SATP_ALUA:
satp_alua_determineStatus:563: VMW_SATP_ALUA:Unknown Check condition
0/2 0x5 0x26 0x0.
2014-09-06T03:38:59.748Z cpu17:37198)WARNING: VMW_SATP_ALUA:
satp_alua_issueCommandOnPath:685: Probe cmd 0xa4 failed for path
"vmhba2:C0:T0:L1" (0x5/0x26/0x0). Check if failover mode is still
ALUA.
2014-09-06T03:38:59.748Z cpu17:37198)WARNING: VMW_SATP_ALUA:
satp_alua_issueSTPG:490: STPG failed on path "vmhba2:C0:T0:L1"
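
To make the check condition above easier to read, here is a minimal
Python sketch (not part of the original setup) that extracts the opcode
and sense triple from the VMW_SATP_ALUA probe line. The lookup tables
cover only the values seen in this log, and the function name
decode_probe_failure is hypothetical.

```python
import re

# Lookup tables limited to the codes that appear in the log excerpt.
OPCODES = {0xA4: "MAINTENANCE OUT"}  # carries SET TARGET PORT GROUPS
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x26, 0x00): "INVALID FIELD IN PARAMETER LIST"}

HEX = r"(0x[0-9a-fA-F]+)"
PROBE_RE = re.compile(
    r"Probe cmd " + HEX + r" failed for path\s+\"([^\"]+)\"\s+"
    r"\(" + HEX + "/" + HEX + "/" + HEX + r"\)"
)

def decode_probe_failure(text):
    """Return a dict describing the failed probe, or None if no match."""
    m = PROBE_RE.search(text)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    key, asc, ascq = (int(m.group(i), 16) for i in (3, 4, 5))
    return {
        "path": m.group(2),
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }

# The probe line as it appears (wrapped) in vmkernel.log above.
SAMPLE = '''satp_alua_issueCommandOnPath:685: Probe cmd 0xa4 failed for path
"vmhba2:C0:T0:L1" (0x5/0x26/0x0). Check if failover mode is still
ALUA.'''

result = decode_probe_failure(SAMPLE)
print(result)
```

Read this way, the target rejected the STPG request with ILLEGAL
REQUEST / INVALID FIELD IN PARAMETER LIST on path vmhba2:C0:T0:L1.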

Here is some LIO info:

root@kio1:/sys/kernel/config/target# tcm_node --listtgptgps
iblock_4711/p_FCLun_test2
\------> kio1  Target Port Group ID: 2
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Standby
         Primary Access Status: Altered by Implicit ALUA
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
         qla2xxx/naa.21000024ff4f0dee/tpgt_1/lun_1
         qla2xxx/naa.21000024ff4f0def/tpgt_1/lun_1

\------> default_tg_pt_gp  Target Port Group ID: 0
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Active/Optimized
         Primary Access Status: None
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
             No Target Port Group Members

root@e1:/var/log# tcm_node --listtgptgps iblock_4711/p_FCLun_test
\------> e1  Target Port Group ID: 1
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Active/Optimized
         Primary Access Status: None
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
         qla2xxx/naa.21000024ff4f0f20/tpgt_1/lun_1
         qla2xxx/naa.21000024ff4f0f21/tpgt_1/lun_1

\------> default_tg_pt_gp  Target Port Group ID: 0
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Active/Optimized
         Primary Access Status: None
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
             No Target Port Group Members
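
To compare the two nodes side by side, the listings above can be
reduced to (name, ID, state, members) records. The following is a rough
Python sketch that assumes the text layout shown here; it parses saved
output only and does not call tcm_node itself. The function name
parse_tg_pt_gps is hypothetical.

```python
import re

# Matches group header lines such as:
#   \------> kio1  Target Port Group ID: 2
GROUP_RE = re.compile(r"^\\-+>\s+(\S+)\s+Target Port Group ID:\s*(\d+)")

def parse_tg_pt_gps(text):
    """Parse saved `tcm_node --listtgptgps` text into a list of dicts."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        m = GROUP_RE.match(line)
        if m:
            groups.append({"name": m.group(1), "id": int(m.group(2)),
                           "state": None, "members": []})
            continue
        if not groups:
            continue  # skip anything before the first group header
        if line.startswith("Primary Access State:"):
            groups[-1]["state"] = line.split(":", 1)[1].strip()
        elif line.startswith("qla2xxx/"):
            groups[-1]["members"].append(line)
    return groups

# Excerpt of the kio1 listing from this message.
KIO1_OUTPUT = r"""
\------> kio1  Target Port Group ID: 2
         Primary Access State: Standby
         \------> TG Port Group Members
         qla2xxx/naa.21000024ff4f0dee/tpgt_1/lun_1
         qla2xxx/naa.21000024ff4f0def/tpgt_1/lun_1

\------> default_tg_pt_gp  Target Port Group ID: 0
         Primary Access State: Active/Optimized
         \------> TG Port Group Members
             No Target Port Group Members
"""

groups = parse_tg_pt_gps(KIO1_OUTPUT)
populated = [g for g in groups if g["members"]]
for g in populated:
    print(g["name"], g["id"], g["state"], len(g["members"]))
```

Running the same parser on the e1 listing and comparing the populated
groups gives the target-side view: ID 2 / Standby on kio1 and ID 1 /
Active/Optimized on e1, each with two qla2xxx ports.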

VMware identifies only one TPG instead of both:

naa.6001405afcc298020000000000000000
   Device Display Name: LIO-ORG Fibre Channel Disk (naa.6001405afcc298020000000000000000)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on;explicit_support=on;explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba2:C0:T0:L1
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba2:C0:T0:L1
   Is Local SAS Device: false
   Is Boot USB Device: false
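
The per-path TPG entries in the device config above can be checked
mechanically. This is a small Python sketch, assuming the
{TPG_id=N,TPG_state=XX} layout shown in this output; the helper name
tpg_entries is hypothetical.

```python
import re

# One match per {TPG_id=N,TPG_state=XX} block in the SATP device config.
TPG_RE = re.compile(r"\{TPG_id=(\d+),TPG_state=(\w+)\}")

# The "Storage Array Type Device Config" value from the output above,
# joined back into a single string.
DEVICE_CONFIG = (
    "{implicit_support=on;explicit_support=on;"
    "explicit_allow=on;alua_followover=on;"
    "{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}"
    "{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}}"
)

def tpg_entries(config):
    """Return a list of (tpg_id, state) tuples, one per reported path."""
    return [(int(tpg_id), state) for tpg_id, state in TPG_RE.findall(config)]

entries = tpg_entries(DEVICE_CONFIG)
distinct_ids = sorted({tpg_id for tpg_id, _ in entries})
print(len(entries), "entries, distinct TPG IDs:", distinct_ids)
```

For the config shown here this yields four entries, all with TPG ID 1
and state AO, so the host never sees the ID 2 group configured on kio1.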

So the questions are:

- Are we missing something in the LIO setup that would make the VMware
host recognize two separate TPGs?

- http://www.spinics.net/lists/target-devel/msg07102.html - this
appears to be needed for reprobing, but is it also required for
detection?

Thank you,
RW