Good afternoon,

We are testing a two-node Pacemaker-based cluster that delivers Fibre Channel LUNs to VMware (ESXi 5.5). The OS is Ubuntu Trusty 14.04, and the LIO version is:

Target Engine Core ConfigFS Infrastructure v4.1.0 on Linux/x86_64 on 3.14.8-031408-generic

Each of the two LIO nodes (e1, kio1) has a dual-port QLA2462 Fibre Channel HBA, and the initiator node also has a QLA2462. They are connected via a Brocade 4Gb Silkworm switch.

We set a different TPG ID on each node while keeping the LUN serial number the same. The back-end storage is a Ceph image presented identically to both LIO nodes. One node sets its ALUA state to s (Standby) and the other to o (Active/Optimized).

However, when both nodes present the same LUN to VMware, all paths to the storage report the TPG ID of only one of the nodes. The ALUA state is Standby for all LUNs, and the VMware host's vmkernel.log contains many errors, such as:

2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world failover device "naa.6001405afcc298020000000000000000" - issuing command 0x41364041a100
2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP: nmp_SelectPathAndIssueCommand:3174: PSP selected path "vmhba2:C0:T0:L1" in a bad state (standby) on device "naa.6001405afcc298020000000000000000".
2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x1a (0x41364041a100) to dev "naa.6001405afcc298020000000000000000" failed on path "vmhba2:C0:T0:L1" H:0x1 D:0x0 P:0x0 Possible sense data: 0x2 0x3a
2014-09-06T03:38:58.748Z cpu2:45460)WARNING: NMP: nmpCompleteRetryForPath:382: Logical device "naa.6001405afcc298020000000000000000": awaiting fast path state update before retrying failed command again...
2014-09-06T03:38:59.748Z cpu17:37198)WARNING: VMW_SATP_ALUA: satp_alua_determineStatus:563: VMW_SATP_ALUA:Unknown Check condition 0/2 0x5 0x26 0x0.
2014-09-06T03:38:59.748Z cpu17:37198)WARNING: VMW_SATP_ALUA: satp_alua_issueCommandOnPath:685: Probe cmd 0xa4 failed for path "vmhba2:C0:T0:L1" (0x5/0x26/0x0). Check if failover mode is still ALUA.
2014-09-06T03:38:59.748Z cpu17:37198)WARNING: VMW_SATP_ALUA: satp_alua_issueSTPG:490: STPG failed on path "vmhba2:C0:T0:L1"

Here is some LIO information:

root@kio1:/sys/kernel/config/target# tcm_node --listtgptgps iblock_4711/p_FCLun_test2
\------> kio1  Target Port Group ID: 2
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Standby
         Primary Access Status: Altered by Implicit ALUA
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
             qla2xxx/naa.21000024ff4f0dee/tpgt_1/lun_1
             qla2xxx/naa.21000024ff4f0def/tpgt_1/lun_1
\------> default_tg_pt_gp  Target Port Group ID: 0
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Active/Optimized
         Primary Access Status: None
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
             No Target Port Group Members

root@e1:/var/log# tcm_node --listtgptgps iblock_4711/p_FCLun_test
\------> e1  Target Port Group ID: 1
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Active/Optimized
         Primary Access Status: None
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
             qla2xxx/naa.21000024ff4f0f20/tpgt_1/lun_1
             qla2xxx/naa.21000024ff4f0f21/tpgt_1/lun_1
\------> default_tg_pt_gp  Target Port Group ID: 0
         Active ALUA Access Type(s): Implicit and Explicit
         Primary Access State: Active/Optimized
         Primary Access Status: None
         Preferred Bit: 0
         Active/NonOptimized Delay in milliseconds: 100
         Transition Delay in milliseconds: 0
         \------> TG Port Group Members
             No Target Port Group Members

VMware does not see both TPGs, only one:

naa.6001405afcc298020000000000000000
   Device Display Name: LIO-ORG Fibre Channel Disk (naa.6001405afcc298020000000000000000)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on;explicit_support=on; explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}{TPG_id=1,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba2:C0:T0:L1
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba2:C0:T0:L1
   Is Local SAS Device: false
   Is Boot USB Device: false

So our questions are:

- Are we missing something in the LIO setup that would allow the VMware host to recognize two separate TPGs?
- http://www.spinics.net/lists/target-devel/msg07102.html appears to be needed for reprobing, but is it also required for detection?

Thank you,
RW