Hi

On Sun, Feb 9, 2020 at 18:27, Mike Christie <mchristi@xxxxxxxxxx> wrote:

> On 02/08/2020 11:34 PM, Gesiel Galvão Bernardes wrote:
> > Hi,
> >
> > On Thu, Feb 6, 2020 at 18:56, Mike Christie <mchristi@xxxxxxxxxx> wrote:
> >
> >     On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
> >     > On Sun, Feb 2, 2020 at 00:37, Gesiel Galvão Bernardes
> >     > <gesiel.bernardes@xxxxxxxxx> wrote:
> >     >
> >     >     Hi,
> >     >
> >     >     Only now was I able to continue with this. Below is the
> >     >     information required. Thanks advan
> >     >
> >
> >     Hey, sorry for the late reply. I just got back from PTO.
> >
> >     > esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
> >     > naa.6001405ba48e0b99e4c418ca13506c8e
> >     >    Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
> >     >    Storage Array Type: VMW_SATP_ALUA
> >     >    Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
> >     >    Path Selection Policy: VMW_PSP_MRU
> >     >    Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
> >     >    Path Selection Policy Device Custom Config:
> >     >    Working Paths: vmhba68:C0:T0:L0
> >     >    Is USB: false
> >
> >     ........
> >
> >     > Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER
> >
> >     Are you sure you are using tcmu-runner 1.4? Is that the actual
> >     daemon version running? Did you by any chance install the 1.4 rpm,
> >     but you/it did not restart the daemon? The error code above is
> >     returned in 1.3 and earlier.
> >
> >     You are probably hitting a combo of 2 issues.
> >
> >     We had only listed ESX 6.5 in the docs you probably saw, and in 6.7
> >     the value of action_OnRetryErrors defaulted to on instead of off.
> >     You should set this back to off.
> >
> >     You should also upgrade to the current version of tcmu-runner,
> >     1.5.x. It should fix the issue you are hitting: non-IO commands
> >     like inquiry, RTPG, etc. are executed while failing over/back, so
> >     you would not hit the problem where path initialization and path
> >     testing IO fails, causing the path to be marked as failed.
> >
> > I updated tcmu-runner to 1.5.2 and changed action_OnRetryErrors to off,
> > but the problem continues 😭
> >
> > Attached is vmkernel.log.
>
> When you stopped the iscsi gw at around 2020-02-09T01:51:25.820Z, how
> many paths did your device have? Did:
>
> esxcli storage nmp path list -d your_device
>
> report only one path? Did
>
> esxcli iscsi session connection list
>
> show an iscsi connection to each gw?

Hmmm, I believe the problem may be here. I verified that only one GW was
listed for each path. So I ran a "rescan HBA" in VMware on both ESX hosts.
Now one of them lists all 3 gateways (I added one more), but the other ESX
host, with the same configuration, still lists only one gateway.
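As an aside, this per-host check can be scripted. The sketch below is not part of my setup; it assumes the text printed by "esxcli iscsi session connection list" has been saved per host, and it only collects the RemoteAddress fields and compares them with the gateway portals I expect (the three 192.168.201.x addresses in this setup). The function names and the abbreviated sample text are illustrative only.

```python
from __future__ import annotations

# Gateway portal IPs expected on every host (taken from this thread's setup).
EXPECTED_GATEWAYS = {"192.168.201.1", "192.168.201.2", "192.168.201.3"}


def remote_addresses(esxcli_output: str) -> set[str]:
    """Collect every RemoteAddress value from saved esxcli output text."""
    addresses = set()
    for line in esxcli_output.splitlines():
        # Field lines look like "   RemoteAddress: 192.168.201.1".
        key, sep, value = line.strip().partition(":")
        if sep and key == "RemoteAddress":
            addresses.add(value.strip())
    return addresses


def missing_gateways(esxcli_output: str, expected=EXPECTED_GATEWAYS) -> list[str]:
    """Return expected gateways that have no iSCSI connection in the output."""
    return sorted(set(expected) - remote_addresses(esxcli_output))


if __name__ == "__main__":
    # Abbreviated sample in the shape of the output from the host that
    # sees only one gateway.
    sample = """\
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
   Adapter: vmhba68
   Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
   RemoteAddress: 192.168.201.1
   LocalAddress: 192.168.201.108
   State: logged_in
"""
    print(missing_gateways(sample))  # ['192.168.201.2', '192.168.201.3']
```

An empty list would mean the host has a connection to every expected gateway. The check looks only at RemoteAddress and ignores the State field, so a connection that is not logged_in would still count.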
See the different outputs:

[root@tcnvh7:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
   Adapter: vmhba68
   Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
   ISID: 00023d000001
   CID: 0
   DataDigest: NONE
   HeaderDigest: NONE
   IFMarker: false
   IFMarkerInterval: 0
   MaxRecvDataSegmentLength: 131072
   MaxTransmitDataSegmentLength: 262144
   OFMarker: false
   OFMarkerInterval: 0
   ConnectionAddress: 192.168.201.1
   RemoteAddress: 192.168.201.1
   LocalAddress: 192.168.201.107
   SessionCreateTime: 01/19/20 00:11:25
   ConnectionCreateTime: 01/19/20 00:11:25
   ConnectionStartTime: 02/13/20 23:03:10
   State: logged_in

vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000002,0
   Adapter: vmhba68
   Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
   ISID: 00023d000002
   CID: 0
   DataDigest: NONE
   HeaderDigest: NONE
   IFMarker: false
   IFMarkerInterval: 0
   MaxRecvDataSegmentLength: 131072
   MaxTransmitDataSegmentLength: 262144
   OFMarker: false
   OFMarkerInterval: 0
   ConnectionAddress: 192.168.201.2
   RemoteAddress: 192.168.201.2
   LocalAddress: 192.168.201.107
   SessionCreateTime: 02/13/20 23:09:16
   ConnectionCreateTime: 02/13/20 23:09:16
   ConnectionStartTime: 02/13/20 23:09:16
   State: logged_in

vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000003,0
   Adapter: vmhba68
   Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
   ISID: 00023d000003
   CID: 0
   DataDigest: NONE
   HeaderDigest: NONE
   IFMarker: false
   IFMarkerInterval: 0
   MaxRecvDataSegmentLength: 131072
   MaxTransmitDataSegmentLength: 262144
   OFMarker: false
   OFMarkerInterval: 0
   ConnectionAddress: 192.168.201.3
   RemoteAddress: 192.168.201.3
   LocalAddress: 192.168.201.107
   SessionCreateTime: 02/13/20 23:09:16
   ConnectionCreateTime: 02/13/20 23:09:16
   ConnectionStartTime: 02/13/20 23:09:16
   State: logged_in

=====

[root@tcnvh8:~] esxcli iscsi session connection list
vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
   Adapter: vmhba68
   Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
   ISID: 00023d000001
   CID: 0
   DataDigest: NONE
   HeaderDigest: NONE
   IFMarker: false
   IFMarkerInterval: 0
   MaxRecvDataSegmentLength: 131072
   MaxTransmitDataSegmentLength: 262144
   OFMarker: false
   OFMarkerInterval: 0
   ConnectionAddress: 192.168.201.1
   RemoteAddress: 192.168.201.1
   LocalAddress: 192.168.201.108
   SessionCreateTime: 01/12/20 02:53:53
   ConnectionCreateTime: 01/12/20 02:53:53
   ConnectionStartTime: 02/13/20 23:06:40
   State: logged_in

Is that the problem? Any ideas on how to proceed from here?

> The logs look like when you brought the gw down, we lost the only path
> we had. We then went into all paths down, so IO could not execute. It
> looks like the gw was brought back up at the end of the log and the path
> seems to have been added back.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx