On 02/14/2020 10:25 AM, Mike Christie wrote:
> On 02/13/2020 08:52 PM, Gesiel Galvão Bernardes wrote:
>> Hi
>>
>> On Sun, Feb 9, 2020 at 6:27 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>
>> On 02/08/2020 11:34 PM, Gesiel Galvão Bernardes wrote:
>> > Hi,
>> >
>> > On Thu, Feb 6, 2020 at 6:56 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>> >
>> > On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
>> > > On Sun, Feb 2, 2020 at 12:37 AM, Gesiel Galvão Bernardes
>> > > <gesiel.bernardes@xxxxxxxxx> wrote:
>> > >
>> > >     Hi,
>> > >
>> > >     Only now was it possible to continue this. Below is the
>> > >     information required. Thanks in advance.
>> >
>> >
>> > Hey, sorry for the late reply. I just got back from PTO.
>> >
>> > > esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
>> > > naa.6001405ba48e0b99e4c418ca13506c8e
>> > >    Device Display Name: LIO-ORG iSCSI Disk
>> > >    (naa.6001405ba48e0b99e4c418ca13506c8e)
>> > >    Storage Array Type: VMW_SATP_ALUA
>> > >    Storage Array Type Device Config: {implicit_support=on;
>> > >    explicit_support=off; explicit_allow=on; alua_followover=on;
>> > >    action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
>> > >    Path Selection Policy: VMW_PSP_MRU
>> > >    Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
>> > >    Path Selection Policy Device Custom Config:
>> > >    Working Paths: vmhba68:C0:T0:L0
>> > >    Is USB: false
>> >
>> > ........
>> >
>> > > Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER
>> >
>> >
>> > Are you sure you are using tcmu-runner 1.4? Is that the actual daemon
>> > version running? Did you by any chance install the 1.4 rpm, but you/it
>> > did not restart the daemon? The error code above is returned in 1.3 and
>> > earlier.
>> >
>> > You are probably hitting a combo of 2 issues.
>> >
>> > We had only listed ESX 6.5 in the docs you probably saw, and in 6.7 the
>> > value of action_OnRetryErrors defaulted to on instead of off. You should
>> > set this back to off.
>> >
>> > You should also upgrade to the current version of tcmu-runner, 1.5.x. It
>> > should fix the issue you are hitting, so that non-IO commands like
>> > inquiry, RTPG, etc. are executed while failing over/back, and you would
>> > not hit the problem where path initialization and path-testing IO is
>> > failed, causing the path to be marked as failed.
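
For reference, both checks above can be done from the shell. A minimal
sketch, assuming an rpm-based gateway install and an ESXi 6.7 host; the
action_OnRetryErrors toggle is the per-device switch described in the
vSphere 6.7 docs, so confirm the exact option name with --help on your
build:

    # On each iSCSI gateway: confirm the installed package version and that
    # the running daemon was actually restarted after the update.
    rpm -q tcmu-runner
    systemctl status tcmu-runner      # check the "Active: ... since" timestamp
    systemctl restart tcmu-runner     # if the old daemon is still running

    # On the ESXi host: the ALUA config above shows action_OnRetryErrors=on.
    # Turn it off for this device, then re-check the device config.
    esxcli storage nmp satp generic deviceconfig set \
        -c disable_action_OnRetryErrors -d naa.6001405ba48e0b99e4c418ca13506c8e
    esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
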
>> > I updated tcmu-runner to 1.5.2 and changed action_OnRetryErrors to off,
>> > but the problem continues 😭
>> >
>> > Attached is vmkernel.log.
>> >
>>
>>
>> When you stopped the iscsi gw at around 2020-02-09T01:51:25.820Z, how
>> many paths did your device have? Did:
>>
>> esxcli storage nmp path list -d your_device
>>
>> report only one path? Did
>>
>> esxcli iscsi session connection list
>>
>> show an iSCSI connection to each gw?
>>
>> Hmmm, I believe the problem may be here. I verified that I was listing
>> only one GW for each path. So I ran a "rescan HBA" on VMware on both
>> ESX hosts; now one of them lists the 3 gateways (I added one more), but
>> an ESX host with the same configuration continues to list only one
>> gateway. See the different outputs:
>>
>> [root@tcnvh7:~] esxcli iscsi session connection list
>> vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
>>    Adapter: vmhba68
>>    Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
>>    ISID: 00023d000001
>>    CID: 0
>>    DataDigest: NONE
>>    HeaderDigest: NONE
>>    IFMarker: false
>>    IFMarkerInterval: 0
>>    MaxRecvDataSegmentLength: 131072
>>    MaxTransmitDataSegmentLength: 262144
>>    OFMarker: false
>>    OFMarkerInterval: 0
>>    ConnectionAddress: 192.168.201.1
>>    RemoteAddress: 192.168.201.1
>>    LocalAddress: 192.168.201.107
>>    SessionCreateTime: 01/19/20 00:11:25
>>    ConnectionCreateTime: 01/19/20 00:11:25
>>    ConnectionStartTime: 02/13/20 23:03:10
>>    State: logged_in
>>
>> vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000002,0
>>    Adapter: vmhba68
>>    Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
>>    ISID: 00023d000002
>>    CID: 0
>>    DataDigest: NONE
>>    HeaderDigest: NONE
>>    IFMarker: false
>>    IFMarkerInterval: 0
>>    MaxRecvDataSegmentLength: 131072
>>    MaxTransmitDataSegmentLength: 262144
>>    OFMarker: false
>>    OFMarkerInterval: 0
>>    ConnectionAddress: 192.168.201.2
>>    RemoteAddress: 192.168.201.2
>>    LocalAddress: 192.168.201.107
>>    SessionCreateTime: 02/13/20 23:09:16
>>    ConnectionCreateTime: 02/13/20 23:09:16
>>    ConnectionStartTime: 02/13/20 23:09:16
>>    State: logged_in
>>
>> vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000003,0
>>    Adapter: vmhba68
>>    Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
>>    ISID: 00023d000003
>>    CID: 0
>>    DataDigest: NONE
>>    HeaderDigest: NONE
>>    IFMarker: false
>>    IFMarkerInterval: 0
>>    MaxRecvDataSegmentLength: 131072
>>    MaxTransmitDataSegmentLength: 262144
>>    OFMarker: false
>>    OFMarkerInterval: 0
>>    ConnectionAddress: 192.168.201.3
>>    RemoteAddress: 192.168.201.3
>>    LocalAddress: 192.168.201.107
>>    SessionCreateTime: 02/13/20 23:09:16
>>    ConnectionCreateTime: 02/13/20 23:09:16
>>    ConnectionStartTime: 02/13/20 23:09:16
>>    State: logged_in
>>
>> =====
>> [root@tcnvh8:~] esxcli iscsi session connection list
>> vmhba68,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,00023d000001,0
>>    Adapter: vmhba68
>>    Target: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
>>    ISID: 00023d000001
>>    CID: 0
>>    DataDigest: NONE
>>    HeaderDigest: NONE
>>    IFMarker: false
>>    IFMarkerInterval: 0
>>    MaxRecvDataSegmentLength: 131072
>>    MaxTransmitDataSegmentLength: 262144
>>    OFMarker: false
>>    OFMarkerInterval: 0
>>    ConnectionAddress: 192.168.201.1
>>    RemoteAddress: 192.168.201.1
>>    LocalAddress: 192.168.201.108
>>    SessionCreateTime: 01/12/20 02:53:53
>>    ConnectionCreateTime: 01/12/20 02:53:53
>>    ConnectionStartTime: 02/13/20 23:06:40
>>    State: logged_in
>>
>> Is that the problem? Any ideas on how to proceed from here?
>>
>
> Yes. Normally, you would have the connection already created, and when
> one path/gateway goes down, the multipath layer will switch to another
> path. When the path/gateway comes back up, the initiator side's iSCSI
> layer will reconnect automatically and the multipath layer will re-set
> up the path structure, so it can fail back if it is a higher-priority
> path, or fail over later if other paths go down.
>
> Something happened with the automatic path connection process on that
> node. We know it works for that one gateway you brought up/down. For the
> other gateways I would check:
>
> 1. Check that all target portals are being discovered. In the GUI screen
> where you entered the discovery address, you should also see a list of
> all target portals that were found, in the static section. Do you only
> see 1 portal?
>
> See here:
>
> https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.storage.doc/GUID-66215AF3-2D81-4D1F-92D4-B9623FC1CB0E.html
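
The same portal check can also be done from the ESXi shell instead of the
GUI. A rough sketch using standard esxcli namespaces (vmhba68 is the
software iSCSI adapter from the outputs above; double-check the syntax
with "esxcli iscsi adapter discovery --help" on your build):

    # Discovery (send-target) addresses configured on the adapter:
    esxcli iscsi adapter discovery sendtarget list -A vmhba68
    # Target portals actually found from those addresses; a healthy host
    # here should list 192.168.201.1, .2 and .3 on port 3260:
    esxcli iscsi adapter target portal list -A vmhba68
    # Re-run discovery and rescan the adapter:
    esxcli iscsi adapter discovery rediscover -A vmhba68
    esxcli storage core adapter rescan -A vmhba68
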
Oh yeah, make sure you check the basics. If after a rescan you are seeing
only the one portal at 192.168.201.1, then make sure from tcnvh8 you can
ping the other addresses, 192.168.201.3 and 192.168.201.2.

> 2. If you see all the portals, then when you hit the rescan HBA button,
> do you see any errors on the target side in /var/log/messages? Maybe
> something about CHAP/login/auth errors?
>
> What about in /var/log/vmkernel.log on the initiator side? Any iscsi
> errors?
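
One way to watch both sides while pressing the rescan button is sketched
below, assuming a ceph-iscsi 3.x gateway managed by systemd (service and
log names may differ on your distro):

    # On each gateway node: follow the target-side daemons and syslog.
    journalctl -f -u tcmu-runner -u rbd-target-gw -u rbd-target-api
    tail -f /var/log/messages | grep -iE 'iscsi|chap|login|auth'
    # Also confirm that both ESXi initiator IQNs are defined with the same
    # CHAP credentials in the gateway config:
    gwcli ls

    # On the ESXi host (tcnvh8): follow the initiator-side log during the
    # rescan.
    tail -f /var/log/vmkernel.log
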