Hi,

Do you have any suggestions on where I can look?

Regards,
Gesiel

On Sun, Feb 2, 2020 at 00:37, Gesiel Galvão Bernardes <gesiel.bernardes@xxxxxxxxx> wrote:
> Hi,
>
> Only now was it possible to continue with this. The requested information
> is below. Thanks in advance,
>
> Gesiel
>
> On Mon, Jan 20, 2020 at 15:06, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>> On 01/20/2020 10:29 AM, Gesiel Galvão Bernardes wrote:
>> > Hi,
>> >
>> > Only now have I been able to act on this problem. My environment is
>> > relatively simple: I have two ESXi 6.7 hosts, connected to two iSCSI
>> > gateways, using two RBD images.
>> >
>> > When this thread started, the workaround was to keep only one iSCSI
>> > gateway connected, and with that it works normally. After the answer I
>> > received here that the problem could be in the VMware configuration, I
>> > reviewed the configuration of both hosts (they are exactly as in the
>> > Ceph documentation) and rebooted both.
>> >
>>
>> Can you give me some basic info?
>>
>> The output of:
>> # gwcli ls
>
> o- / ............................................................. [...]
>   o- cluster .............................................. [Clusters: 1]
>   | o- ceph ................................................ [HEALTH_OK]
>   |   o- pools ............................................... [Pools: 7]
>   |   | o- data_ecpool .. [(x3), Commit: 0.00Y/25836962M (0%), Used: 25011763551408b]
>   |   | o- pool1 ........ [(x2), Commit: 5.0T/38755444M (13%), Used: 16162480333070b]
>   |   | o- pool2 ........ [(x3), Commit: 6.0T/25836962M (24%), Used: 24261305669809b]
>   |   | o- pool3 ........ [(4+0), Commit: 0.00Y/58133164M (0%), Used: 719866819603b]
>   |   | o- pool_cache ... [(x3), Commit: 0.00Y/706164M (0%), Used: 422549509342b]
>   |   | o- poolfs ....... [(x3), Commit: 0.00Y/25836962M (0%), Used: 0.00Y]
>   |   | o- rbd .......... [(x3), Commit: 0.00Y/25836962M (0%), Used: 5737b]
>   |   o- topology .................................... [OSDs: 61, MONs: 2]
>   o- disks ........................................... [11.0T, Disks: 3]
>   | o- pool1 ........................................... [pool1 (5.0T)]
>   | | o- vmware_iscsi1 ...................... [pool1/vmware_iscsi1 (5.0T)]
>   | o- pool2 ........................................... [pool2 (6.0T)]
>   |   o- iscsi-test ............................ [pool2/iscsi-test (1.0T)]
>   |   o- vmware_iscsi2 ...................... [pool2/vmware_iscsi2 (5.0T)]
>   o- iscsi-targets .................... [DiscoveryAuth: None, Targets: 1]
>     o- iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw .. [Auth: None, Gateways: 2]
>       o- disks .............................................. [Disks: 3]
>       | o- pool1/vmware_iscsi1 ..................... [Owner: ceph-iscsi2]
>       | o- pool2/iscsi-test ........................ [Owner: ceph-iscsi2]
>       | o- pool2/vmware_iscsi2 ..................... [Owner: ceph-iscsi1]
>       o- gateways ................................ [Up: 2/2, Portals: 2]
>       | o- ceph-iscsi1 ............................ [192.168.201.1 (UP)]
>       | o- ceph-iscsi2 ............................ [192.168.201.2 (UP)]
>       o- host-groups ........................................ [Groups: 0]
>       o- hosts ............................... [Auth: ACL_ENABLED, Hosts: 2]
>         o- iqn.1994-05.com.redhat:rh7-client .. [LOGGED-IN, Auth: CHAP, Disks: 2(10.0T)]
>         | o- lun 0 ........... [pool1/vmware_iscsi1(5.0T), Owner: ceph-iscsi2]
>         | o- lun 1 ........... [pool2/vmware_iscsi2(5.0T), Owner: ceph-iscsi1]
>         o- iqn.1994-05.com.redhat:tcnvh8 ...... [LOGGED-IN, Auth: CHAP, Disks: 2(10.0T)]
>           o- lun 0 ........... [pool1/vmware_iscsi1(5.0T), Owner: ceph-iscsi2]
>           o- lun 1 ........... [pool2/vmware_iscsi2(5.0T), Owner: ceph-iscsi1]
>
>> from one of the iscsi nodes, and give me the output of:
>>
>> # targetcli ls
>
> ceph-iscsi1 ~]# targetcli ls
> Warning: Could not load preferences file /root/.targetcli/prefs.bin.
> o- / ............................................................. [...]
>   o- backstores ................................................... [...]
>   | o- block ...................................... [Storage Objects: 0]
>   | o- fileio ..................................... [Storage Objects: 0]
>   | o- pscsi ...................................... [Storage Objects: 0]
>   | o- ramdisk .................................... [Storage Objects: 0]
>   | o- user:glfs .................................. [Storage Objects: 0]
>   | o- user:qcow .................................. [Storage Objects: 0]
>   | o- user:rbd ................................... [Storage Objects: 3]
>   | | o- pool1.vmware_iscsi1 .. [pool1/vmware_iscsi1;osd_op_timeout=30 (5.0TiB) activated]
>   | | | o- alua ...................................... [ALUA Groups: 3]
>   | | |   o- ano1 .................... [ALUA state: Active/non-optimized]
>   | | |   o- ao .......................... [ALUA state: Active/optimized]
>   | | |   o- default_tg_pt_gp ............ [ALUA state: Active/optimized]
>   | | o- pool2.iscsi-test ..... [pool2/iscsi-test;osd_op_timeout=30 (1.0TiB) activated]
>   | | | o- alua ...................................... [ALUA Groups: 3]
>   | | |   o- ano1 .................... [ALUA state: Active/non-optimized]
>   | | |   o- ao .......................... [ALUA state: Active/optimized]
>   | | |   o- default_tg_pt_gp ............ [ALUA state: Active/optimized]
>   | | o- pool2.vmware_iscsi2 .. [pool2/vmware_iscsi2;osd_op_timeout=30 (5.0TiB) activated]
>   | |   o- alua ...................................... [ALUA Groups: 3]
>   | |     o- ano2 .................... [ALUA state: Active/non-optimized]
>   | |     o- ao .......................... [ALUA state: Active/optimized]
>   | |     o- default_tg_pt_gp ............ [ALUA state: Active/optimized]
>   | o- user:zbc ................................... [Storage Objects: 0]
>   o- iscsi ............................................... [Targets: 1]
>   | o- iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw ............. [TPGs: 2]
>   |   o- tpg1 ............................... [no-gen-acls, auth per-acl]
>   |   | o- acls .............................................. [ACLs: 2]
>   |   | | o- iqn.1994-05.com.redhat:rh7-client .. [1-way auth, Mapped LUNs: 2]
>   |   | | | o- mapped_lun0 ............ [lun0 user/pool1.vmware_iscsi1 (rw)]
>   |   | | | o- mapped_lun1 ............ [lun2 user/pool2.vmware_iscsi2 (rw)]
>   |   | | o- iqn.1994-05.com.redhat:tcnvh8 ...... [1-way auth, Mapped LUNs: 2]
>   |   | |   o- mapped_lun0 ............ [lun0 user/pool1.vmware_iscsi1 (rw)]
>   |   | |   o- mapped_lun1 ............ [lun2 user/pool2.vmware_iscsi2 (rw)]
>   |   | o- luns .............................................. [LUNs: 3]
>   |   | | o- lun0 .................. [user/pool1.vmware_iscsi1 (ano1)]
>   |   | | o- lun1 ..................... [user/pool2.iscsi-test (ano1)]
>   |   | | o- lun2 .................... [user/pool2.vmware_iscsi2 (ao)]
>   |   | o- portals ........................................ [Portals: 1]
>   |   |   o- 192.168.201.1:3260 ................................... [OK]
>   |   o- tpg2 .............................................. [disabled]
>   |     o- acls .............................................. [ACLs: 0]
>   |     o- luns .............................................. [LUNs: 3]
>   |     | o- lun0 .................... [user/pool1.vmware_iscsi1 (ao)]
>   |     | o- lun1 ....................... [user/pool2.iscsi-test (ao)]
>   |     | o- lun2 .................. [user/pool2.vmware_iscsi2 (ano2)]
>   |     o- portals ........................................ [Portals: 1]
>   |       o- 192.168.201.2:3260 ................................... [OK]
>   o- loopback ............................................ [Targets: 0]
>
> ceph-iscsi2 ~]# targetcli ls
> Warning: Could not load preferences file /root/.targetcli/prefs.bin.
> o- / ............................................................. [...]
>   o- backstores ................................................... [...]
>   | o- block ...................................... [Storage Objects: 0]
>   | o- fileio ..................................... [Storage Objects: 0]
>   | o- pscsi ...................................... [Storage Objects: 0]
>   | o- ramdisk .................................... [Storage Objects: 0]
>   | o- user:glfs .................................. [Storage Objects: 0]
>   | o- user:qcow .................................. [Storage Objects: 0]
>   | o- user:rbd ................................... [Storage Objects: 3]
>   | | o- pool1.vmware_iscsi1 .. [pool1/vmware_iscsi1;osd_op_timeout=30 (5.0TiB) activated]
>   | | | o- alua ...................................... [ALUA Groups: 3]
>   | | |   o- ano1 .................... [ALUA state: Active/non-optimized]
>   | | |   o- ao .......................... [ALUA state: Active/optimized]
>   | | |   o- default_tg_pt_gp ............ [ALUA state: Active/optimized]
>   | | o- pool2.iscsi-test ..... [pool2/iscsi-test;osd_op_timeout=30 (1.0TiB) activated]
>   | | | o- alua ...................................... [ALUA Groups: 3]
>   | | |   o- ano1 .................... [ALUA state: Active/non-optimized]
>   | | |   o- ao .......................... [ALUA state: Active/optimized]
>   | | |   o- default_tg_pt_gp ............ [ALUA state: Active/optimized]
>   | | o- pool2.vmware_iscsi2 .. [pool2/vmware_iscsi2;osd_op_timeout=30 (5.0TiB) activated]
>   | |   o- alua ...................................... [ALUA Groups: 3]
>   | |     o- ano2 .................... [ALUA state: Active/non-optimized]
>   | |     o- ao .......................... [ALUA state: Active/optimized]
>   | |     o- default_tg_pt_gp ............ [ALUA state: Active/optimized]
>   | o- user:zbc ................................... [Storage Objects: 0]
>   o- iscsi ............................................... [Targets: 1]
>   | o- iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw ............. [TPGs: 2]
>   |   o- tpg1 .............................................. [disabled]
>   |   | o- acls .............................................. [ACLs: 0]
>   |   | o- luns .............................................. [LUNs: 3]
>   |   | | o- lun0 .................. [user/pool1.vmware_iscsi1 (ano1)]
>   |   | | o- lun1 ..................... [user/pool2.iscsi-test (ano1)]
>   |   | | o- lun2 .................... [user/pool2.vmware_iscsi2 (ao)]
>   |   | o- portals ........................................ [Portals: 1]
>   |   |   o- 192.168.201.1:3260 ................................... [OK]
>   |   o- tpg2 ............................... [no-gen-acls, auth per-acl]
>   |     o- acls .............................................. [ACLs: 2]
>   |     | o- iqn.1994-05.com.redhat:rh7-client .. [1-way auth, Mapped LUNs: 2]
>   |     | | o- mapped_lun0 ............ [lun0 user/pool1.vmware_iscsi1 (rw)]
>   |     | | o- mapped_lun1 ............ [lun2 user/pool2.vmware_iscsi2 (rw)]
>   |     | o- iqn.1994-05.com.redhat:tcnvh8 ...... [1-way auth, Mapped LUNs: 2]
>   |     |   o- mapped_lun0 ............ [lun0 user/pool1.vmware_iscsi1 (rw)]
>   |     |   o- mapped_lun1 ............ [lun2 user/pool2.vmware_iscsi2 (rw)]
>   |     o- luns .............................................. [LUNs: 3]
>   |     | o- lun0 .................... [user/pool1.vmware_iscsi1 (ao)]
>   |     | o- lun1 ....................... [user/pool2.iscsi-test (ao)]
>   |     | o- lun2 .................. [user/pool2.vmware_iscsi2 (ano2)]
>   |     o- portals ........................................ [Portals: 1]
>   |       o- 192.168.201.2:3260 ................................... [OK]
>   o- loopback ............................................ [Targets: 0]
>
>> from both iscsi nodes.
>>
>> Also, the ceph-iscsi and tcmu-runner versions, and whether you built
>> them yourself, got them from the ceph repos, or got them from a distro
>> repo.
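(As a generic aside, not part of the original exchange: on an RPM-based gateway the version question can be answered with standard package queries; the repo column of "yum list installed" shows where each package came from.)

  # rpm -q ceph-iscsi tcmu-runner
  # yum list installed ceph-iscsi tcmu-runner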
> I use on both gateways:
> ceph-iscsi-3.3-1.el7.noarch from the ceph-iscsi repo
> (http://download.ceph.com/ceph-iscsi/3/rpm/el7/noarch)
> tcmu-runner-1.4.0-0.1.51.geef5115.el7.x86_64 from
> https://3.chacra.ceph.com/r/tcmu-runner/master/eef511565078fb4e2ed52caaff16e6c7e75ed6c3/centos/7/flavors/default/x86_64/tcmu-runner-1.4.0-0.1.51.geef5115.el7.x86_64.rpm
>
>> > It turns out that the gateway "ceph-iscsi1" is now working. When I turn
>> > on "ceph-iscsi2", it shows up as "Active/non-optimized" for both RBD
>> > images (before, there was one "Active/optimized" path for each image),
>> > and if I turn off "ceph-iscsi1", ceph-iscsi2 remains
>> > "Active/non-optimized" and the images become unavailable.
>>
>> On the ESX side can you give me the output of:
>>
>> esxcli storage nmp path list -d disk_id
>
> esxcli storage nmp path list -d naa.6001405ba48e0b99e4c418ca13506c8e
> iqn.1994-05.com.redhat:rh7-client-00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,1-naa.6001405ba48e0b99e4c418ca13506c8e
>    Runtime Name: vmhba68:C0:T0:L0
>    Device: naa.6001405ba48e0b99e4c418ca13506c8e
>    Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
>    Group State: active unoptimized
>    Array Priority: 0
>    Storage Array Type Path Config: {TPG_id=1,TPG_state=ANO,RTP_id=1,RTP_health=UP}
>    Path Selection Policy Path Config: {current path; rank: 0}
>
>> esxcli storage core device list -d disk_id
>
> esxcli storage core device list -d naa.6001405ba48e0b99e4c418ca13506c8e
> naa.6001405ba48e0b99e4c418ca13506c8e
>    Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
>    Has Settable Display Name: true
>    Size: 5242880
>    Device Type: Direct-Access
>    Multipath Plugin: NMP
>    Devfs Path: /vmfs/devices/disks/naa.6001405ba48e0b99e4c418ca13506c8e
>    Vendor: LIO-ORG
>    Model: TCMU device
>    Revision: 0002
>    SCSI Level: 5
>    Is Pseudo: false
>    Status: degraded
>    Is RDM Capable: true
>    Is Local: false
>    Is Removable: false
>    Is SSD: false
>    Is VVOL PE: false
>    Is Offline: false
>    Is Perennially Reserved: false
>    Queue Full Sample Size: 0
>    Queue Full Threshold: 0
>    Thin Provisioning Status: yes
>    Attached Filters:
>    VAAI Status: supported
>    Other UIDs: vml.02000000006001405ba48e0b99e4c418ca13506c8e54434d552064
>    Is Shared Clusterwide: true
>    Is SAS: false
>    Is USB: false
>    Is Boot Device: false
>    Device Max Queue Depth: 128
>    No of outstanding IOs with competing worlds: 32
>    Drive Type: unknown
>    RAID Level: unknown
>    Number of Physical Drives: unknown
>    Protection Enabled: false
>    PI Activated: false
>    PI Type: 0
>    PI Protection Mask: NO PROTECTION
>    Supported Guard Types: NO GUARD SUPPORT
>    DIX Enabled: false
>    DIX Guard Type: NO GUARD SUPPORT
>    Emulated DIX/DIF Enabled: false
>
>> esxcli storage nmp device list -d disk_id
>
> esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
> naa.6001405ba48e0b99e4c418ca13506c8e
>    Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
>    Storage Array Type: VMW_SATP_ALUA
>    Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
>    Path Selection Policy: VMW_PSP_MRU
>    Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
>    Path Selection Policy Device Custom Config:
>    Working Paths: vmhba68:C0:T0:L0
>    Is USB: false
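One detail worth flagging in the device config above is action_OnRetryErrors=on, which ESXi 6.7 enables by default for ALUA-claimed devices and which lets the host mark a path dead after command retries rather than waiting out a failover. VMware documents a per-device claim option to disable it; a hedged sketch using the device ID from this thread (verify the option name against your ESXi build; the change may need a path reclaim or host reboot to take effect):

  # esxcli storage nmp satp generic deviceconfig set \
        -c disable_action_OnRetryErrors -d naa.6001405ba48e0b99e4c418ca13506c8e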
>> esxcli storage nmp satp list
>
> esxcli storage nmp satp list
> Name                 Default PSP    Description
> -------------------  -------------  --------------------------------------------------------
> VMW_SATP_ALUA        VMW_PSP_MRU    Supports non-specific arrays that use the ALUA protocol
> VMW_SATP_MSA         VMW_PSP_MRU    Placeholder (plugin not loaded)
> VMW_SATP_DEFAULT_AP  VMW_PSP_MRU    Placeholder (plugin not loaded)
> VMW_SATP_SVC         VMW_PSP_FIXED  Placeholder (plugin not loaded)
> VMW_SATP_EQL         VMW_PSP_FIXED  Placeholder (plugin not loaded)
> VMW_SATP_INV         VMW_PSP_FIXED  Placeholder (plugin not loaded)
> VMW_SATP_EVA         VMW_PSP_FIXED  Placeholder (plugin not loaded)
> VMW_SATP_ALUA_CX     VMW_PSP_RR     Placeholder (plugin not loaded)
> VMW_SATP_SYMM        VMW_PSP_RR     Placeholder (plugin not loaded)
> VMW_SATP_CX          VMW_PSP_MRU    Placeholder (plugin not loaded)
> VMW_SATP_LSI         VMW_PSP_MRU    Placeholder (plugin not loaded)
> VMW_SATP_DEFAULT_AA  VMW_PSP_FIXED  Supports non-specific active/active arrays
> VMW_SATP_LOCAL       VMW_PSP_FIXED  Supports direct attached devices
>
>> and the /var/log/vmkernel.log for when you stop a node and the image
>> goes to the unavailable state.
>
> 2020-02-02T00:14:59.094Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
> 2020-02-02T00:14:59.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:14:59.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
> 2020-02-02T00:15:01.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405cb71b882378c4138826c2da30" - issuing command 0x45a283482940
> 2020-02-02T00:15:01.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405cb71b882378c4138826c2da30" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:15:01.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405cb71b882378c4138826c2da30": awaiting fast path state update...
> 2020-02-02T00:15:06.979Z cpu25:2124996)ScsiDeviceIO: 3449: Cmd(0x459a9abd2380) 0x9e, CmdSN 0x1dc0cf from world 0 to dev "naa.6001405ba48e0b99e4c418ca13506c8e" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x46 0x80 0x41.
> 2020-02-02T00:15:06.979Z cpu25:2124996)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
> 2020-02-02T00:15:07.094Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world restore device "naa.6001405ba48e0b99e4c418ca13506c8e" - no more commands to retry
> 2020-02-02T00:15:07.095Z cpu33:2098078)WARNING: NMP: nmp_IssueCommandToDevice:5726: I/O could not be issued to device "naa.6001405ba48e0b99e4c418ca13506c8e" due to Not found
> 2020-02-02T00:15:07.095Z cpu33:2098078)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
> 2020-02-02T00:15:07.095Z cpu33:2098078)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
> 2020-02-02T00:15:08.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
> 2020-02-02T00:15:08.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:15:08.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
> 2020-02-02T00:15:08.154Z cpu25:2124996)ScsiDeviceIO: 3449: Cmd(0x45a283482940) 0x12, CmdSN 0x1dc10c from world 0 to dev "naa.6001405cb71b882378c4138826c2da30" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x46 0x80 0x41.
> 2020-02-02T00:15:08.154Z cpu25:2124996)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405cb71b882378c4138826c2da30" is blocked. Not starting I/O from device.
> 2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
> 2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
> 2020-02-02T00:15:09.095Z cpu2:2098080)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world restore device "naa.6001405cb71b882378c4138826c2da30" - no more commands to retry
> 2020-02-02T00:15:14.094Z cpu22:2098078)NMP: nmp_ResetDeviceLogThrottling:3575: last error status from device naa.6001405cb71b882378c4138826c2da30 repeated 35 times
> 2020-02-02T00:15:14.095Z cpu22:2098078)NMP: nmp_ResetDeviceLogThrottling:3575: last error status from device naa.6001405ba48e0b99e4c418ca13506c8e repeated 8 times
> 2020-02-02T00:15:39.094Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
> 2020-02-02T00:15:39.095Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:15:39.095Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
> 2020-02-02T00:15:46.981Z cpu25:2124996)ScsiDeviceIO: 3449: Cmd(0x459a9abd2380) 0x25, CmdSN 0x1dc10d from world 0 to dev "naa.6001405ba48e0b99e4c418ca13506c8e" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
> 2020-02-02T00:15:46.981Z cpu25:2124996)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
> 2020-02-02T00:15:47.094Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:603: Retry world restore device "naa.6001405ba48e0b99e4c418ca13506c8e" - no more commands to retry
> 2020-02-02T00:15:47.095Z cpu22:2098078)WARNING: NMP: nmp_IssueCommandToDevice:5726: I/O could not be issued to device "naa.6001405ba48e0b99e4c418ca13506c8e" due to Not found
> 2020-02-02T00:15:47.095Z cpu22:2098078)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
> 2020-02-02T00:15:47.095Z cpu22:2098078)WARNING: NMP: nmp_DeviceStartLoop:729: NMP Device "naa.6001405ba48e0b99e4c418ca13506c8e" is blocked. Not starting I/O from device.
> 2020-02-02T00:15:48.094Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9ba62b40
> 2020-02-02T00:15:48.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:15:48.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
> 2020-02-02T00:15:49.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9ba62b40
> 2020-02-02T00:15:49.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:715: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - failed to issue command due to Not found (APD), try again...
> 2020-02-02T00:15:49.095Z cpu20:2098080)WARNING: NMP: nmpDeviceAttemptFailover:765: Logical device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update...
> 2020-02-02T00:15:52.967Z cpu0:2098255)iscsi_vmk: iscsivmk_ConnNetRegister:2170: socket 0x430e49d44a90 network resource pool netsched.pools.persist.iscsi associated
> 2020-02-02T00:15:52.967Z cpu0:2098255)iscsi_vmk: iscsivmk_ConnNetRegister:2198: socket 0x430e49d44a90 network tracker id 193945594 tracker.iSCSI.192.168.201.1 associated
> 2020-02-02T00:15:54.712Z cpu0:2098255)WARNING: iscsi_vmk: iscsivmk_StartConnection:880: vmhba68:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
> 2020-02-02T00:15:54.712Z cpu0:2098255)WARNING: iscsi_vmk: iscsivmk_StartConnection:881: Sess [ISID: 00023d000001 TARGET: iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw TPGT: 1 TSIH: 0]
> 2020-02-02T00:15:54.712Z cpu0:2098255)WARNING: iscsi_vmk: iscsivmk_StartConnection:882: Conn [CID: 0 L: 192.168.201.107:35264 R: 192.168.201.1:3260]
> 2020-02-02T00:15:54.712Z cpu30:2097595)ScsiDevice: 4481: Handle REPORTED LUNS CHANGED DATA unit attention
> 2020-02-02T00:15:54.712Z cpu30:2097595)ScsiDevice: 4512: Handle INQUIRY PARAMETERS CHANGED unit attention
> 2020-02-02T00:15:54.716Z cpu2:2097312)ScsiDevice: 6001: Setting Device naa.6001405ba48e0b99e4c418ca13506c8e state back to 0x2
> 2020-02-02T00:15:54.716Z cpu2:2097312)ScsiDevice: 8708: No Handlers registered! (naa.6001405ba48e0b99e4c418ca13506c8e)!
> 2020-02-02T00:15:54.716Z cpu2:2097312)ScsiDevice: 6022: Device naa.6001405ba48e0b99e4c418ca13506c8e is Out of APD; token num:1
> 2020-02-02T00:15:54.716Z cpu2:2097312)StorageApdHandler: 1315: APD exit for 0x4305de04a9d0 [naa.6001405ba48e0b99e4c418ca13506c8e]
> 2020-02-02T00:15:54.716Z cpu0:2097602)StorageApdHandler: 507: APD exit event for 0x4305de04a9d0 [naa.6001405ba48e0b99e4c418ca13506c8e]
> 2020-02-02T00:15:54.716Z cpu0:2097602)StorageApdHandlerEv: 117: Device or filesystem with identifier [naa.6001405ba48e0b99e4c418ca13506c8e] has exited the All Paths Down state.
> 2020-02-02T00:15:54.717Z cpu11:2097618)ScsiDevice: 6001: Setting Device naa.6001405cb71b882378c4138826c2da30 state back to 0x2
> 2020-02-02T00:15:54.717Z cpu11:2097618)ScsiDevice: 8708: No Handlers registered! (naa.6001405cb71b882378c4138826c2da30)!
> 2020-02-02T00:15:54.717Z cpu11:2097618)ScsiDevice: 6022: Device naa.6001405cb71b882378c4138826c2da30 is Out of APD; token num:1
> 2020-02-02T00:15:54.717Z cpu11:2097618)StorageApdHandler: 1315: APD exit for 0x4305de041660 [naa.6001405cb71b882378c4138826c2da30]
> 2020-02-02T00:15:54.717Z cpu0:2097602)StorageApdHandler: 507: APD exit event for 0x4305de041660 [naa.6001405cb71b882378c4138826c2da30]
> 2020-02-02T00:15:54.717Z cpu0:2097602)StorageApdHandlerEv: 117: Device or filesystem with identifier [naa.6001405cb71b882378c4138826c2da30] has exited the All Paths Down state.
> 2020-02-02T00:15:55.096Z cpu0:2098243)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.6001405ba48e0b99e4c418ca13506c8e"
> 2020-02-02T00:15:55.096Z cpu8:2099418 opID=17fe6cc5)World: 11943: VC opID sps-Main-52618-962-fe-1-62ed maps to vmkernel opID 17fe6cc5
> 2020-02-02T00:15:55.096Z cpu8:2099418 opID=17fe6cc5)WARNING: ScsiDeviceIO: 10750: READ CAPACITY on device "naa.6001405ba48e0b99e4c418ca13506c8e" from Plugin "NMP" failed. I/O error
> 2020-02-02T00:15:55.097Z cpu3:2098243)NMP: nmp_ThrottleLogForDevice:3788: Cmd 0x28 (0x459a9abd2380, 0) to dev "naa.6001405ba48e0b99e4c418ca13506c8e" on path "vmhba68:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER
> 2020-02-02T00:15:55.097Z cpu3:2098243)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.6001405ba48e0b99e4c418ca13506c8e": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
> 2020-02-02T00:15:56.095Z cpu27:2098080)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "naa.6001405ba48e0b99e4c418ca13506c8e" - issuing command 0x459a9abd2380
> 2020-02-02T00:15:56.099Z cpu3:2098243)NMP: nmpCompleteRetryForPath:327: Retry world recovered device "naa.6001405ba48e0b99e4c418ca13506c8e"
> 2020-02-02T00:16:02.095Z cpu23:2097316)qfle3: qfle3_queue_alloc_with_attr:517: [vmnic0] QueueOps.qfle3_queue_alloc_with_attr num_attr :1 attrs: 0x451a4521bcd0
> 2020-02-02T00:16:02.095Z cpu23:2097316)qfle3: qfle3_queue_alloc_with_attr:545: [vmnic0] Feature LRO requested.
> 2020-02-02T00:16:02.095Z cpu23:2097316)qfle3: qfle3_rq_alloc:282: [vmnic0] allocating RX queue at 1
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_alloc:316: [vmnic0] Marking RX queue 1 IN_USE
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_queue_start:1529: [vmnic0] QueueOps.queueStart
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_queue_start:1537: [vmnic0] RxQ, QueueIDVal:1
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rx_queue_start:1452: [vmnic0] qfle3_rx_queue_start, QueueIDVal:1
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1287: [vmnic0] qfle3_rq_start 1
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_init_eth_fp:68: [vmnic0] qfle3_init_eth_fp for fp 1
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1334: [vmnic0] RX fp[1]: wrote prods bd_prod=4078 cqe_prod=4030 sge_prod=1024
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1351: [vmnic0] enabled netpoll for q_index 1
> 2020-02-02T00:16:02.096Z cpu23:2097316)qfle3: qfle3_rq_start:1357: [vmnic0] Enabling interrupt on vector # 3
> 2020-02-02T00:16:02.097Z cpu23:2097316)qfle3: qfle3_rq_start:1377: [vmnic0] RX queue setup_queue successful for 1
> 2020-02-02T00:16:02.097Z cpu23:2097316)qfle3: qfle3_rq_start:1411: [vmnic0] active Rx queue Count 2
> 2020-02-02T00:16:02.097Z cpu23:2097316)qfle3: qfle3_rq_start:1412: [vmnic0] RX queue 1 successfully started
> 2020-02-02T00:16:02.098Z cpu23:2097316)qfle3: qfle3_queue_remove_filter:2063: [vmnic0] QueueOps.queueRemoveFilter
> 2020-02-02T00:16:02.099Z cpu23:2097316)qfle3: qfle3_remove_queue_filter:2012: [vmnic0] NetQ removed RX filter: queue:0 mac: 00:50:56:6f:59:4f filter id:3
> 2020-02-02T00:16:02.099Z cpu23:2097316)qfle3: qfle3_queue_apply_filter:1923: [vmnic0] QueueOps.queueApplyFilter 1
> 2020-02-02T00:16:02.101Z cpu23:2097316)qfle3: qfle3_apply_queue_mac_filter:1798: [vmnic0] NetQ set RX filter: queue:1 mac: 00:50:56:6f:59:4f filter id:0
> 2020-02-02T00:16:04.924Z cpu0:2099418 opID=cff7ff5)World: 11943: VC opID sps-Main-52618-962-d0-d6-6321 maps to vmkernel opID cff7ff5
> 2020-02-02T00:16:04.924Z cpu0:2099418 opID=cff7ff5)Unmap6: 7133: [Unmap] 'datastore1':device(0x43078fc9a1f0)does not support unmap
> 2020-02-02T00:16:07.164Z cpu9:2271849)J6: 2651: 'Storage_Ceph_pool1': Exiting async journal replay manager world
> 2020-02-02T00:16:09.665Z cpu12:2271850)J6: 2651: 'Storage_Ceph_pool2': Exiting async journal replay manager world
> 2020-02-02T00:16:10.930Z cpu4:2271852)J6: 2651: 'datastore1': Exiting async journal replay manager world
> 2020-02-02T00:16:11.094Z cpu22:2271853)J6: 2651: 'SSD1': Exiting async journal replay manager world
> 2020-02-02T00:16:22.094Z cpu26:2097316)qfle3: qfle3_queue_remove_filter:2063: [vmnic0] QueueOps.queueRemoveFilter
> 2020-02-02T00:16:22.097Z cpu26:2097316)qfle3: qfle3_remove_queue_filter:2012: [vmnic0] NetQ removed RX filter: queue:1 mac: 00:50:56:6f:59:4f filter id:0
> 2020-02-02T00:16:22.098Z cpu26:2097316)qfle3: qfle3_queue_apply_filter:1923: [vmnic0] QueueOps.queueApplyFilter 0
> 2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_apply_queue_mac_filter:1798: [vmnic0] NetQ set RX filter: queue:0 mac: 00:50:56:6f:59:4f filter id:3
> 2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_queue_quiesce:1061: [vmnic0] QueueOps.queueQuiesce
> 2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_queue_quiesce:1069: [vmnic0] RxQ, QueueIDVal:1
> 2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_rx_queue_stop:1558: [vmnic0] qfle3_rx_queue_stop, QueueIDVal:1
> 2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_rq_stop:740: [vmnic0] qfle3_rq_stop 1
> 2020-02-02T00:16:22.099Z cpu26:2097316)qfle3: qfle3_rq_stop:811: [vmnic0] Stopping queue 0
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_stop:831: [vmnic0] disable netpoll for q_index 1
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_stop:842: [vmnic0] Disabling interrupt on vector # 3
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_stop:867: [vmnic0] active Rx queue Count 1
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_queue_free:690: [vmnic0] QueueOps.queueFree
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_queue_free:697: [vmnic0] RxQ, QueueIDVal:1
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_rq_free:618: [vmnic0] Loop through 1 RSS queues 1
> 2020-02-02T00:16:22.102Z cpu26:2097316)qfle3: qfle3_cmd_remove_q:19191: [vmnic0] Releasing Q idx 1
>
> 2020-02-02T00:17:05.925Z cpu4:2100126 opID=4a500647)World: 11943: VC opID sps-Main-52618-962-3a-dd-633d maps to vmkernel opID 4a500647
> 2020-02-02T00:17:05.925Z cpu4:2100126 opID=4a500647)Unmap6: 7133: [Unmap] 'datastore1':device(0x43078fc99b10)does not support unmap
> 2020-02-02T00:17:11.930Z cpu15:2271855)J6: 2651: 'datastore1': Exiting async journal replay manager world
> 2020-02-02T00:17:12.100Z cpu34:2271856)J6: 2651: 'SSD1': Exiting async journal replay manager world
>
>> >
>> > Nothing is recorded in the logs. Can you help me?
>> >
>> > On Thu, Dec 26, 2019 at 17:44, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>> >
>> > On 12/24/2019 06:40 AM, Gesiel Galvão Bernardes wrote:
>> > > In addition: I turned off one of the GWs, and with just one it works
>> > > fine. When both go up, one of the images keeps changing its
>> > > "Active/optimized" path all the time (which generates the logs
>> > > above), and everything is extremely slow.
>> >
>> > Your multipathing in ESX is probably misconfigured and you have set it
>> > up for active/active, or one host can't see all the iSCSI paths,
>> > either because it's not logged into all the sessions or because the
>> > network is not up on one of the paths.
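Both of those conditions can be verified from each ESXi host. As a generic sketch (device ID taken from earlier in the thread): the session list should show a logged-in session to both gateway portals, and the path list should show two paths per device, one Active/optimized and one Active/non-optimized:

  # esxcli iscsi session list
  # esxcli storage nmp path list -d naa.6001405ba48e0b99e4c418ca13506c8e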
>> > > I'm using:
>> > > tcmu-runner-1.4
>> > > ceph-iscsi-3.3
>> > > ceph 13.2.7
>> > >
>> > > Regards,
>> > > Gesiel
>> > >
>> > > On Tue, Dec 24, 2019 at 09:09, Gesiel Galvão Bernardes
>> > > <gesiel.bernardes@xxxxxxxxx> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I am having an unusual slowdown using VMware with the iSCSI
>> > > gateways. I have two iSCSI gateways with two RBD images. I have
>> > > noticed the following in the logs:
>> > >
>> > > Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
>> > > 2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:557 rbd/pool1.vmware_iscsi1: Lock acquisition operation is already in process.
>> > > 2019-12-24 09:00:26.973 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
>> > > Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
>> > > Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.099 969 [WARN] tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1
>> > > Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1
>> > > Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
>> > > Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.824 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
>> > > 2019-12-24 09:00:28.990 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
>> > > Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
>> > >
>> > > Can anyone help me, please?
>> > >
>> > > Gesiel

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx