Here is the profile for about 30 seconds; I didn't let it run a full 60:
Brick: media2-be:/gluster/brick1/gluster_volume_0
-------------------------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 0 0 0
No. of Writes: 31133 37339 35573
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 0 0 0
No. of Writes: 284535 91431 43838
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 0 0 181121
No. of Writes: 27764 22258 226187
Block Size: 262144b+
No. of Reads: 0
No. of Writes: 7
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 53 RELEASE
0.00 0.00 us 0.00 us 0.00 us 272 RELEASEDIR
0.08 217.02 us 17.00 us 46751.00 us 837 STAT
0.54 487.22 us 5.00 us 150634.00 us 2675 FINODELK
1.07 2591.53 us 24.00 us 186199.00 us 1001 READ
1.77 3224.61 us 16.00 us 113361.00 us 1322 WRITE
3.02 84.19 us 8.00 us 186102.00 us 86693 INODELK
5.10 11293.23 us 20.00 us 153002.00 us 1090 FXATTROP
88.42 395188.99 us 2771.00 us 2378742.00 us 540 FSYNC
Duration: 82547 seconds
Data Read: 23739891712 bytes
Data Written: 36058159104 bytes
Interval 1 Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 0 0 0
No. of Writes: 24 2 14
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 0 0 0
No. of Writes: 167 28 4
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 0 0 309
No. of Writes: 8 1 0
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.13 61.58 us 17.00 us 2074.00 us 224 STAT
0.19 41.50 us 7.00 us 4143.00 us 498 FINODELK
0.53 188.37 us 27.00 us 11377.00 us 309 READ
3.04 1337.96 us 18.00 us 48765.00 us 248 WRITE
5.28 2594.87 us 21.00 us 47939.00 us 222 FXATTROP
14.58 20.41 us 8.00 us 47905.00 us 77937 INODELK
76.25 74945.14 us 23687.00 us 199942.00 us 111 FSYNC
Duration: 53 seconds
Data Read: 40501248 bytes
Data Written: 1512448 bytes
Brick: media1-be:/gluster/brick1/gluster_volume_0
-------------------------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 0 0 0
No. of Writes: 2831 4699 6142
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 0 0 0
No. of Writes: 46751 16712 7972
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 0 0 0
No. of Writes: 4462 2938 27952
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 7 RELEASE
0.00 0.00 us 0.00 us 0.00 us 11 RELEASEDIR
1.75 245.15 us 46.00 us 19886.00 us 1321 WRITE
6.99 1191.45 us 114.00 us 215838.00 us 1089 FXATTROP
10.07 698.36 us 18.00 us 286316.00 us 2674 FINODELK
24.51 52.44 us 23.00 us 171166.00 us 86694 INODELK
56.69 19472.96 us 1568.00 us 249274.00 us 540 FSYNC
Duration: 2224 seconds
Data Read: 0 bytes
Data Written: 4669031424 bytes
Interval 1 Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 0 0 0
No. of Writes: 24 2 14
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 0 0 0
No. of Writes: 167 28 4
Block Size: 32768b+ 65536b+
No. of Reads: 0 0
No. of Writes: 8 1
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
1.12 302.43 us 114.00 us 10132.00 us 222 FXATTROP
1.18 285.74 us 56.00 us 7058.00 us 248 WRITE
4.31 519.43 us 21.00 us 188427.00 us 498 FINODELK
32.76 17714.02 us 5018.00 us 205904.00 us 111 FSYNC
60.63 46.69 us 23.00 us 9550.00 us 77936 INODELK
Duration: 53 seconds
Data Read: 0 bytes
Data Written: 1512448 bytes
[root@media2 ~]# gluster volume status
Status of volume: gvol0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick media1-be:/gluster/brick1/gluster_volume_0    49152    0    Y    2829
Brick media2-be:/gluster/brick1/gluster_volume_0    49152    0    Y    1456
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 1451
NFS Server on media1 N/A N/A N N/A
Self-heal Daemon on media1 N/A N/A Y 2824
Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks
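For reference, a profile dump like the one above is typically collected with the volume profile commands (the volume name gvol0 is taken from the status output):

gluster volume profile gvol0 start   # turns on diagnostics.latency-measurement and diagnostics.count-fop-hits
gluster volume profile gvol0 info    # prints cumulative stats plus stats for the interval since the last "info"
gluster volume profile gvol0 stop    # disable profiling again when done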
On Thu, Oct 6, 2016 at 4:25 PM, Michael Ciccarelli <mikecicc01@xxxxxxxxx> wrote:
This is the info file contents; is there another file you would want to see for the config?

type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=3
transport-type=0
volume-id=98c258e6-ae9e-4407-8f25-7e3f7700e100
username=removed just cause
password=removed just cause
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
diagnostics.count-fop-hits=on
diagnostics.latency-measurement=on
performance.readdir-ahead=on
brick-0=media1-be:-gluster-brick1-gluster_volume_0
brick-1=media2-be:-gluster-brick1-gluster_volume_0

Here are some log entries from etc-glusterfs-glusterd.vol.log:

The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2016-10-06 20:10:14.963402] and [2016-10-06 20:12:11.979684]
[2016-10-06 20:12:14.980203] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2016-10-06 20:13:50.993490] W [socket.c:596:__socket_rwv] 0-nfs: readv on /var/run/gluster/360710d59bc4799f8c8a6374936d2b1b.socket failed (Invalid argument)

I can provide any specific details you would like to see. Last night I tried one more time and it appeared to be working OK running 1 VM under VMware, but as soon as I had 3 running the targets became unresponsive. I believe the gluster volume is OK, but for whatever reason the iSCSI target daemon seems to be having some issues. Here is an excerpt from the messages file:

Oct 5 23:13:00 media2 kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
Oct 5 23:13:00 media2 kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
Oct 5 23:13:35 media2 kernel: iSCSI/iqn.1998-01.com.vmware:vmware4-0941d552: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
Oct 5 23:13:35 media2 kernel: iSCSI/iqn.1998-01.com.vmware:vmware4-0941d552: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
and here are some more VMware iscsi errors:

2016-10-06T20:22:11.496Z cpu2:32825)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x89 (0x412e808532c0, 32801) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:11.635Z cpu2:32787)ScsiDeviceIO: 2338: Cmd(0x412e808532c0) 0x89, CmdSN 0x4f05 from world 32801 to dev "naa.6001405c0d86944f3d2468d80c7d1
2016-10-06T20:22:11.635Z cpu3:35532)Fil3: 15389: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')
2016-10-06T20:22:11.635Z cpu2:196414)HBX: 2832: Waiting for timed out [HB state abcdef02 offset 3928064 gen 25 stampUS 49571997650 uuid 57f5c142-45632d75
2016-10-06T20:22:11.635Z cpu3:35532)HBX: 2832: Waiting for timed out [HB state abcdef02 offset 3928064 gen 25 stampUS 49571997650 uuid 57f5c142-45632d75-
2016-10-06T20:22:11.635Z cpu0:32799)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:11.635Z cpu0:32799)ScsiDeviceIO: 2325: Cmd(0x412e80848580) 0x28, CmdSN 0x4f06 from world 32799 to dev "naa.6001405c0d86944f3d2468d80c7d1
2016-10-06T20:22:11.773Z cpu0:32843)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:11.916Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:12.000Z cpu2:33431)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410987bf0800 network resource pool netsched.pools.persist.iscsi associa
2016-10-06T20:22:12.000Z cpu2:33431)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410987bf0800 network tracker id 16 tracker.iSCSI.172.16.1.40 associated
2016-10-06T20:22:12.056Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:12.194Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba38:CH:1 T:1 CN:0: iSCSI connection is being marked "ONLINE"
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET: iqn.2016-09.iscsi.gluster:shared TPGT:
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 172.16.1.53:49959 R: 172.16.1.40:3260]

Is it that the gluster overhead is just killing LIO/target?

thanks,
Mike

On Thu, Oct 6, 2016 at 12:22 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

Hi Mike,
Can you please share your gluster volume configuration?
Also do you notice anything in client logs on the node where fileio
backstore is configured?
Thanks,
Vijay
On Wed, Oct 5, 2016 at 8:56 PM, Michael Ciccarelli <mikecicc01@xxxxxxxxx> wrote:
> So I have a fairly basic setup using glusterfs between 2 nodes. The nodes
> have 10 gig connections and the bricks reside on SSD LVM LUNs:
>
> Brick1: media1-be:/gluster/brick1/gluster_volume_0
> Brick2: media2-be:/gluster/brick1/gluster_volume_0
>
>
> On this volume I have a LIO iscsi target with 1 fileio backstore that's
> being shared out to vmware ESXi hosts. The volume is around 900 gig and the
> fileio store is around 850g:
>
> -rw-r--r-- 1 root root 912680550400 Oct 5 20:47 iscsi.disk.3
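A rough targetcli sketch of the setup described here, for context; the mount point /mnt/gluster is an assumption, while the backing-file name and the target IQN iqn.2016-09.iscsi.gluster:shared appear in the logs later in this thread:

targetcli /backstores/fileio create iscsi.disk.3 /mnt/gluster/iscsi.disk.3   # fileio backstore on the FUSE-mounted gluster volume (assumed mount point)
targetcli /iscsi create iqn.2016-09.iscsi.gluster:shared                     # iSCSI target
targetcli /iscsi/iqn.2016-09.iscsi.gluster:shared/tpg1/luns create /backstores/fileio/iscsi.disk.3   # export the backstore as a LUN

The same setup would be repeated on both nodes, with the backstore WWN/serial then made identical so ESXi treats them as two paths to one device, as described just below.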
>
> I set the WWN to be the same so the ESXi hosts see the nodes as 2 paths to
> the same target. I believe this is what I want. The issue I'm seeing is
> that while the IO wait is low, I'm seeing high CPU usage with only 3 VMs
> running on only 1 of the ESX servers:
>
> this is media2-be:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 1474 root 20 0 1396620 37912 5980 S 135.0 0.1 157:01.84 glusterfsd
> 1469 root 20 0 747996 13724 5424 S 2.0 0.0 1:10.59 glusterfs
>
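Whether ESXi really sees the two nodes as two paths to the same device can be checked against the naa ID that shows up in the log excerpt below, e.g.:

esxcli storage core path list -d naa.600140501ce79002e724ebdb66a6756d   # should list one path per gluster node
esxcli storage nmp device list -d naa.600140501ce79002e724ebdb66a6756d  # shows the multipathing policy in use for the device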
> And this morning it seemed like I had to restart the LIO service on
> media1-be, as VMware was seeing time-out issues. I'm seeing issues like
> this on the VMware ESX servers:
>
> 2016-10-06T00:51:41.100Z cpu0:32785)WARNING: ScsiDeviceIO: 1223: Device
> naa.600140501ce79002e724ebdb66a6756d performance has deteriorated. I/O
> latency increased from average value of 33420 microseconds to 732696
> microseconds.
>
> Are there any special settings I need for gluster+LIO+VMware to work?
> Has anyone gotten this to work reliably enough that it is stable? What am I
> missing?
>
> thanks,
> Mike
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users