These are the info file contents. Is there another file you would want to see for config?
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=3
transport-type=0
volume-id=98c258e6-ae9e-4407-8f25-7e3f7700e100
username=removed just cause
password=removed just cause
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
diagnostics.count-fop-hits=on
diagnostics.latency-measurement=on
performance.readdir-ahead=on
brick-0=media1-be:-gluster-brick1-gluster_volume_0
brick-1=media2-be:-gluster-brick1-gluster_volume_0
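If it's easier, I can also pull the configuration through the CLI; this is roughly what I'd run (the volume name below is just a placeholder for whatever "gluster volume list" reports):

# placeholder volume name -- substitute the real one
gluster volume info <volname>
gluster volume status <volname>
gluster volume get <volname> all   # full option dump, if this gluster version supports "volume get"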
Here are some log entries from etc-glusterfs-glusterd.vol.log:
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2016-10-06 20:10:14.963402] and [2016-10-06 20:12:11.979684]
[2016-10-06 20:12:14.980203] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2016-10-06 20:13:50.993490] W [socket.c:596:__socket_rwv] 0-nfs: readv on /var/run/gluster/360710d59bc4799f8c8a6374936d2b1b.socket failed (Invalid argument)
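I have not disabled the built-in gluster NFS service; we only use the volume for the iSCSI backing file, so I assume (please correct me if I'm wrong) that turning it off would quiet these disconnect/readv messages:

# my assumption -- only if the gluster NFS export really isn't needed
gluster volume set <volname> nfs.disable on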
I can provide any specific details you would like to see. Last night I tried one more time, and it appeared to work OK running a single VM under VMware, but as soon as I had three running, the targets became unresponsive. I believe the gluster volume is OK, but for whatever reason the iSCSI target daemon seems to be having some issues...
Here is an excerpt from the messages file:
Oct 5 23:13:00 media2 kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
Oct 5 23:13:00 media2 kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
Oct 5 23:13:35 media2 kernel: iSCSI/iqn.1998-01.com.vmware:vmware4-0941d552: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
Oct 5 23:13:35 media2 kernel: iSCSI/iqn.1998-01.com.vmware:vmware4-0941d552: Unsupported SCSI Opcode 0x4d, sending CHECK_CONDITION.
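If it helps, these are the basic checks I can run on the target side; just let me know which output you want:

systemctl status target        # LIO configuration-restore service, if you're on systemd
targetcli ls                   # current backstore / target / ACL tree
dmesg | grep -iE 'iscsi|target' | tail -n 50   # recent kernel-side target messages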
And here are some more VMware iSCSI errors:
2016-10-06T20:22:11.496Z cpu2:32825)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x89 (0x412e808532c0, 32801) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:11.635Z cpu2:32787)ScsiDeviceIO: 2338: Cmd(0x412e808532c0) 0x89, CmdSN 0x4f05 from world 32801 to dev "naa.6001405c0d86944f3d2468d80c7d1
2016-10-06T20:22:11.635Z cpu3:35532)Fil3: 15389: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')
2016-10-06T20:22:11.635Z cpu2:196414)HBX: 2832: Waiting for timed out [HB state abcdef02 offset 3928064 gen 25 stampUS 49571997650 uuid 57f5c142-45632d75
2016-10-06T20:22:11.635Z cpu3:35532)HBX: 2832: Waiting for timed out [HB state abcdef02 offset 3928064 gen 25 stampUS 49571997650 uuid 57f5c142-45632d75-
2016-10-06T20:22:11.635Z cpu0:32799)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:11.635Z cpu0:32799)ScsiDeviceIO: 2325: Cmd(0x412e80848580) 0x28, CmdSN 0x4f06 from world 32799 to dev "naa.6001405c0d86944f3d2468d80c7d1
2016-10-06T20:22:11.773Z cpu0:32843)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:11.916Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:12.000Z cpu2:33431)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410987bf0800 network resource pool netsched.pools.persist.iscsi associa
2016-10-06T20:22:12.000Z cpu2:33431)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410987bf0800 network tracker id 16 tracker.iSCSI.172.16.1.40 associated
2016-10-06T20:22:12.056Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:12.194Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540" on
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba38:CH:1 T:1 CN:0: iSCSI connection is being marked "ONLINE"
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET: iqn.2016-09.iscsi.gluster:shared TPGT:
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 172.16.1.53:49959 R: 172.16.1.40:3260]
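From the ESXi side I can also pull the path and session state for that device if that's useful; something like:

# run on the ESXi host; the naa ID is the one from the log above
esxcli storage nmp device list -d naa.6001405c0d86944f3d2468d80c7d1540
esxcli storage core path list -d naa.6001405c0d86944f3d2468d80c7d1540
esxcli iscsi session list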
Is it just the gluster overhead overwhelming the LIO target?
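Would volume tuning help? For what it's worth, this is my understanding of the options usually recommended for VM/block workloads -- I have not applied them yet, and the list is my own assumption, so please correct anything that's off:

# apply the packaged virt profile, if this build ships one
gluster volume set <volname> group virt
# or the individual options I usually see recommended:
gluster volume set <volname> performance.quick-read off
gluster volume set <volname> performance.read-ahead off
gluster volume set <volname> performance.io-cache off
gluster volume set <volname> performance.stat-prefetch off
gluster volume set <volname> cluster.eager-lock enable
gluster volume set <volname> network.remote-dio enable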
thanks,
Mike
On Thu, Oct 6, 2016 at 12:22 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
Hi Mike,
Can you please share your gluster volume configuration?
Also, do you notice anything in the client logs on the node where the
fileio backstore is configured?
Thanks,
Vijay
On Wed, Oct 5, 2016 at 8:56 PM, Michael Ciccarelli <mikecicc01@xxxxxxxxx> wrote:
> So I have a fairly basic setup using glusterfs between 2 nodes. The nodes
> have 10 gig connections and the bricks reside on SSD LVM LUNs:
>
> Brick1: media1-be:/gluster/brick1/gluster_volume_0
> Brick2: media2-be:/gluster/brick1/gluster_volume_0
>
>
> On this volume I have a LIO iSCSI target with 1 fileio backstore that's
> being shared out to VMware ESXi hosts. The volume is around 900 GB and the
> fileio store is around 850 GB:
>
> -rw-r--r-- 1 root root 912680550400 Oct 5 20:47 iscsi.disk.3
>
> I set the WWN to be the same so the ESXi hosts see the nodes as 2 paths to
> the same target. I believe this is what I want. The issue I'm seeing is
> that while the I/O wait is low, I'm seeing high CPU usage with only 3 VMs
> running on only 1 of the ESX servers:
>
> this is media2-be:
>  PID USER  PR NI    VIRT   RES  SHR S  %CPU %MEM     TIME+ COMMAND
> 1474 root  20  0  1396620 37912 5980 S 135.0  0.1 157:01.84 glusterfsd
> 1469 root  20  0   747996 13724 5424 S   2.0  0.0   1:10.59 glusterfs
>
> And this morning it seemed like I had to restart the LIO service on
> media1-be, as VMware was seeing time-out issues. I'm seeing errors like
> this on the VMware ESX servers:
>
> 2016-10-06T00:51:41.100Z cpu0:32785)WARNING: ScsiDeviceIO: 1223: Device
> naa.600140501ce79002e724ebdb66a6756d performance has deteriorated. I/O
> latency increased from average value of 33420 microseconds to 732696
> microseconds.
>
> Are there any special settings I need for gluster+LIO+VMware to work?
> Has anyone gotten this working reliably enough to be stable? What am I
> missing?
>
> thanks,
> Mike
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users