So I have a fairly basic setup using GlusterFS between 2 nodes. The nodes have 10 GbE connections and the bricks reside on SSD-backed LVM LUNs:
Brick1: media1-be:/gluster/brick1/gluster_volume_0
Brick2: media2-be:/gluster/brick1/gluster_volume_0
On this volume I have a LIO iSCSI target with one fileio backstore that's being shared out to VMware ESXi hosts. The volume is around 900 GB and the fileio store is around 850 GB:
-rw-r--r-- 1 root root 912680550400 Oct 5 20:47 iscsi.disk.3
I set the WWN to be the same on both nodes so the ESXi hosts see them as two paths to the same target, which I believe is what I want. The issue I'm seeing is that while I/O wait is low, CPU usage is high with only 3 VMs running, all on just one of the ESXi hosts:
this is media2-be:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1474 root 20 0 1396620 37912 5980 S 135.0 0.1 157:01.84 glusterfsd
1469 root 20 0 747996 13724 5424 S 2.0 0.0 1:10.59 glusterfs
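For reference, the WWN matching was done along these lines on each node via targetcli (a rough sketch; the backstore name, file path, and serial value here are illustrative placeholders, not my actual ones):

```shell
# On BOTH nodes: create the fileio backstore on the gluster mount, then
# force the same vpd_unit_serial on each node so ESXi derives the same
# naa. identifier and treats the two targets as two paths to one device.
targetcli /backstores/fileio create name=iscsi_disk_3 \
    file_or_dev=/mnt/gluster_volume_0/iscsi.disk.3 size=850G
targetcli /backstores/fileio/iscsi_disk_3 set wwn \
    vpd_unit_serial=deadbeef-0000-0000-0000-000000000001
targetcli saveconfig
```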
And this morning it seemed like I had to restart the LIO service on media1-be, as VMware was seeing timeout issues. I'm seeing warnings like this on the ESXi hosts:
2016-10-06T00:51:41.100Z cpu0:32785)WARNING: ScsiDeviceIO: 1223: Device naa.600140501ce79002e724ebdb66a6756d performance has deteriorated. I/O latency increased from average value of 33420 microseconds to 732696 microseconds.
Are there any special settings I need for gluster + LIO + VMware to work together? Has anyone gotten this working stably? What am I missing?
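For context, I've seen the "virt" option group suggested for VM-image workloads on gluster, but I haven't applied anything like it yet; roughly (assuming the volume is actually named gluster_volume_0, which I'm guessing from the brick path):

```shell
# Apply the upstream "virt" profile in one shot (ships with glusterfs):
gluster volume set gluster_volume_0 group virt

# Or set the commonly cited options individually:
gluster volume set gluster_volume_0 performance.quick-read off
gluster volume set gluster_volume_0 performance.read-ahead off
gluster volume set gluster_volume_0 performance.io-cache off
gluster volume set gluster_volume_0 performance.stat-prefetch off
gluster volume set gluster_volume_0 features.shard on
gluster volume set gluster_volume_0 network.remote-dio enable
```

Is something like this expected for a fileio-on-gluster setup, or is it only relevant for qemu/libgfapi?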
thanks,
Mike
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users