Very odd performance issue

David Miller <dmiller@xxxxxxxxxxx> · Thu, 4 May 2017 14:48:38 -0400

Background: 4 identical gluster servers with 15 TB each in a 2x2 setup.
Servers are running:
CentOS Linux release 7.3.1611 (Core)
glusterfs-server-3.9.1-1.el7.x86_64
Client systems are using:
glusterfs-client 3.5.2-2+deb8u3

The cluster has ~12 TB in use with 21 million files.  Lots of jpgs.  About 12 clients are mounting gluster volumes.  

Network load is light: iftop shows each server has 10-15 Mbit reads and about half that in writes.

What I’m seeing that concerns me is that one box, gluster4, has roughly twice the CPU utilization and twice or more the load average of the other three servers.  gluster4 has a 24 hour average of about 30% CPU utilization, something that seems to me to be way out of line for a couple MB/sec of traffic.

In running volume top, the odd thing I see is that for gluster1-3 I get latency summaries like this:
Brick: gluster1.publicinteractive.com:/gluster/drupal_prod
----------------------------------------------------------
%-latency   Avg-latency   Min-Latency     Max-Latency   No. of calls       Fop
---------   -----------   -----------     -----------   ------------      ----
     9.96     675.07 us      15.00 us   1067793.00 us         205060   INODELK
    15.85    3414.20 us      16.00 us    773621.00 us          64494      READ
    51.35    2235.96 us      12.00 us   1093609.00 us         319120    LOOKUP

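... but my problem server, gluster4, shows far more INODELK latency (same columns as above):

%-latency   Avg-latency   Min-Latency     Max-Latency   No. of calls       Fop
---------   -----------   -----------     -----------   ------------      ----
    12.01    4712.03 us      17.00 us   1773590.00 us          47214      READ
    27.50    2390.27 us      14.00 us   1877571.00 us         213121   INODELK
    28.70    1643.65 us      12.00 us   1837696.00 us         323407    LOOKUP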
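
To put a number on the difference, one option is to multiply average latency by call count for each fop, which approximates the total time each brick spent in that operation. Below is a minimal Python sketch using only the rows quoted in this message. The names GLUSTER1, GLUSTER4, total_seconds and ratios are made up for illustration, and it assumes both summaries cover a comparable interval, which the similar call counts suggest but do not prove.

```python
# Rows copied from the two latency summaries in this message:
# (fop, average latency in microseconds, number of calls)
GLUSTER1 = [
    ("INODELK", 675.07, 205060),
    ("READ", 3414.20, 64494),
    ("LOOKUP", 2235.96, 319120),
]
GLUSTER4 = [
    ("READ", 4712.03, 47214),
    ("INODELK", 2390.27, 213121),
    ("LOOKUP", 1643.65, 323407),
]


def total_seconds(rows):
    """Approximate total time per fop: average latency * calls, in seconds."""
    return {fop: avg_us * calls / 1e6 for fop, avg_us, calls in rows}


def ratios(baseline, other):
    """Per-fop ratio of total time on `other` relative to `baseline`."""
    base, oth = total_seconds(baseline), total_seconds(other)
    return {fop: oth[fop] / base[fop] for fop in base}


if __name__ == "__main__":
    t1, t4 = total_seconds(GLUSTER1), total_seconds(GLUSTER4)
    for fop, ratio in ratios(GLUSTER1, GLUSTER4).items():
        print(f"{fop:8s} gluster1={t1[fop]:7.1f}s "
              f"gluster4={t4[fop]:7.1f}s ratio={ratio:.2f}")
```

On these figures INODELK accounts for roughly 138 seconds on gluster1 versus roughly 509 seconds on gluster4, about 3.7 times as much, while READ comes out nearly equal and LOOKUP is actually lower on gluster4. These are only the fops quoted here, not the full summary.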

The servers are intended to be identical, and are indeed identical hardware.

Suggestions on where to look, or which FM to RT, would be very welcome indeed.

Thanks,

David

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users