One of our clusters works super slow (v6.8)

Hi all,
There's something strange going on with one of our clusters running glusterfs 6.8: it's quite slow and one node is overloaded.
This is a distributed-replicated cluster of four servers with identical specs/OS/versions:

Volume Name: st2
Type: Distributed-Replicate
Volume ID: 4755753b-37c4-403b-b1c8-93099bfc4c45
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: st2a:/vol3/st2
Brick2: st2b:/vol3/st2
Brick3: st2c:/vol3/st2
Brick4: st2d:/vol3/st2
Options Reconfigured:
cluster.rebal-throttle: aggressive
nfs.disable: on
performance.readdir-ahead: off
transport.address-family: inet6
performance.quick-read: off
performance.cache-size: 1GB
performance.io-cache: on
performance.io-thread-count: 16
cluster.data-self-heal-algorithm: full
network.ping-timeout: 20
server.event-threads: 2
client.event-threads: 2
cluster.readdir-optimize: on
performance.read-ahead: off
performance.parallel-readdir: on
cluster.self-heal-daemon: enable
storage.health-check-timeout: 20

The op.version for this cluster remains at 50400.
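
In case it matters, we have not raised the op-version after upgrading from 5.x. This is roughly how I would compare the effective options with our healthy cluster and check/bump the op-version (a sketch; 60000 as the 6.x op-version is my assumption):

# dump all effective volume options, to diff against the fast cluster
gluster volume get st2 all

# check the current cluster-wide op-version
gluster volume get all cluster.op-version

# bump it after the 6.8 upgrade (60000 assumed to be the 6.x op-version)
gluster volume set all cluster.op-version 60000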

st2a is a replica of st2b, and st2c is a replica of st2d.
All 50 of our clients mount this volume using FUSE, and in contrast with our other clusters this one is terribly slow.
The interesting thing is that HDD and network utilization are very low, yet one server is heavily overloaded.
Also, there are no files pending heal according to `gluster volume heal st2 info`.
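
For reference, this is roughly how I checked utilization and healing on the brick servers (a sketch; the 5-second interval is arbitrary):

# per-disk utilization
iostat -x 5

# per-interface network throughput
sar -n DEV 5

# pending heals, summarized per brick
gluster volume heal st2 info summary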
Load average across the servers:
st2a: load average: 28.73, 26.39, 27.44
st2b: load average: 0.24, 0.46, 0.76
st2c: load average: 0.13, 0.20, 0.27
st2d: load average: 2.93, 2.11, 1.50

If we stop glusterfs on the st2a server, the cluster works as fast as we expect.
Previously the cluster ran version 5.x and there were no such problems.
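
If it would help, I can collect per-brick FOP statistics with the volume profiler while the load is high (a sketch, assuming it is safe to enable it temporarily):

gluster volume profile st2 start
# let it run for a few minutes under load, then:
gluster volume profile st2 info
gluster volume profile st2 stop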

Interestingly, almost all CPU usage on st2a is "system" load.
The most CPU-intensive process is glusterfsd.
`top -H` for the glusterfsd process shows this:

PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                        
13894 root      20   0 2172892  96488   9056 R 74,0  0,1 122:09.14 glfs_iotwr00a
13888 root      20   0 2172892  96488   9056 R 73,7  0,1 121:38.26 glfs_iotwr004
13891 root      20   0 2172892  96488   9056 R 73,7  0,1 121:53.83 glfs_iotwr007
13920 root      20   0 2172892  96488   9056 R 73,0  0,1 122:11.27 glfs_iotwr00f
13897 root      20   0 2172892  96488   9056 R 68,3  0,1 121:09.82 glfs_iotwr00d
13896 root      20   0 2172892  96488   9056 R 68,0  0,1 122:03.99 glfs_iotwr00c
13868 root      20   0 2172892  96488   9056 R 67,7  0,1 122:42.55 glfs_iotwr000
13889 root      20   0 2172892  96488   9056 R 67,3  0,1 122:17.02 glfs_iotwr005
13887 root      20   0 2172892  96488   9056 R 67,0  0,1 122:29.88 glfs_iotwr003
13885 root      20   0 2172892  96488   9056 R 65,0  0,1 122:04.85 glfs_iotwr001
13892 root      20   0 2172892  96488   9056 R 55,0  0,1 121:15.23 glfs_iotwr008
13890 root      20   0 2172892  96488   9056 R 54,7  0,1 121:27.88 glfs_iotwr006
13895 root      20   0 2172892  96488   9056 R 54,0  0,1 121:28.35 glfs_iotwr00b
13893 root      20   0 2172892  96488   9056 R 53,0  0,1 122:23.12 glfs_iotwr009
13898 root      20   0 2172892  96488   9056 R 52,0  0,1 122:30.67 glfs_iotwr00e
13886 root      20   0 2172892  96488   9056 R 41,3  0,1 121:26.97 glfs_iotwr002
13878 root      20   0 2172892  96488   9056 S  1,0  0,1   1:20.34 glfs_rpcrqhnd
13840 root      20   0 2172892  96488   9056 S  0,7  0,1   0:51.54 glfs_epoll000
13841 root      20   0 2172892  96488   9056 S  0,7  0,1   0:51.14 glfs_epoll001
13877 root      20   0 2172892  96488   9056 S  0,3  0,1   1:20.02 glfs_rpcrqhnd
13833 root      20   0 2172892  96488   9056 S  0,0  0,1   0:00.00 glusterfsd    
13834 root      20   0 2172892  96488   9056 S  0,0  0,1   0:00.14 glfs_timer    
13835 root      20   0 2172892  96488   9056 S  0,0  0,1   0:00.00 glfs_sigwait  
13836 root      20   0 2172892  96488   9056 S  0,0  0,1   0:00.16 glfs_memsweep
13837 root      20   0 2172892  96488   9056 S  0,0  0,1   0:00.05 glfs_sproc0      
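
To see what those io-threads are actually doing, I could take a statedump of the brick process and look at the active/queued FOPs, and profile where the "system" CPU time goes (a sketch; /var/run/gluster is the default dump location on our systems, and 13833 is the glusterfsd PID from the top output above):

# statedump of the brick processes for this volume; files land in /var/run/gluster
gluster volume statedump st2

# kernel-side view of where the busy brick process spends its time
perf top -p 13833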


Also, I didn't find any relevant messages in the log files.
Honestly, I don't know what to do. Does anyone know how to debug or fix this behaviour?
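
In case more verbose logs would help, this is what I would use to raise the brick and client log levels temporarily (a sketch; DEBUG is quite noisy, so only for a short window):

gluster volume set st2 diagnostics.brick-log-level DEBUG
gluster volume set st2 diagnostics.client-log-level DEBUG
# ...reproduce the slowness, then go back to the default
gluster volume set st2 diagnostics.brick-log-level INFO
gluster volume set st2 diagnostics.client-log-level INFO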

Best regards,
Pavel


