What is the OS and its version? I have seen similar behaviour (with a different workload) on RHEL 7.6 (and below). Have you checked which processes are in 'R' or 'D' state on st2a? (A quick way to check is sketched at the bottom of this mail.)

Best Regards,
Strahil Nikolov

On 23 June 2020 at 19:31:12 GMT+03:00, Pavel Znamensky <kompastver@xxxxxxxxx> wrote:
>Hi all,
>There's something strange with one of our clusters running glusterfs
>version 6.8: it's quite slow and one node is overloaded.
>This is a distributed cluster with four servers with the same
>specs/OS/versions:
>
>Volume Name: st2
>Type: Distributed-Replicate
>Volume ID: 4755753b-37c4-403b-b1c8-93099bfc4c45
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: st2a:/vol3/st2
>Brick2: st2b:/vol3/st2
>Brick3: st2c:/vol3/st2
>Brick4: st2d:/vol3/st2
>Options Reconfigured:
>cluster.rebal-throttle: aggressive
>nfs.disable: on
>performance.readdir-ahead: off
>transport.address-family: inet6
>performance.quick-read: off
>performance.cache-size: 1GB
>performance.io-cache: on
>performance.io-thread-count: 16
>cluster.data-self-heal-algorithm: full
>network.ping-timeout: 20
>server.event-threads: 2
>client.event-threads: 2
>cluster.readdir-optimize: on
>performance.read-ahead: off
>performance.parallel-readdir: on
>cluster.self-heal-daemon: enable
>storage.health-check-timeout: 20
>
>The op.version for this cluster remains 50400.
>
>st2a is a replica of st2b, and st2c is a replica of st2d.
>All our 50 clients mount this volume using FUSE, and in contrast with
>our other cluster this one works terribly slowly.
>The interesting thing here is that, on the one hand, HDD and network
>utilization are very low, while on the other hand one server is quite
>overloaded.
>Also, there are no files that need to be healed according to
>`gluster volume heal st2 info`.
>Load average across the servers:
>st2a:
>load average: 28,73, 26,39, 27,44
>st2b:
>load average: 0,24, 0,46, 0,76
>st2c:
>load average: 0,13, 0,20, 0,27
>st2d:
>load average: 2,93, 2,11, 1,50
>
>If we stop glusterfs on the st2a server, the cluster works as fast as
>we expect.
>Previously the cluster ran version 5.x and there were no such
>problems.
>
>Interestingly, almost all CPU usage on st2a is generated by "system"
>load.
>The most CPU-intensive process is glusterfsd.
>`top -H` for the glusterfsd process shows this:
>
>  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
>13894 root  20  0 2172892 96488 9056 R 74,0  0,1 122:09.14 glfs_iotwr00a
>13888 root  20  0 2172892 96488 9056 R 73,7  0,1 121:38.26 glfs_iotwr004
>13891 root  20  0 2172892 96488 9056 R 73,7  0,1 121:53.83 glfs_iotwr007
>13920 root  20  0 2172892 96488 9056 R 73,0  0,1 122:11.27 glfs_iotwr00f
>13897 root  20  0 2172892 96488 9056 R 68,3  0,1 121:09.82 glfs_iotwr00d
>13896 root  20  0 2172892 96488 9056 R 68,0  0,1 122:03.99 glfs_iotwr00c
>13868 root  20  0 2172892 96488 9056 R 67,7  0,1 122:42.55 glfs_iotwr000
>13889 root  20  0 2172892 96488 9056 R 67,3  0,1 122:17.02 glfs_iotwr005
>13887 root  20  0 2172892 96488 9056 R 67,0  0,1 122:29.88 glfs_iotwr003
>13885 root  20  0 2172892 96488 9056 R 65,0  0,1 122:04.85 glfs_iotwr001
>13892 root  20  0 2172892 96488 9056 R 55,0  0,1 121:15.23 glfs_iotwr008
>13890 root  20  0 2172892 96488 9056 R 54,7  0,1 121:27.88 glfs_iotwr006
>13895 root  20  0 2172892 96488 9056 R 54,0  0,1 121:28.35 glfs_iotwr00b
>13893 root  20  0 2172892 96488 9056 R 53,0  0,1 122:23.12 glfs_iotwr009
>13898 root  20  0 2172892 96488 9056 R 52,0  0,1 122:30.67 glfs_iotwr00e
>13886 root  20  0 2172892 96488 9056 R 41,3  0,1 121:26.97 glfs_iotwr002
>13878 root  20  0 2172892 96488 9056 S  1,0  0,1   1:20.34 glfs_rpcrqhnd
>13840 root  20  0 2172892 96488 9056 S  0,7  0,1   0:51.54 glfs_epoll000
>13841 root  20  0 2172892 96488 9056 S  0,7  0,1   0:51.14 glfs_epoll001
>13877 root  20  0 2172892 96488 9056 S  0,3  0,1   1:20.02 glfs_rpcrqhnd
>13833 root  20  0 2172892 96488 9056 S  0,0  0,1   0:00.00 glusterfsd
>13834 root  20  0 2172892 96488 9056 S  0,0  0,1   0:00.14 glfs_timer
>13835 root  20  0 2172892 96488 9056 S  0,0  0,1   0:00.00 glfs_sigwait
>13836 root  20  0 2172892 96488 9056 S  0,0  0,1   0:00.16 glfs_memsweep
>13837 root  20  0 2172892 96488 9056 S  0,0  0,1   0:00.05 glfs_sproc0
>
>Also, I didn't find any relevant messages in the log files.
>Honestly, I don't know what to do. Does someone know how to debug or
>fix this behaviour?
>
>Best regards,
>Pavel
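
P.S. As a minimal sketch of the 'R'/'D' check (assuming a procps-style ps; adjust to your environment), something like:

    # list all threads currently in running (R) or uninterruptible-sleep (D) state
    ps -eLo state,pid,tid,comm | awk '$1 ~ /^[RD]/'

If the busy glfs_iotwr threads show up mostly in 'R' they are really burning CPU, while a pile of 'D' threads would point at the underlying disks/filesystem rather than gluster itself.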