Re: How to find out what GlusterFS is doing

On Thu, Nov 5, 2020 at 4:18 PM mabi <mabi@xxxxxxxxxxxxx> wrote:
Below is the output of running "top -bHd d" on one of the nodes; maybe that helps show what the glusterfsd process is doing?

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 4375 root      20   0 2856784 120492   8360 D 61.1  0.4 117:09.29 glfs_iotwr001

Waiting for I/O, just like the rest of those in D state.
You may have a slow storage subsystem; one way to check is sketched after the process listing below. How many cores do you have, btw?
Y.

 4385 root      20   0 2856784 120492   8360 R 61.1  0.4 117:12.92 glfs_iotwr003
 4387 root      20   0 2856784 120492   8360 R 61.1  0.4 117:32.19 glfs_iotwr005
 4388 root      20   0 2856784 120492   8360 R 61.1  0.4 117:28.87 glfs_iotwr006
 4391 root      20   0 2856784 120492   8360 D 61.1  0.4 117:20.71 glfs_iotwr008
 4395 root      20   0 2856784 120492   8360 D 61.1  0.4 117:17.22 glfs_iotwr009
 4405 root      20   0 2856784 120492   8360 R 61.1  0.4 117:19.52 glfs_iotwr00d
 4406 root      20   0 2856784 120492   8360 R 61.1  0.4 117:29.51 glfs_iotwr00e
 4366 root      20   0 2856784 120492   8360 D 55.6  0.4 117:27.58 glfs_iotwr000
 4386 root      20   0 2856784 120492   8360 D 55.6  0.4 117:22.77 glfs_iotwr004
 4390 root      20   0 2856784 120492   8360 D 55.6  0.4 117:26.49 glfs_iotwr007
 4396 root      20   0 2856784 120492   8360 R 55.6  0.4 117:23.68 glfs_iotwr00a
 4376 root      20   0 2856784 120492   8360 D 50.0  0.4 117:36.17 glfs_iotwr002
 4397 root      20   0 2856784 120492   8360 D 50.0  0.4 117:11.09 glfs_iotwr00b
 4403 root      20   0 2856784 120492   8360 R 50.0  0.4 117:26.34 glfs_iotwr00c
 4408 root      20   0 2856784 120492   8360 D 50.0  0.4 117:27.47 glfs_iotwr00f
 9814 root      20   0 2043684  75208   8424 D 22.2  0.2  50:15.20 glfs_iotwr003
28131 root      20   0 2043684  75208   8424 R 22.2  0.2  50:07.46 glfs_iotwr004
 2208 root      20   0 2043684  75208   8424 R 22.2  0.2  49:32.70 glfs_iotwr008
 2372 root      20   0 2043684  75208   8424 R 22.2  0.2  49:52.60 glfs_iotwr009
 2375 root      20   0 2043684  75208   8424 D 22.2  0.2  49:54.08 glfs_iotwr00c
  767 root      39  19       0      0      0 R 16.7  0.0  67:50.83 dbuf_evict
 4132 onadmin   20   0   45292   4184   3176 R 16.7  0.0   0:00.04 top
28484 root      20   0 2043684  75208   8424 R 11.1  0.2  49:41.34 glfs_iotwr005
 2376 root      20   0 2043684  75208   8424 R 11.1  0.2  49:49.49 glfs_iotwr00d
 2719 root      20   0 2043684  75208   8424 R 11.1  0.2  49:58.61 glfs_iotwr00e
 4384 root      20   0 2856784 120492   8360 S  5.6  0.4   4:01.27 glfs_rpcrqhnd
 3842 root      20   0 2043684  75208   8424 S  5.6  0.2   0:30.12 glfs_epoll001
    1 root      20   0   57696   7340   5248 S  0.0  0.0   0:03.59 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:09.57 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.16 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S  0.0  0.0   0:07.36 rcu_sched
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
    9 root      rt   0       0      0      0 S  0.0  0.0   0:00.03 migration/0
   10 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 lru-add-drain
   11 root      rt   0       0      0      0 S  0.0  0.0   0:00.01 watchdog/0
   12 root      20   0       0      0      0 S  0.0  0.0   0:00.00 cpuhp/0
   13 root      20   0       0      0      0 S  0.0  0.0   0:00.00 cpuhp/1
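A quick way to test the slow-storage theory is to watch per-device latency and utilization while the bricks are busy. A rough sketch (the 5-second interval is arbitrary; <glusterfsd-pid> is a placeholder for the brick process ID — note that the PIDs in the top -H listing above are thread IDs belonging to it):

  # extended per-device stats; watch %util and await for the brick disks
  iostat -dxm 5

  # per-thread breakdown for the brick process
  pidstat -dt -p <glusterfsd-pid> 5    # I/O per thread
  pidstat -wt -p <glusterfsd-pid> 5    # context switches per thread

If %util stays low while the io-threads spin, the bottleneck is more likely CPU or lock contention than the disks themselves.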

Any clues, anyone?

The load is really high now, around 20, on both nodes...
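In case it is useful to anyone else hitting this: the quickest built-in way to see what a busy brick is actually doing is the volume profiler. A minimal sketch, where VOLNAME stands for the affected volume:

  # start collecting per-brick statistics (adds a small overhead)
  gluster volume profile VOLNAME start

  # after a minute or two under load: call counts plus avg/min/max
  # latency per FOP (WRITE, LOOKUP, GETXATTR, ...) for every brick
  gluster volume profile VOLNAME info

  # optionally, show the most written-to files per brick
  gluster volume top VOLNAME write list-cnt 10

  # stop profiling when done
  gluster volume profile VOLNAME stop

If GETXATTR/SETXATTR or LOOKUP dominate the latency rather than WRITE, that would point at metadata work (e.g. quota accounting) rather than the data path.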


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, November 5, 2020 11:50 AM, mabi <mabi@xxxxxxxxxxxxx> wrote:

> Hello,
>
> I have a 3-node replica (including arbiter) GlusterFS 7.8 setup with 3 volumes, and the two data nodes (not the arbiter) seem to have a high load because the glusterfsd brick process is taking all CPU resources (12 cores).
>
> Checking these two servers with iostat shows that the disks are not that busy and are mostly doing writes. There is not much activity on the FUSE clients either, so I am wondering why GlusterFS is currently generating such a high load on these two servers (the arbiter shows no high load). No files are currently healing. This is also the only volume with quota enabled, in case that is a hint. Does anyone know how to see why GlusterFS is so busy on a specific volume?
>
> Here is a sample "vmstat 60" of one of the nodes:
>
> onadmin@gfs1b:~$ vmstat 60
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd     free   buff  cache   si   so    bi    bo    in     cs us sy id wa st
>  9  2      0 22296776  32004 260284    0    0    33   301   153     39  2 60 36  2  0
> 13  0      0 22244540  32048 260456    0    0   343  2798 10898 367652  2 80 16  1  0
> 18  0      0 22215740  32056 260672    0    0   308  2524  9892 334537  2 83 14  1  0
> 18  0      0 22179348  32084 260828    0    0   169  2038  8703 250351  1 88 10  0  0
>
> I already tried rebooting, but that did not help, and there is nothing special in the log files either.
>
> Best regards,
> Mabi
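Following up on the quota hint in the quoted message: quota accounting (the marker translator) does extra xattr work on writes, so it can be worth checking what the hot io-threads are doing at the syscall level. A low-tech sketch, using a thread ID from the top listing above:

  # count syscalls of one busy io-thread for ~30 seconds
  # (4375 is a glfs_iotwr thread; strace prints the -c summary on exit)
  timeout 30 strace -c -p 4375

  # for a D-state thread, the kernel-side stack shows what it is blocked on
  # (root only)
  cat /proc/4375/stack

If most of the time is spent in fgetxattr/fsetxattr, quota accounting would be a plausible culprit; if it is plain pwrite/fsync, it is the write path itself.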


________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
