Re: Gluster High CPU/Clients Hanging on Heavy Writes

On Sun, 5 Aug 2018 at 13:29, Yuhao Zhang <zzyzxd@xxxxxxxxx> wrote:
Sorry, what I meant was, if I start the transfer now and get glusterd into zombie status,

glusterd or glusterfsd?

it's unlikely that I can fully recover the server without a reboot.


On Aug 5, 2018, at 02:55, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:



On Sun, Aug 5, 2018 at 1:22 PM, Yuhao Zhang <zzyzxd@xxxxxxxxx> wrote:
This is a semi-production server and I can't bring it down right now. Will try to get the monitoring output when I get a chance. 

Collecting top output doesn't require bringing down the servers.


As I recall, the high-CPU processes were the brick daemons (glusterfsd), and htop showed them in state D (uninterruptible sleep). However, I saw zero zpool I/O, as all the clients were hanging.
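
For reference, the D-state observation above can be checked without htop. The following is a minimal sketch, assuming a Linux procps `ps`; the function names `filter_dstate` and `list_dstate` are hypothetical helpers and not part of Gluster:

```shell
# Keep the header line plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Reads `ps`-style output on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List PID, state, kernel wait channel, and command name for D-state processes.
# Assumes procps `ps`; the wait channel hints at what each process is blocked on.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Running `list_dstate` a few times while the clients hang would show whether the glusterfsd processes stay in state D and where in the kernel they are waiting.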


On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:



On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzyzxd@xxxxxxxxx> wrote:
Hi,

I am running into a situation where heavy writes put the Gluster server into a zombie state, with many high-CPU processes and all clients hanging. It is almost 100% reproducible on my machine. I hope someone can help.

Can you give us the output of monitoring the processes with high CPU usage, captured while your tests are running?

  • MON_INTERVAL=10 # can be increased for very long runs
  • top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt # CPU utilization by process
  • top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt # CPU utilization by thread
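
The commands above run in the foreground until interrupted. As a sketch only, they could be wrapped so that both captures run in parallel and stop on their own. The function name `capture_top`, the sample count, and the output directory argument are assumptions added for illustration; `-n` is the standard procps `top` option that limits the number of iterations:

```shell
# Run the per-process and per-thread top captures side by side.
# Arguments (all optional, hypothetical defaults):
#   $1 = seconds between samples, $2 = number of samples, $3 = output directory
capture_top() {
    interval=${1:-10}
    samples=${2:-360}          # 360 samples x 10 s is roughly one hour
    out_dir=${3:-/tmp}
    host=${HOSTNAME:-$(uname -n)}

    # CPU utilization by process
    top -b -d "$interval" -n "$samples" > "$out_dir/top_proc.$host.txt" &
    # CPU utilization by thread
    top -b -H -d "$interval" -n "$samples" > "$out_dir/top_thr.$host.txt" &

    # Block until both captures have written all their samples.
    wait
}
```

For example, `capture_top 10 720 /tmp` would cover about two hours of a transfer; start it just before the rsync or cp run.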


I first observed this issue when running rsync to copy files from another server, and I thought it might be because Gluster doesn't like rsync's delta transfer with a lot of small writes. However, I was able to reproduce it with "rsync --whole-file --inplace", and even with cp or scp. It usually appears a few hours after the transfer starts, but sometimes it happens within several minutes.

Since this is a single-node Gluster distributed volume, I tried transferring files directly onto the server, bypassing the Gluster clients, but that still caused the same issue.

It is running on top of a ZFS RAIDZ2 dataset; its options are attached. I also attached the volume options and the statedump generated when my clients hung.

- Ubuntu 16.04 x86_64 / 4.4.0-116-generic
- GlusterFS 3.12.8

Thank you,
Yuhao


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



--
--Atin