On Sun, Aug 5, 2018 at 1:29 PM, Yuhao Zhang <zzyzxd@xxxxxxxxx> wrote:
Sorry, what I meant was, if I start the transfer now and get glusterd into zombie status, it's unlikely that I can fully recover the server without a reboot.
I missed it. Thanks for the explanation :).
On Aug 5, 2018, at 02:55, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

On Sun, Aug 5, 2018 at 1:22 PM, Yuhao Zhang <zzyzxd@xxxxxxxxx> wrote:

This is a semi-production server and I can't bring it down right now. Will try to get the monitoring output when I get a chance.

Collecting top output doesn't require bringing down the servers.

As I recall, the high-CPU processes were brick daemons (glusterfsd), and htop showed them in status D. However, I saw zero zpool IO, as the clients were all hanging.

On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzyzxd@xxxxxxxxx> wrote:

Hi,
I am running into a situation where heavy writes send the Gluster server into a zombie state with many high-CPU processes, and all clients hang. It is almost 100% reproducible on my machine. Hope someone can help.

Can you give us the output of monitoring these high-CPU processes, captured while your tests are running?

    MON_INTERVAL=10  # can be increased for very long runs
    top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt    # CPU utilization by process
    top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt    # CPU utilization by thread
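If the hang reproduces, it would also help to confirm which processes are stuck in uninterruptible sleep and whether the pool is really idle. A rough sketch, assuming standard procps and ZFS tooling (filenames and interval are just suggestions):

    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'            # processes in state D and what they wait on
    zpool iostat -v 5 > /tmp/zpool_iostat.${HOSTNAME}.txt     # per-vdev I/O, sampled every 5 seconds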
I started to observe this issue when running rsync to copy files from another server, and I initially thought it might be because Gluster doesn't like rsync's delta transfer with its many small writes. However, I was able to reproduce it with "rsync --whole-file --inplace", and even with cp or scp. It usually appears a few hours after starting the transfer, but sometimes happens within several minutes.
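For reference, the transfers that trigger it look roughly like this (source and destination paths here are only placeholders):

    rsync -av --whole-file --inplace /data/source/ /mnt/gluster/dest/
    cp -a /data/source/. /mnt/gluster/dest/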
Since this is a single-node Gluster distributed volume, I also tried transferring files directly onto the server, bypassing the Gluster clients, but it still caused the same issue.
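For example, something along these lines, writing straight to the underlying ZFS dataset instead of the FUSE mount (paths again are placeholders):

    scp -r user@sourcehost:/data/source /tank/dataset/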
The volume runs on top of a ZFS RAIDZ2 dataset. I have attached the volume options and the statedump generated when my clients hung.
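If a fresh statedump is needed, it can be regenerated with something like the following (volume name is a placeholder; the server-side dumps should land under /var/run/gluster by default, and a client-side dump can be triggered with SIGUSR1):

    gluster volume statedump myvolume         # brick/server-side dump
    kill -USR1 <pid-of-glusterfs-client>      # client-side dump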
- Ubuntu 16.04 x86_64 / 4.4.0-116-generic
- GlusterFS 3.12.8
Thank you,
Yuhao
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users