Hi Susant,
You are right, the rebalance process itself is normal now. But the memory usage of the brick being written to keeps increasing during rebalancing. The current task has been running for 16 hours; here is the top output.
===================== top ===========================
top - 08:58:27 up 3 days, 12:08, 1 user, load average: 1.33, 1.18, 1.21
Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.0%us, 16.9%sy, 0.0%ni, 65.7%id, 2.7%wa, 0.0%hi, 1.8%si, 0.0%st
Mem: 8060900k total, 7923204k used, 137696k free, 4528380k buffers
Swap: 0k total, 0k used, 0k free, 393444k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8555 root 20 0 950m 143m 1728 S 154.7 1.8 875:01.07 glusterfs
8479 root 20 0 1284m 139m 1892 S 69.8 1.8 443:25.88 glusterfsd
8497 root 20 0 2628m 1.8g 1892 S 68.2 23.0 485:31.42 glusterfsd
874 root 20 0 0 0 0 S 2.3 0.0 65:34.68 jbd2/vdb1-8
58 root 20 0 0 0 0 S 0.7 0.0 44:44.37 kblockd/0
99 root 20 0 0 0 0 S 0.7 0.0 39:17.63 kswapd0
39 root 20 0 0 0 0 S 0.3 0.0 0:16.90 events/4
=====================================================
As you can see, PID 8497 now uses 1.8g of resident memory.
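To check whether the brick's memory keeps growing rather than relying on one top snapshot, the resident size can be sampled at intervals. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names, the 10-minute interval, and the sample count are illustrative choices, not part of GlusterFS.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, interval_s=600, samples=6):
    """Print a timestamped RSS reading for `pid` every `interval_s` seconds."""
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            rss_kb = parse_vmrss_kb(f.read())
        print(time.strftime("%Y-%m-%d %H:%M:%S"), pid, rss_kb, "kB")
        time.sleep(interval_s)


# Hypothetical usage for the brick process shown in the top output:
# sample_rss(8497)
```

A steadily rising series of readings would support the suspicion of a leak; a series that levels off would point to caching instead.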
I have taken some state dumps. The later dumps are much bigger than the earlier ones.
================ ls -lh /var/run/gluster/*dump* ================
-rw------- 1 root root 4.1M Dec 17 17:52 mnt-b1-brick.8497.dump.1450345948
-rw------- 1 root root 292M Dec 18 09:08 mnt-b1-brick.8497.dump.1450400909
-rw------- 1 root root 297M Dec 18 09:15 mnt-b1-brick.8497.dump.1450401273
=====================================================
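The dump file names end in a Unix timestamp, so the listing above is enough to estimate how fast the dumps grow. This is a rough sketch only: the helper names are made up for illustration, the sizes are the rounded values printed by ls -lh, and dump size is at best an indirect indicator of memory use.

```python
import re


def dump_epoch(name):
    """Extract the trailing Unix timestamp from a state dump file name."""
    match = re.search(r"\.dump\.(\d+)$", name)
    if match is None:
        raise ValueError(f"not a state dump file name: {name}")
    return int(match.group(1))


def to_mib(size):
    """Convert an ls -lh style size such as '4.1M' or '1.8G' to MiB."""
    units = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(size[:-1]) * units[size[-1]]


def growth_mib_per_hour(first, last):
    """Average dump size growth between two (name, size) pairs, in MiB/hour."""
    hours = (dump_epoch(last[0]) - dump_epoch(first[0])) / 3600
    return (to_mib(last[1]) - to_mib(first[1])) / hours


first = ("mnt-b1-brick.8497.dump.1450345948", "4.1M")
later = ("mnt-b1-brick.8497.dump.1450400909", "292M")
print(round(growth_mib_per_hour(first, later), 1), "MiB/hour")
```

For the first two dumps this works out to roughly 19 MiB per hour over about 15 hours.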
You can download these state dumps (gzipped) from this URL:
http://pan.baidu.com/s/1jHuZCMU
PuYun
From: Susant Palai
Date: 2015-12-17 20:23
To: PuYun
CC: gluster-users
Subject: Re: How to diagnose volume rebalance failure?

Ok from your reply rebalance seems to be fine.

So what you can do is check whether the mem-usage of brick process keeps increasing constantly. If that is the case take multiple state-dumps intermittently.

Regards,
Susant

----- Original Message -----
From: "PuYun" <cloudor@xxxxxxx>
To: "gluster-users" <gluster-users@xxxxxxxxxxx>
Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
Sent: Thursday, 17 December, 2015 3:57:12 PM
Subject: Re: How to diagnose volume rebalance failure?

Hi Susant,

Thank you for your instructions. I'll do that.

My volume contains more than 2 million end sub directories. Most of the end sub directories contains 10~30 small files. Current total size is about 900G. Two bricks, each one is 1T. Current ram size is 8G.

Previously I saw 3 processes, one is glusterfs for rebalance and 2 glusterfsd for bricks. Only 1 glusterfsd occupied very large mem and it is related to the newly added brick. The other 2 processes seems normal. If that happens again, I will send you the state dump.

Thank you.
PuYun
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users