OK, let's see if we can gather more info.
I am not a specialist, but you know... another pair of eyes.
My system has a single glusterd process and it has a pretty low PID, meaning it has not crashed.
What is the PID of your glusterd? How many zombie processes does top report?
I've been running my preliminary tests with gluster for a little over a month now and have never seen this. My platform is CentOS 6.5, so, I'd say it is pretty similar.
From my perspective, even making gluster sweat, running some intense rsync jobs in parallel, and seeing glusterd AND glusterfs take 120% of processing time on top (each on one core), they never crashed.
My zombie count, from top, is zero.
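If it helps to compare numbers, here is a rough sketch of how the zombies could be counted per parent PID, rather than reading the total off top. The function name count_zombies_by_parent is made up for illustration; it only assumes the usual ps columns (pid, ppid, stat, comm), where a state starting with Z marks a zombie.

```shell
#!/bin/sh
# Hypothetical helper: count zombie processes grouped by parent PID.
# Reads "pid ppid stat comm" lines on stdin and prints "ppid count",
# with the parent that has the most zombies first.
count_zombies_by_parent() {
    awk '$3 ~ /^Z/ { n[$2]++ } END { for (p in n) print p, n[p] }' |
        sort -k2,2nr
}

# On a live box (separate -o options keep the headers empty):
#   ps -e -o pid= -o ppid= -o stat= -o comm= | count_zombies_by_parent
# and, to see which PID glusterd itself has:
#   pgrep -x glusterd
```

If the parent PID at the top of that list is the glusterd PID, the zombies are children that glusterd itself has not reaped; if it is something else (the nagios plugin, for instance), that would point elsewhere.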
On the other hand, the other day I had one of my nodes crashing a process every time I started a highly demanding task. It turns out I had (and still have) a hardware problem on one of the processors (or the main board; still undiagnosed).
Do you have this problem on one node only?
Any chance you have something special compiled into your kernel?
Any particularly memory-hungry tweak in your sysctl settings?
Sounds like the system, not gluster.
KR,
Carlos
On Fri, Mar 21, 2014 at 10:29 PM, Steve Thomas <sthomas@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi all,
Further investigation shows in excess of 500 glusterd zombie processes and continuing to climb on the box.
Any suggestions? Am happy to provide logs etc. to get to the bottom of this.
_____________________________________________
From: Steve Thomas
Sent: 21 March 2014 13:21
To: 'gluster-users@xxxxxxxxxxx'
Subject: Gluster 3.4.2 on Redhat 6.5
Hi,
I’m running Gluster 3.4.2 on Redhat 6.5 with 4 servers with a brick on each. This brick is mounted locally and used by apache to serve audio files for an IVR system. Each of these audio files is typically around 80-100Kb.
The system appears to be working OK in terms of health and status via the gluster CLI.
The system is monitored by nagios and there’s a check for zombie processes and the gluster status. It appears that over a 24 hour period the number of zombie processes on the box has increased and is continually increasing. On investigation, these are “glusterd” processes.
I’m making an assumption, but I suspect that the regular nagios checks are causing the increase in zombie processes, as they query the glusterd process. The commands that the nagios plugin runs are:
#Check heal status
gluster volume heal audio info
#Check volume status
gluster volume status audio detail
Does anyone have any suggestions as to why glusterd is producing these zombie processes?
Thanks for help in advance,
Steve
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users