httpd child processes die, never respawn

"Ryan Marrs" <ryan.marrs@xxxxxxxxxxxxxx> · Mon, 8 Oct 2007 13:56:30 -0400

Hey guys,

I've got an issue.  I'm running Apache on a CentOS 5 box.  It's a
dual-processor box that's not being taxed at all (highest load I've seen is
about 10%).  When I start httpd, it starts the number of servers specified
in httpd.conf, as follows:

StartServers       12
MinSpareServers    8
MaxSpareServers    20
MaxClients         32
MaxRequestsPerChild 4000
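
As a quick sanity check on the prefork values above, here is a minimal
Python sketch that parses these directives and flags obviously inconsistent
combinations. It is not part of Apache: the names parse_prefork and
check_prefork are made up for illustration, and the rules it checks are
common-sense assumptions rather than an official validation.

```python
import re

# Prefork directives quoted in the message above.
DIRECTIVES = {"StartServers", "MinSpareServers", "MaxSpareServers",
              "MaxClients", "MaxRequestsPerChild"}


def parse_prefork(text):
    """Return {directive: int} for the prefork directives found in text."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\w+)\s+(\d+)\s*$", line)
        if match and match.group(1) in DIRECTIVES:
            values[match.group(1)] = int(match.group(2))
    return values


def check_prefork(values):
    """Return a list of warnings; an empty list means nothing looked off."""
    warnings = []
    lo = values.get("MinSpareServers")
    hi = values.get("MaxSpareServers")
    cap = values.get("MaxClients")
    # Assumed rule: the spare-server ceiling should sit above the floor.
    if lo is not None and hi is not None and hi <= lo:
        warnings.append("MaxSpareServers should be greater than MinSpareServers")
    # Assumed rule: no process-count setting should exceed MaxClients.
    if cap is not None:
        for name in ("StartServers", "MinSpareServers", "MaxSpareServers"):
            if name in values and values[name] > cap:
                warnings.append(name + " exceeds MaxClients")
    return warnings


CONF = """
StartServers       12
MinSpareServers    8
MaxSpareServers    20
MaxClients         32
MaxRequestsPerChild 4000
"""

print(check_prefork(parse_prefork(CONF)))
```

For the values quoted above this prints an empty list, so under these
assumed rules the settings are at least internally consistent.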

It spawns the 12 servers no problem and handles requests beautifully, but
about a minute later, processes start dropping off.  It respawns a few, but
they seem to die faster than they can respawn, until I'm left with just the
root process running and Apache doesn't respond to any requests.  I've
hacked around this with a cron job that does a graceful restart every
minute, but I need a real solution to this problem.
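
The every-minute workaround could be made less blunt by restarting only when
the child count has actually dropped. A minimal Python sketch, under several
assumptions: the processes are named httpd, a procps-style ps is available,
apachectl is on the PATH, and the threshold of 8 simply mirrors
MinSpareServers. The function names are hypothetical.

```python
import subprocess


def count_children(ps_output):
    """Count httpd processes whose parent is itself an httpd process.

    ps_output is expected to hold one "PID PPID" pair per line.
    """
    pairs = [tuple(int(field) for field in line.split())
             for line in ps_output.splitlines() if line.strip()]
    pids = {pid for pid, _ in pairs}
    return sum(1 for _, ppid in pairs if ppid in pids)


def needs_restart(ps_output, min_children=8):
    """True when fewer than min_children httpd children are alive."""
    return count_children(ps_output) < min_children


def main():
    # "-o pid= -o ppid=" asks ps for the two columns without a header line.
    result = subprocess.run(
        ["ps", "-C", "httpd", "-o", "pid=", "-o", "ppid="],
        capture_output=True, text=True)
    if needs_restart(result.stdout):
        subprocess.run(["apachectl", "graceful"], check=False)
```

A script that calls main() could replace the unconditional restart in the
cron entry. It is still only a workaround and does not explain why the
children exit.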

I've provided a list of rpm -qa results at:
http://www.detroitk12.org/rpm-qa.txt, so if you see any issues with
conflicting or buggy versions of anything, you can let me know and we can
work on that.   This box runs our primary intranet server, and it's running
the open source Metadot CMS software.  Metadot has been working with us on
this issue, but they can't seem to figure it out either.  If anyone is
interested in what's used by the Metadot CMS, it's available at the
following: 
http://www.metadot.com/metadot/index.pl?iid=2558

I personally hate this CMS, but we inherited it.  The support, which we pay
far too much for each year, is decent, and they've put in as many hours as I
have on this.

To provide a little background, this started out as a CentOS 4.3 install on
a VMware instance.  After attempting many upgrades and much
troubleshooting, we finally got this running on its own box, on CentOS 5,
with no VMware.  I've done an strace on the single root Apache process while
this is running, and it looks like this:

read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
<--  CROPPED -->
close(3)                                = 0
waitpid(11743, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 11743
munmap(0xb7f74000, 4096)                = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0x72bd790, [], SA_RESTART}, 8) = 0
write(1, "\33[1;119H2\33[5;14H3\33[5;30H40   972"..., 49) = 49
rt_sigaction(SIGTSTP, {0x72bd790, [], SA_RESTART}, NULL, 8) = 0
nanosleep({1, 0},

I've cropped the read() lines after the first 5; the same line appeared 64
times in a row.  The whole sequence then repeated itself over and over
again.  I see a few of these in the log file:

[Mon Oct 08 13:29:06 2007] [info] [client 10.113.40.66] (104)Connection reset by peer: core_output_filter: writing data to the network
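
To see whether these resets cluster around particular clients, one could
tally them from the error log. A minimal Python sketch, assuming the error
log format shown in the entry above and that each entry sits on a single
line in the actual log file; count_resets is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches error-log entries like the one quoted above and captures the
# timestamp, log level and client address.
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] \[client (?P<client>[^\]]+)\] "
    r"\(104\)Connection reset by peer"
)


def count_resets(lines):
    """Return a Counter mapping client address to number of reset entries."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("client")] += 1
    return counts


SAMPLE = [
    "[Mon Oct 08 13:29:06 2007] [info] [client 10.113.40.66] "
    "(104)Connection reset by peer: core_output_filter: "
    "writing data to the network",
]

print(count_resets(SAMPLE).most_common())
```

Passing an open file object instead of SAMPLE works the same way, since the
function only iterates over lines.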

I tried to hunt down this error, and a lot of people think it's related to
running NFS, which we're not.  That's partially the reason I fought to get
VMware out of the equation.

I've been pulling my hair out on this.  We've started from a fresh install
that showed none of these symptoms, and by the time we get everything up and
running, we're back to where we started.  I've tweaked the MPM settings
about 300 times, and I'm confident they're no longer the issue.

At this point, I think there must be a bug in the version of mod_perl,
python, or some other module that we're running.  

Here's the httpd.conf: http://www.detroitk12.org/httpd.conf.txt

Anything you'd like to know, I'll be ready to provide.  I've literally been
trying to fix this issue for 3 weeks, and at this point, I'll take anything
you got.

Thanks,

Ryan


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.