Re: Help troubleshooting performance issue, after "1000 total children" Apache no longer responds to HTTP requests. Not MaxClients issue?

Tom Evans <tevans.uk@xxxxxxxxxxxxxx> · Thu, 3 May 2012 17:33:30 +0100

On Mon, Apr 30, 2012 at 4:20 PM, P J <pauljflists@xxxxxxxxx> wrote:
> Greetings all,
>
> Hoping someone can point me in the right direction as I've spent the last
> week trying to figure out where the "issue" is but haven't been able to.
>
> Running Apache 2.2.3 on CentOS 5.8.
>
> At a few points during the day when traffic is heavy we are having an issue
> where Apache no longer responds to any HTTP requests.
>
> It sounded like a standard MaxClients being reached issue, but
> that doesn't seem to be the case.
>
> Also, logging into the machine during this time, the load average is under
> 1, and there is still plenty of RAM available.
>
> Reviewing /var/log/httpd/error_log I've noticed the following pattern:
>
> --snip--
> [Mon Apr 30 07:00:34 2012] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 905 total children
> [Mon Apr 30 07:00:35 2012] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 937 total children
> [Mon Apr 30 07:00:36 2012] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 969 total children
> [Mon Apr 30 07:00:37 2012] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 35 idle, and 1001 total children
> [Mon Apr 30 07:00:42 2012] [debug] mpm_common.c(663): (70007)The timeout
> specified has expired: connect to listener on [::]:80
> [Mon Apr 30 07:00:49 2012] [debug] mpm_common.c(663): (70007)The timeout
> specified has expired: connect to listener on [::]:80
> [Mon Apr 30 07:00:56 2012] [debug] mpm_common.c(663): (70007)The timeout
> specified has expired: connect to listener on [::]:80
> [Mon Apr 30 07:01:03 2012] [debug] mpm_common.c(663): (70007)The timeout
> specified has expired: connect to listener on [::]:80
>
> A few times a day, right after "1000 total children" Apache stops responding
> and has to be restarted in order to work again.
>
> I've reviewed the error_log from a few weeks back and it's the same pattern,
> the server hits 1000 total children and then immediately spits out the
> "[debug] mpm_common.c(663): (70007)The timeout specified has expired:
> connect to listener on [::]:80" error message and stops responding.
>
> Yet the load on the server is quite low...
>
> Here is the relevant section from the config:
>
> Timeout 45
> KeepAlive On
> MaxKeepAliveRequests 10000
> KeepAliveTimeout 3
>
> <IfModule prefork.c>
> StartServers      80
> MinSpareServers   50
> MaxSpareServers  120
> ServerLimit     3500
> MaxClients      3500
> MaxRequestsPerChild  2000
> </IfModule>
>
>
> Anyone know why the magic number of children Apache is able to reach is 1000
> before it stops processing more requests?
> Or how to make sense of the "(70007)The timeout specified has expired:
> connect to listener on [::]:80" message?
>
> What "timeout specified" is it referring to? The 45 seconds?
>
> Any help would be greatly appreciated.
>
> Thanks in advance.
> PJ
>

Hi PJ

When the server reaches this state, what is the length of the listen
queue on the server socket? In FreeBSD speak, this is ``netstat
-Lan``; I'm afraid I don't speak Linux.
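
As far as I understand it, the closest Linux equivalent is ``ss -ltn``.
For a listening socket, Recv-Q should be the number of connections
waiting to be accepted and Send-Q the backlog limit, though I have not
checked how the older tools on CentOS 5 report this. A minimal Python
sketch for reading that output follows; the function name and the
sample text are made up for illustration and are not from your server.

```python
def parse_listen_queues(ss_output):
    """Return (local_address, queued, backlog_limit) for each LISTEN socket."""
    rows = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows look like: LISTEN 0 128 :::80 :::*
        # The header row starts with "State" and is skipped.
        if len(fields) >= 5 and fields[0] == "LISTEN":
            rows.append((fields[3], int(fields[1]), int(fields[2])))
    return rows


# Hypothetical sample in the shape of `ss -ltn` output (assumed layout).
sample = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     129    128    :::80               :::*
LISTEN     0      128    *:22                *:*
"""

for addr, queued, limit in parse_listen_queues(sample):
    status = "queue full" if queued >= limit else "ok"
    print(addr, queued, limit, status)
```

If the queue on port 80 sits at or above its limit while Apache is
stuck, new connections are piling up in the kernel and the children are
not calling accept().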

1000+ active child processes is a lot, but I guess your box can cope
quite well. This makes me think that these 1000+ processes aren't
doing much - are they mainly in keepalive state?
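
One way to answer that, assuming mod_status is enabled and you can
fetch the machine-readable page (usually ``/server-status?auto``; your
config excerpt doesn't show whether it is set up), is to count the
characters on the Scoreboard line. The sketch below is Python, and the
sample text is invented rather than taken from your server.

```python
from collections import Counter

# Scoreboard key as documented for mod_status.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize_scoreboard(status_text):
    """Count worker states from the text of a server-status?auto page."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(STATES.get(ch, "unknown") for ch in board)
    return Counter()  # no Scoreboard line found


# Hypothetical sample, not real data.
sample = "BusyWorkers: 7\nIdleWorkers: 2\nScoreboard: KKKKKW_R_...\n"

for state, count in summarize_scoreboard(sample).most_common():
    print(count, state)
```

If most of the busy slots turn out to be "K", the children are mostly
holding idle keepalive connections rather than serving requests.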

On our frontend proxy servers - which handle all our HTTP traffic
before being farmed out to backend servers - we got huge benefit from
switching to the event MPM, which handles keepalive connections in a
much saner manner. For instance, we can handle thousands of
simultaneous connections on event MPM, but the server would creak with
768 connections/processes on prefork MPM.

Event MPM is very good with large numbers of connections, so it might
be worth giving it a try. I don't think it plays too well with mod_php,
though; a better bet may be php-fcgi with the event MPM.

Another thing that is well worth doing is graphing your active
children/connections. You should be starting enough processes to be
able to at least handle your peak number of connections, not relying
on apache spawning up as many as it needs. If you regularly hit 1000
connections, I would want to start at least enough processes to handle
1400 connections.
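
Until you have proper graphs, a rough peak can be pulled from the
"total children" lines already in your error_log. The Python sketch
below does that and adds 40% headroom, which is simply the 1000 to 1400
ratio above turned into a default; the function names are hypothetical.
Note that these lines are only logged while Apache is spawning, so this
is a crude stand-in for real graphing.

```python
import re

# Matches the tail of the "server seems busy" info lines.
CHILDREN_RE = re.compile(r"(\d+) idle, and (\d+) total children")


def peak_children(log_text):
    """Return the highest 'total children' value seen, or 0 if none."""
    totals = [int(m.group(2)) for m in CHILDREN_RE.finditer(log_text)]
    return max(totals) if totals else 0


def with_headroom(peak, headroom_percent=40):
    """Round peak * (1 + headroom) up to a whole process count."""
    return (peak * (100 + headroom_percent) + 99) // 100


# Two lines from the quoted error_log, unwrapped to one line each.
sample = (
    "[Mon Apr 30 07:00:36 2012] [info] server seems busy, (you may need to "
    "increase StartServers, or Min/MaxSpareServers), spawning 32 children, "
    "there are 0 idle, and 969 total children\n"
    "[Mon Apr 30 07:00:37 2012] [info] server seems busy, (you may need to "
    "increase StartServers, or Min/MaxSpareServers), spawning 32 children, "
    "there are 35 idle, and 1001 total children\n"
)

peak = peak_children(sample)
print("peak children:", peak)
print("processes to plan for:", with_headroom(peak))
```

Run it over a few weeks of logs rather than one incident before
changing StartServers or MinSpareServers.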

It may also be worth stracing the parent apache process when you get
stuck in this state to see precisely what it is doing. It may be worth
stracing a child as well.

Finally, 2.2.3 - even an OS-supported, vulnerability-patched 2.2.3 -
won't have many fixes that exist in 2.2.22. If you can, try your
current configuration with an updated Apache. The fact that your issue
seems to happen when you go above 1024 clients - a magic number -
suggests that it may be something programmatically wrong.

Hope that helps!

Cheers

Tom
