Apache Processes Hung "Sending Reply"

Tom Ritter <tom@xxxxxxxxx> · Mon, 1 Feb 2010 17:34:16 -0300

I have 40 or so Apache processes stuck in "Sending Reply".  My hypothesis
is that MySQL had a problem, and either Apache or PHP somehow got gummed up
and isn't cleaning up for some reason.  I'm hoping the list can give me more
ideas for debugging or point me in the right direction.

Here is the output of http://localhost/server-status:

	Server uptime: 1 day 6 hours 57 minutes 9 seconds
	Total accesses: 47613 - Total Traffic: 498.2 MB
	CPU Usage: u1446.77 s548.53 cu6.26 cs0 - 1.8% CPU load
	.427 requests/sec - 4688 B/second - 10.7 kB/request
	41 requests currently being processed, 8 idle workers
	WW_WWW_WWWWW_WWWWWWWWWW_W_WWWW__WWW.WWWWWWWW_WWWWW
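
For anyone who wants to tally a scoreboard line like the one above without
counting characters by eye, here is a rough Python sketch.  The state names
are the key that mod_status prints under the scoreboard; the function name
tally_scoreboard is just something I made up for illustration.

```python
from collections import Counter

# Scoreboard key as printed by mod_status below the scoreboard line.
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def tally_scoreboard(scoreboard):
    """Count workers per state in a mod_status scoreboard string.

    tally_scoreboard is a hypothetical helper name, not part of Apache.
    """
    counts = Counter(scoreboard.strip())
    return {
        SCOREBOARD_KEY.get(ch, "Unknown (%r)" % ch): n
        for ch, n in counts.items()
    }


if __name__ == "__main__":
    board = "WW_WWW_WWWWW_WWWWWWWWWW_W_WWWW__WWW.WWWWWWWW_WWWWW"
    # Print states from most to least common.
    for state, n in sorted(tally_scoreboard(board).items(), key=lambda kv: -kv[1]):
        print("%2d  %s" % (n, state))
```

Run against the scoreboard above, this gives 41 workers in "Sending Reply",
8 waiting for a connection and 1 open slot, which matches the "41 requests
currently being processed, 8 idle workers" line.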

Examining the logs confirms that the last request on each pid was quite a while
ago, and they are just hanging out doing nothing.
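
In case it helps anyone repeat the check, this is roughly how the "last
request per pid" comparison could be scripted.  It is only a sketch and
assumes a custom access log whose lines start with the child pid and the
request time, e.g. a hypothetical LogFormat "%P %t \"%r\" %>s" (%P and %t are
standard mod_log_config directives; the format itself and the helper names
are my assumptions, not what the server is necessarily configured with).

```python
import re
from datetime import datetime

# Assumes a hypothetical LogFormat beginning with "%P %t", for example:
#   LogFormat "%P %t \"%r\" %>s" pidlog
# so each line looks like: 1234 [01/Feb/2010:17:34:16 -0300] "GET / HTTP/1.1" 200
LINE_RE = re.compile(r"^(\d+) \[([^\]]+)\]")
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def last_request_per_pid(lines):
    """Return {pid: datetime of the most recent request logged by that pid}."""
    last = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        pid = int(m.group(1))
        when = datetime.strptime(m.group(2), TIME_FORMAT)
        if pid not in last or when > last[pid]:
            last[pid] = when
    return last


def stale_pids(last, now, idle_seconds):
    """Return the sorted pids whose last logged request is older than idle_seconds."""
    return sorted(
        pid for pid, when in last.items()
        if (now - when).total_seconds() > idle_seconds
    )
```

Note that %t records when a request was received, not when it finished, so a
pid that shows up as stale here has gone that long without accepting a new
request.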

The server:
 - RHEL
		$ uname -a
		Linux xxx 2.6.18-164.6.1.el5 #1 SMP Tue Oct 27 11:30:06 EDT 2009 i686 i686 i386 GNU/Linux
 - Apache:
		Server version: Apache/2.2.3
		Server built:   Nov 10 2009 09:06:57
 - PHP:
		$ php -v
		PHP 5.1.6 (cli) (built: Feb 26 2009 07:01:10)
		Zend Engine v2.1.0
 - Runs WordPress (not my choice)
 - Receives mostly search crawler traffic at a steady rate
 - Has logged a lot of "(32)Broken pipe: core_output_filter: writing data to
     the network" and "(104)Connection reset by peer: core_output_filter:
     writing data to the network" messages
 - Stopped reporting to rrdtool/cacti between 18:50 and 21:30 last night
 - Had a child process die with the error "/usr/sbin/httpd: free(): invalid
     pointer: 0x0a2044a4"; however, this was about 20 minutes *after* the
     problem began
 - Had some "database error MySQL server has gone away for query" errors around
     18:50 last night
 - Is behind an F5 device that proxies all connections, so every connection to
     the server comes from the same IP address

Relevant config:

	Timeout 40
	KeepAlive On
	MaxKeepAliveRequests 200
	KeepAliveTimeout 5
	StartServers       3
	MinSpareServers    2
	MaxSpareServers   10
	ServerLimit       50
	MaxClients        50
	MaxRequestsPerChild  1000

I've only been able to find one person who had a similar problem, and his was
caused by "dodgy sql": http://marc.info/?l=tomcat-user&m=106319217331935&w=2
(His setup also involved Tomcat, which I do not have.)

The biggest issue is that the processes should time out and clean up after
themselves, right?  But they're not - instead they're just sitting consuming
RAM.  (Not entirely sure about that - in some stacktraces I see
<signal handler called> followed by "zend_timeout ()".)

I'm sure an httpd restart will clean everything up, but I wanted to debug this
as best I could.  I used gdb to grab a stacktrace for 8 of the hung processes,
but httpd is not compiled in debug mode.  The stacktraces and other relevant
data are here:
http://ritter.vg/misc/apache-debug/
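
For reference, grabbing the traces can be scripted along these lines.  This
is a sketch only: it assumes gdb is installed and that you have permission to
attach to the httpd children (typically root), and the helper names are made
up.  The gdb options used (-batch, -ex, -p) are standard ones.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a non-interactive gdb command that attaches to pid and prints a backtrace."""
    return ["gdb", "-batch", "-ex", "bt", "-p", str(int(pid))]


def collect_backtraces(pids, timeout=30):
    """Run gdb against each pid and return {pid: backtrace text or error message}.

    Hypothetical helper; attaching briefly stops the target process.
    """
    traces = {}
    for pid in pids:
        try:
            result = subprocess.run(
                gdb_backtrace_cmd(pid),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            traces[pid] = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # gdb missing, not permitted to run, or it hung on attach
            traces[pid] = "gdb failed: %s" % exc
    return traces
```

The pids to feed it would be the ones shown as "W" in server-status.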

If anyone can suggest further things to try to debug this, or any additional
info, I'd appreciate it.

-tom
