Re: Apache Processes Hung "Sending Reply"


 



On Mon, Feb 1, 2010 at 3:34 PM, Tom Ritter <tom@xxxxxxxxx> wrote:
> I have 40 or so apache processes suspended in "Sending Reply".  My hypothesis
> is that MySQL had a problem, and either apache or php somehow got gummed up
> and isn't cleaning up for some reason.  I'm hoping the list can give me more
> ideas for debugging or point me in the right direction.
>
>
>
> Here is the output of http://localhost/server-status:
>
>        Server uptime: 1 day 6 hours 57 minutes 9 seconds
>        Total accesses: 47613 - Total Traffic: 498.2 MB
>        CPU Usage: u1446.77 s548.53 cu6.26 cs0 - 1.8% CPU load
>        .427 requests/sec - 4688 B/second - 10.7 kB/request
>        41 requests currently being processed, 8 idle workers
>        WW_WWW_WWWWW_WWWWWWWWWW_W_WWWW__WWW.WWWWWWWW_WWWWW
>
> Examining the logs confirms that the last request on each pid was quite a while
> ago, and they are just hanging out doing nothing.
>
> The server:
>  - RHEL
>                $uname -a
>                Linux xxx 2.6.18-164.6.1.el5 #1 SMP Tue Oct 27 11:30:06 EDT 2009 i686 i686 i386 GNU/Linux
>  - Apache:
>                Server version: Apache/2.2.3
>                Server built:   Nov 10 2009 09:06:57
>  - PHP:
>                $php -v
>                PHP 5.1.6 (cli) (built: Feb 26 2009 07:01:10)
>                Zend Engine v2.1.0
>  - Runs Wordpress (not my choice)
>  - Receives mostly search crawler traffic at a steady rate
>  - has a lot of "(32)Broken pipe: core_output_filter: writing data to the
>     network" and "(104)Connection reset by peer: core_output_filter: writing
>     data to the network" messages
>  - stopped reporting to rrdtool/cacti between 18:50 and 21:30 last night
>  - Had a child process die with the error
>     "/usr/sbin/httpd: free(): invalid pointer: 0x0a2044a4";
>     however, this was about 20 minutes *after* the problem began
>  - had some "database error MySQL server has gone away for query" errors around
>     18:50 last night
>  - is behind an F5 device that proxies all connections - so every connection to
>     the server comes from the same IP address
>
> Relevant config:
>
>        Timeout 40
>        KeepAlive On
>        MaxKeepAliveRequests 200
>        KeepAliveTimeout 5
>        StartServers       3
>        MinSpareServers    2
>        MaxSpareServers   10
>        ServerLimit       50
>        MaxClients        50
>        MaxRequestsPerChild  1000
>
>
> I've only been able to find one person who had a similar problem, and his was
> caused by "dodgy sql": http://marc.info/?l=tomcat-user&m=106319217331935&w=2
> (His was also involving tomcat which I do not have.)

Apache processes hung in W ("Sending Reply") are a huge class of problems
with endless root causes.  What most of them have in common is that
application code running inside Apache (e.g., mod_php) or outside
Apache (e.g., Tomcat or anything else Apache proxies to) has hung.
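
To get a quick picture of how many slots are stuck in each state, the
scoreboard line from server-status can be tallied.  Here is a minimal
sketch in Python, assuming the scoreboard string has already been copied
out of the status page.  The helper name count_states is made up for this
example, and the state labels are the ones from the key that mod_status
prints under the scoreboard:

```python
from collections import Counter

# Scoreboard key as printed by mod_status below the scoreboard.
STATE_NAMES = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def count_states(scoreboard):
    """Return a Counter mapping each scoreboard character to its slot count."""
    return Counter(scoreboard.strip())


if __name__ == "__main__":
    # Scoreboard copied from the server-status output quoted above.
    scoreboard = "WW_WWW_WWWWW_WWWWWWWWWW_W_WWWW__WWW.WWWWWWWW_WWWWW"
    for key, n in count_states(scoreboard).most_common():
        print(f"{n:3d}  {key}  {STATE_NAMES.get(key, 'unknown')}")
```

Run periodically, this makes it easy to see whether the W count keeps
climbing toward MaxClients or stays flat.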

> The biggest issue is that the processes should time out and clean up after
> themselves, right?  But they're not - instead they're just sitting consuming
> RAM.  (Not entirely sure about that - in some stacktraces I see
> <signal handler called> followed by "zend_timeout ()".)

Apache hands the request over to mod_php to be processed synchronously
on the calling thread.  It is up to mod_php to decide what to do,
including whether to time out on any conditions it anticipates.  Apache
isn't monitoring it.
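
As a rough analogy only (this is Python, not PHP, and has nothing to do
with mod_php internals): when a handler is called synchronously, a
blocking read comes back only if the handler itself puts a deadline on
it.  In the sketch below, read_with_deadline is a hypothetical helper and
the socketpair stands in for a backend that never answers:

```python
import socket


def read_with_deadline(sock, seconds):
    """Read from sock, giving up after `seconds` instead of blocking forever.

    Returns the received bytes, or None if the deadline expired.
    """
    sock.settimeout(seconds)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # "theirs" never sends anything, so without the timeout recv() would
    # block indefinitely and the caller would never regain control.
    ours, theirs = socket.socketpair()
    try:
        print(read_with_deadline(ours, 0.2))
    finally:
        ours.close()
        theirs.close()
```

Without the settimeout() call, the caller would sit in recv() until the
peer sent data or closed the connection, which is the position Apache is
in while it waits for the module to return.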

> My hypothesis is that MySQL had a problem, and either apache or php somehow
> got gummed up and isn't cleaning up for some reason.

mod_php never returned control to Apache.  I have no thoughts on what
event triggered whatever bug you encountered.

>
> I'm sure a httpd restart will clean everything up, but I wanted to debug this
> as best I could.  I gdb-ed a stacktrace for 8 of the hung threads, but it's
> not compiled in debug mode.  The stacktraces and other relevant data are here:
> http://ritter.vg/misc/apache-debug/
>
> If anyone can suggest further things to try to debug this, or any additional
> info, I'd appreciate it.

You got the right sort of information to start with.  In theory,
a glibc heap expert could tell you what it means to block in that
spot, but I expect the answer would be the rather vague
"memory overlays or other invalid use of the heap by the application."
As for which component did it, I'd wager that it isn't Apache.

I wonder whether your PHP, its extensions, and the related libraries have
all the appropriate fixes for memory corruption or heap library misuse.
(I assume these are all Red Hat-patched binaries.)

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx


