Apache getting stuck with all workers in a BUSY_READ state

Hi, I've been having problems with apache becoming unresponsive, and
was wondering if anyone had any suggestions on what the problem might
be. Basically, periodically, apache will get into a state where all
the workers are stuck reading:

Server Version: Apache
Server Built: Oct 21 2009 10:54:43
Current Time: Tuesday, 15-Jun-2010 07:57:30 PDT
Restart Time: Tuesday, 15-Jun-2010 06:37:33 PDT
Parent Server Generation: 0
Server uptime:  1 hour 19 minutes 57 seconds
Total accesses: 985801 - Total Traffic: 8.1 GB
CPU Usage: u644.89 s203.76 cu3994.75 cs0 - 101% CPU load
206 requests/sec - 1.7 MB/second - 8.6 kB/request
1593 requests currently being processed, 15 idle workers
RRRRRRRRRRRRRCRRRRKRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRCRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRKRKRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRKRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRWRRRRKKCRRKRKRRRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRC
RRKKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRRCKRRRRCCRRKRRRRRRRRRRRRRRRKRRRCRRRRRRRRRRRRRCCRRRCRRCRRR
RRRRRRRKKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRKRRRKRRRRRRRRRRRRRW
KKRRRRRRRKRRRRWRKRRRRRRRRRRRRRRRWRRRRRRRRRRRR___RRR__RR___R_____
WRR__RRRSS......................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................

This is prior to complete failure. Sometimes whatever's blocking gets
unblocked before it hits MaxClients, sometimes it doesn't. I'm
running apache 2.0.59 built with openssl 0.9.8n on AIX 6.1 with
the prefork MPM, and this is virtually all SSL traffic (pretty much
everything other than the scoreboard). A restart basically "fixes" the
problem, in the sense that all the workers get killed and, after the
initial thrashing of starting up new workers, requests are served
normally again.
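
To watch how the scoreboard shifts in the run-up to a hang, it can be
tallied by state. What follows is a minimal Python sketch, not anything
Apache ships: the function names tally_scoreboard and read_fraction are
made up for illustration, the legend is the one printed on the
mod_status page, and the sample string is a shortened stand-in rather
than real output.

```python
from collections import Counter

# Legend as printed on the mod_status page.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def tally_scoreboard(scoreboard):
    """Count slots per state; line breaks and unknown characters are skipped."""
    return Counter(ch for ch in scoreboard if ch in STATE_NAMES)


def read_fraction(counts):
    """Fraction of live slots (everything except '.') that are in state 'R'."""
    live = sum(n for state, n in counts.items() if state != ".")
    return counts["R"] / live if live else 0.0


if __name__ == "__main__":
    # Shortened, hypothetical sample; paste the real scoreboard here.
    sample = "RRRRKRRC\nWRR__RRRSS......"
    counts = tally_scoreboard(sample)
    for state, n in counts.most_common():
        print(f"{state}  {STATE_NAMES[state]:<35} {n}")
    print(f"reading: {read_fraction(counts):.0%} of live slots")
```

Run against snapshots taken every few seconds, this would show whether
the R count climbs steadily or jumps all at once, which might help
separate a slow leak from a sudden stall.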

From my understanding of the READ state above, everything above is
stuck in one of two broad categories:

 - A client made the TCP connection to the server, and is somewhere
between the TCP handshake and the end of the HTTP request info. This
suggests it could be a network issue (something's hanging the
connections), or an openssl issue (the TLS/SSL negotiation is
slow/hanging), or...?
 - The request has been completed, but we're proxying to somewhere
else and waiting for a response from the proxy. This potentially
applies in this case, because we do have apache set up to proxy some
URLs to another server.

There's nothing in the access or error logs jumping out to correlate
with this problem either. There are MaxClients errors once it hits
that limit, of course, but nothing related to the BUSY_READ state.

When having the problem, I've correlated the scoreboard with the
ps/lsof/netstat output, and the second case seems unlikely because I'm
not seeing any open connections to the server that apache is proxying
to. It feels like there's some shared resource that all the apache
workers are trying to access, but I can't figure out what it might be.
Any suggestions on a solution, or how I might get more info out of
apache as to what it's doing while everyone's in the read state? Are
there other broad categories I'm missing as to why the workers might
be in the read state? Any further info I could provide to help anyone?
My next steps are to dive into the apache source further and see what
possible resources it could be blocking on, but I'm hoping someone
smarter than me already knows. :)
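
The netstat side of that correlation can be summarised with a short
script. This is only a sketch under assumptions: it expects the usual
six-column `netstat -an` layout (proto, recv-q, send-q, local, remote,
state), accepts either "addr.port" or "addr:port" endpoints, and the
names split_endpoint and tcp_states_for_port as well as the sample
addresses are hypothetical.

```python
from collections import Counter


def split_endpoint(endpoint):
    """Split 'addr.port' or 'addr:port' at the last separator."""
    idx = max(endpoint.rfind("."), endpoint.rfind(":"))
    return endpoint[:idx], endpoint[idx + 1:]


def tcp_states_for_port(netstat_output, local_port="443"):
    """Count TCP connection states for sockets bound to local_port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP lines, and anything without a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        _, port = split_endpoint(fields[3])
        if port == local_port:
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; feed in real `netstat -an` output instead.
    sample = """\
tcp4       0      0  10.0.0.5.443      192.0.2.10.51234   ESTABLISHED
tcp4       0      0  10.0.0.5.443      192.0.2.11.40000   ESTABLISHED
tcp4       0      0  10.0.0.5.443      192.0.2.12.40001   CLOSE_WAIT
tcp4       0      0  10.0.0.5.80       192.0.2.13.40002   ESTABLISHED
tcp4       0      0  *.443             *.*                LISTEN
"""
    for state, n in tcp_states_for_port(sample).most_common():
        print(f"{state:<12} {n}")
```

Comparing the per-state totals on port 443 with the number of workers
in the R state may show whether each stuck worker really has a live
client connection behind it.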


-- 
Dave Fallon

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.