Re: Stability problems with Apache on OS X 10.6 Server
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Title: Re: Stability problems with Apache on OS X 10.6 Server
Hi Michael,
Thank you for the pointers. I have used you suggestions to try a out (quite) a few things. Not that I have solved my problems.
I’ll try to summarize my findings.
When I start apache, it launched a few worker threads. As I start using the server, it launched new threads. The first time a thread gets a request, it connects to the LDAP server. Once connected the thread keeps the connection to the LDAP server open / alive until the thread gets shutdown. This I can see while watching the what port connections are on the machine between the LDAP Server and the Apache server (Thanks for the command tip).
This runs fine for while. Then apache server starts to fail. Does not fail completly, it is more random thing. Some requests are servered correct, while other get a “Internal Server Error”. If I do a reload on the failed request, it may process correctly. So some of the apache threads are working, while others failing.
Here is a snapshot of the ports and runnning processes at a time when the server is partially failing. There are no warnings in any of the system logfiles about file handles or ports limits being reached.
losf:
httpd 2947 ctp 12u IPv6 0xffffff802288f2f0 0t0 TCP [::1]:55452->[::1]:389 (ESTABLISHED)
httpd 2947 ctp 13u IPv6 0xffffff800f60ae80 0t0 TCP [::1]:56411->[::1]:389 (ESTABLISHED)
httpd 2949 ctp 12u IPv6 0xffffff80228914a0 0t0 TCP [::1]:55451->[::1]:389 (ESTABLISHED)
httpd 2949 ctp 13u IPv6 0xffffff800f608fe0 0t0 TCP [::1]:55776->[::1]:389 (ESTABLISHED)
slapd 2971 root 6u IPv6 0xffffff800f609f30 0t0 TCP *:389 (LISTEN)
slapd 2971 root 7u IPv4 0xffffff8022936d68 0t0 TCP *:389 (LISTEN)
slapd 2971 root 12u IPv6 0xffffff802288efe0 0t0 TCP [::1]:389->[::1]:55451 (ESTABLISHED)
slapd 2971 root 15u IPv6 0xffffff802288fc20 0t0 TCP [::1]:389->[::1]:55452 (ESTABLISHED)
slapd 2971 root 16u IPv6 0xffffff8022890b70 0t0 TCP [::1]:389->[::1]:56411 (ESTABLISHED)
slapd 2971 root 17u IPv6 0xffffff800f609910 0t0 TCP [::1]:389->[::1]:55465 (ESTABLISHED)
slapd 2971 root 18u IPv6 0xffffff802288ff30 0t0 TCP [::1]:389->[::1]:55466 (ESTABLISHED)
slapd 2971 root 19u IPv6 0xffffff802288e9c0 0t0 TCP [::1]:389->[::1]:57959 (ESTABLISHED)
slapd 2971 root 20u IPv6 0xffffff800f60a860 0t0 TCP [::1]:389->[::1]:55621 (ESTABLISHED)
slapd 2971 root 22u IPv6 0xffffff800f60b7b0 0t0 TCP [::1]:389->[::1]:56296 (ESTABLISHED)
slapd 2971 root 23u IPv6 0xffffff800f607d80 0t0 TCP [::1]:389->[::1]:55666 (ESTABLISHED)
slapd 2971 root 24u IPv6 0xffffff802288e3a0 0t0 TCP [::1]:389->[::1]:55679 (ESTABLISHED)
slapd 2971 root 25u IPv6 0xffffff802288f910 0t0 TCP [::1]:389->[::1]:55691 (ESTABLISHED)
slapd 2971 root 26u IPv6 0xffffff802288ecd0 0t0 TCP [::1]:389->[::1]:55776 (ESTABLISHED)
httpd 2976 ctp 12u IPv6 0xffffff802288e6b0 0t0 TCP [::1]:55465->[::1]:389 (ESTABLISHED)
httpd 2977 ctp 12u IPv6 0xffffff800f6083a0 0t0 TCP [::1]:55466->[::1]:389 (ESTABLISHED)
httpd 3055 ctp 12u IPv6 0xffffff800f609c20 0t0 TCP [::1]:55621->[::1]:389 (ESTABLISHED)
httpd 3185 ctp 12u IPv6 0xffffff800f609600 0t0 TCP [::1]:55666->[::1]:389 (ESTABLISHED)
httpd 3198 ctp 12u IPv6 0xffffff8022890550 0t0 TCP [::1]:55691->[::1]:389 (ESTABLISHED)
httpd 3199 ctp 12u IPv6 0xffffff8022890e80 0t0 TCP [::1]:55679->[::1]:389 (ESTABLISHED)
httpd 3199 ctp 13u IPv6 0xffffff80228917b0 0t0 TCP [::1]:57959->[::1]:389 (ESTABLISHED)
httpd 28410 ctp 12u IPv6 0xffffff800f607a70 0t0 TCP [::1]:56296->[::1]:389 (ESTABLISHED)
ps ax | grep httpd
2946 ?? Ss 0:00.97 /usr/local/apache2/bin/httpd -k start
2947 ?? S 0:00.32 /usr/local/apache2/bin/httpd -k start
2949 ?? S 0:00.35 /usr/local/apache2/bin/httpd -k start
2976 ?? S 0:00.31 /usr/local/apache2/bin/httpd -k start
2977 ?? S 0:00.30 /usr/local/apache2/bin/httpd -k start
3055 ?? S 0:00.25 /usr/local/apache2/bin/httpd -k start
3185 ?? S 0:00.24 /usr/local/apache2/bin/httpd -k start
3198 ?? S 0:00.36 /usr/local/apache2/bin/httpd -k start
3199 ?? S 0:00.24 /usr/local/apache2/bin/httpd -k start
28410 ?? S 0:00.08 /usr/local/apache2/bin/httpd -k start
28752 ?? S 0:00.02 /usr/local/apache2/bin/httpd -k start
Combining the two you can see, that some of the threads have a connection while others do not. And in the the apache error log I see:
could not lookup DNS configuration info service: (ipc/send) inval[Thu Jan 28 10:04:02 2010] [warn] [client 10.2.3.164] [28752] auth_ldap authenticate: user thsc authentication failed; URI xxx [LDAP: ldap_simple_bind_s() failed][Can't contact LDAP server], referer: https://xxx/xxxx
So the thread not serving request is the one which has no connection to the LDAP server and can’t get it either. Other programs, proftd, ldapsearch, accessing the LDAP server are not experiencing any problems while this is happing.
Now if I shutdown the LDAP server and start it again, the apache server comes back into the same “state”, ie randomly failing. If I that a new snapshot of the ports open. all threads are able to reconnect to LDAP server, except the thread 28752 , which puts out the same log message again.
To see if it might have something to do with a specific version of the OpenLDAP server, I have also done this with different versions of the OpenLDAP server., ie. Shutting down one version and starting up an older version of the LDAP server. The result is the same.
Last time I wrote that is seams to depend on the number of requests to the server. After my testing I more inclined to think it is a time issue. It seems to be of less importance how much I stress the server. If I stress it a lot, it can run 1/2 to 1 hour before starting to fail. If I only make occational request I get from 1 hour up to 3 hours. If I start the server, make a few requests and leave it for the night, it will be in the partially failed state in the morning.
I have the (nearly) same setup running on an OS X 10.4 server without any problems. I write nearly, because on the 10.4 machine it is an apache 2.0.49 server, while I have an 2.2.14 apache server on the OS X 10.6 server. The ProFTPD server I use on the two setups is of the same version and it runs into the exact same problem on OX 10.6 as the Apache server does, with the same error message in it’s log files.
Does this make any sense ? After this I’m still inclined to suspect OS X 10.6 ...
Regards,
Thomas Schneider
On 21/12/09 17:31, "Michael A. Pasek" <mike001@xxxxxxxxxxxxxxxxx> wrote:
In users Digest Issue 3741 (21 Dec 2009 15:54:32 -0000), Thomas Scheider wrote:
> [problems with LDAP authentication on 10.6.2]
> After 5000+ SSL requests, users begin to be denied log-in to the web
> site. In the Apache error_log the following is written:
>
> [Mon Dec 21 09:14:23 2009] [info] Initial (No.1) HTTPS request received for child 6 (server 172.25.2.99:443)
> could not lookup DNS configuration info service: (ipc/send) invalid destination port
> [...]
> On the same server I have a ProFTPD running which also does log-in
> verification against the LDAP server and retrieves varios information
> about the user. The program runs into the same problems, ie. the "could
> not lookup ..." begins to appear in the itøs log file, and users are
> refused access to the FTP server.
> [...]
This would imply that the problem lies not with Apache, but with either
the LDAP server or OS 10.6.2 (which is, I assume, the OS on which the
Apache and ProFTPd applications are running).
> The messages "could not lookup .." does not begin to appear at the same
> time in the logfiles. The apache may be running fine, while the FTP server
> is rejecting users, and wise versa.
I'd suspect that the connections to the LDAP server are not being "cleaned
up", and once the application reaches it's per-process file descriptor limit
it is being denied its request to open another network connection (i.e.,
allocate another file descriptor). Check the output of:
lsof -nPi | grep ":389"
(NOTE: You must execute this as "root" in order to see _all_ the connections)
This should show you all the current connections to the LDAP server, which
application/process is "controlling" that connection, and the current
connection state.
You might also check the system.log; there may be entries in there if the
LDAP connection requests ARE being rejected due to the filedescriptor limit.
What to do next depends on:
a) Whether my theory is correct; and,
b) What state the connections are "hung" in.
Regards,
Michael A. Pasek
[Index of Archives]
[Open SSH Users]
[Linux ACPI]
[Linux Kernel]
[Linux Laptop]
[Kernel Newbies]
[Security]
[Netfilter]
[Bugtraq]
[Squid]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Samba]
[Video 4 Linux]
[Device Mapper]