Re: Stability problems with Apache on OS X 10.6 Server

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Title: Re: Stability problems with Apache on OS X 10.6 Server
Hi Michael,

Thank you for the pointers. I have used you suggestions to try a out (quite) a few things. Not that I have solved my problems.
I’ll try to summarize my findings.

When I start apache, it launched a few worker threads. As I start using the server, it launched new threads. The first time a thread gets a request, it connects to the LDAP server. Once connected the thread keeps the connection to the LDAP server open / alive until the thread gets shutdown. This I can see while watching the what port connections are on the machine between the LDAP Server and the Apache server (Thanks for the command tip).

This runs fine for while. Then apache server starts to fail. Does not fail completly, it is more random thing. Some requests are servered correct, while other get a  “Internal Server Error”. If I do a reload on the failed request, it may process correctly. So some of the apache threads are working, while others failing.

Here is a snapshot of the ports and runnning processes at a time when the server is partially failing. There are no warnings in any of the system logfiles about file handles or ports limits being reached.

losf:

httpd      2947            ctp   12u  IPv6 0xffffff802288f2f0      0t0    TCP [::1]:55452->[::1]:389 (ESTABLISHED)
httpd      2947            ctp   13u  IPv6 0xffffff800f60ae80      0t0    TCP [::1]:56411->[::1]:389 (ESTABLISHED)
httpd      2949            ctp   12u  IPv6 0xffffff80228914a0      0t0    TCP [::1]:55451->[::1]:389 (ESTABLISHED)
httpd      2949            ctp   13u  IPv6 0xffffff800f608fe0      0t0    TCP [::1]:55776->[::1]:389 (ESTABLISHED)
slapd      2971           root    6u  IPv6 0xffffff800f609f30      0t0    TCP *:389 (LISTEN)
slapd      2971           root    7u  IPv4 0xffffff8022936d68      0t0    TCP *:389 (LISTEN)
slapd      2971           root   12u  IPv6 0xffffff802288efe0      0t0    TCP [::1]:389->[::1]:55451 (ESTABLISHED)
slapd      2971           root   15u  IPv6 0xffffff802288fc20      0t0    TCP [::1]:389->[::1]:55452 (ESTABLISHED)
slapd      2971           root   16u  IPv6 0xffffff8022890b70      0t0    TCP [::1]:389->[::1]:56411 (ESTABLISHED)
slapd      2971           root   17u  IPv6 0xffffff800f609910      0t0    TCP [::1]:389->[::1]:55465 (ESTABLISHED)
slapd      2971           root   18u  IPv6 0xffffff802288ff30      0t0    TCP [::1]:389->[::1]:55466 (ESTABLISHED)
slapd      2971           root   19u  IPv6 0xffffff802288e9c0      0t0    TCP [::1]:389->[::1]:57959 (ESTABLISHED)
slapd      2971           root   20u  IPv6 0xffffff800f60a860      0t0    TCP [::1]:389->[::1]:55621 (ESTABLISHED)
slapd      2971           root   22u  IPv6 0xffffff800f60b7b0      0t0    TCP [::1]:389->[::1]:56296 (ESTABLISHED)
slapd      2971           root   23u  IPv6 0xffffff800f607d80      0t0    TCP [::1]:389->[::1]:55666 (ESTABLISHED)
slapd      2971           root   24u  IPv6 0xffffff802288e3a0      0t0    TCP [::1]:389->[::1]:55679 (ESTABLISHED)
slapd      2971           root   25u  IPv6 0xffffff802288f910      0t0    TCP [::1]:389->[::1]:55691 (ESTABLISHED)
slapd      2971           root   26u  IPv6 0xffffff802288ecd0      0t0    TCP [::1]:389->[::1]:55776 (ESTABLISHED)
httpd      2976            ctp   12u  IPv6 0xffffff802288e6b0      0t0    TCP [::1]:55465->[::1]:389 (ESTABLISHED)
httpd      2977            ctp   12u  IPv6 0xffffff800f6083a0      0t0    TCP [::1]:55466->[::1]:389 (ESTABLISHED)
httpd      3055            ctp   12u  IPv6 0xffffff800f609c20      0t0    TCP [::1]:55621->[::1]:389 (ESTABLISHED)
httpd      3185            ctp   12u  IPv6 0xffffff800f609600      0t0    TCP [::1]:55666->[::1]:389 (ESTABLISHED)
httpd      3198            ctp   12u  IPv6 0xffffff8022890550      0t0    TCP [::1]:55691->[::1]:389 (ESTABLISHED)
httpd      3199            ctp   12u  IPv6 0xffffff8022890e80      0t0    TCP [::1]:55679->[::1]:389 (ESTABLISHED)
httpd      3199            ctp   13u  IPv6 0xffffff80228917b0      0t0    TCP [::1]:57959->[::1]:389 (ESTABLISHED)
httpd     28410            ctp   12u  IPv6 0xffffff800f607a70      0t0    TCP [::1]:56296->[::1]:389 (ESTABLISHED)

ps ax | grep httpd
2946   ??  Ss     0:00.97 /usr/local/apache2/bin/httpd -k start
 2947   ??  S      0:00.32 /usr/local/apache2/bin/httpd -k start
 2949   ??  S      0:00.35 /usr/local/apache2/bin/httpd -k start
 2976   ??  S      0:00.31 /usr/local/apache2/bin/httpd -k start
 2977   ??  S      0:00.30 /usr/local/apache2/bin/httpd -k start
 3055   ??  S      0:00.25 /usr/local/apache2/bin/httpd -k start
 3185   ??  S      0:00.24 /usr/local/apache2/bin/httpd -k start
 3198   ??  S      0:00.36 /usr/local/apache2/bin/httpd -k start
 3199   ??  S      0:00.24 /usr/local/apache2/bin/httpd -k start
28410   ??  S      0:00.08 /usr/local/apache2/bin/httpd -k start
28752   ??  S      0:00.02 /usr/local/apache2/bin/httpd -k start

Combining the two you can see, that some of the threads have a connection while others do not. And in the the apache error log I see:

could not lookup DNS configuration info service: (ipc/send) inval[Thu Jan 28 10:04:02 2010] [warn] [client 10.2.3.164] [28752] auth_ldap authenticate: user thsc authentication failed; URI xxx [LDAP: ldap_simple_bind_s() failed][Can't contact LDAP server], referer: https://xxx/xxxx

So the thread not serving request is the one which has no connection to the LDAP server and can’t get it either. Other programs, proftd, ldapsearch, accessing the LDAP server are not experiencing any problems while this is happing.

Now if I shutdown the LDAP server and start it again, the apache server comes back into the same “state”, ie randomly failing. If I that a new snapshot of the ports open. all threads are able to reconnect to LDAP server, except the thread 28752 , which puts out the same log message again.

To see if it might have something to do with a specific version of the OpenLDAP server, I have also done this with different versions of the OpenLDAP server., ie. Shutting down one version and starting up an older version of the LDAP server. The result is the same.

Last time I wrote that is seams to depend on the number of requests to the server. After my testing I more inclined to think it is a time issue. It seems to be of less importance how much I stress the server. If I stress it a lot, it can run 1/2 to 1 hour before starting to fail. If I only make occational request I get from 1 hour up to 3 hours. If I start the server, make a few requests and leave it for the night, it will be in the partially failed state in the morning.

I have the (nearly) same setup running on an OS X 10.4 server without any problems. I write nearly, because on the 10.4 machine it is an apache 2.0.49 server, while I have an 2.2.14 apache server on the OS X 10.6 server. The ProFTPD server I use on the two setups is of the same version and it runs into the exact same problem on OX 10.6 as the Apache server does, with the same error message in it’s log files.

Does this make any sense ? After this I’m still inclined to suspect OS X 10.6 ...

Regards,

 Thomas Schneider


On 21/12/09 17:31, "Michael A. Pasek" <mike001@xxxxxxxxxxxxxxxxx> wrote:

In users Digest Issue 3741 (21 Dec 2009 15:54:32 -0000), Thomas Scheider wrote:

> [problems with LDAP authentication on 10.6.2]
> After 5000+ SSL requests, users begin to be denied log-in to the web
> site. In the Apache error_log the following is written:
>
> [Mon Dec 21 09:14:23 2009] [info] Initial (No.1) HTTPS request received for child 6 (server 172.25.2.99:443)
> could not lookup DNS configuration info service: (ipc/send) invalid destination port
> [...]
> On the same server I have a ProFTPD running which also does log-in
> verification against the LDAP server and retrieves varios information
> about the user. The program runs into the same problems, ie. the "could
> not lookup  ..." begins to appear in the itøs log file, and users are
> refused access to the FTP server.
> [...]

This would imply that the problem lies not with Apache, but with either
the LDAP server or OS 10.6.2 (which is, I assume, the OS on which the
Apache and ProFTPd applications are running).

> The messages "could not lookup .." does not begin to appear at the same
> time in the logfiles. The apache may be running fine, while the FTP server
> is rejecting users, and wise versa.

I'd suspect that the connections to the LDAP server are not being "cleaned
up", and once the application reaches it's per-process file descriptor limit
it is being denied its request to open another network connection (i.e.,
allocate another file descriptor).  Check the output of:
  lsof -nPi | grep ":389"
(NOTE: You must execute this as "root" in order to see _all_ the connections)
This should show you all the current connections to the LDAP server, which
application/process is "controlling" that connection, and the current
connection state.

You might also check the system.log; there may be entries in there if the
LDAP connection requests ARE being rejected due to the filedescriptor limit.

What to do next depends on:
  a) Whether my theory is correct; and,
  b) What state the connections are "hung" in.

Regards,

Michael A. Pasek


[Index of Archives]     [Open SSH Users]     [Linux ACPI]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Squid]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux