Strange Problems in a Load Balanced Environment

Hey all,

A week ago, a very strange problem surfaced on our network, seemingly out of the blue.

We have three web servers, load balanced with Zeus ZXTM.  Each one runs Red Hat Enterprise Linux 5.0 x86-64 with Apache 2.2.8 (worker MPM) and PHP 5.2.5.  All web data (document roots, etc.) is served from our NFS file server.  Every web server has exactly the same configuration, and this setup had worked without problems for six months.

Last Friday we had an odd issue: all of the sites appeared to have stopped making persistent connections to MySQL.  We forced the use of mysql_pconnect() (by removing the if-then-else and the flag that selected whether or not to use pconnect), and that solved the problem.  However, at the same time, web3 stopped being able to process requests.  All of its Apache threads would stay in the “WORKING” state until we eventually hit MaxClients.  Web1 and web2 did not, and still do not, experience this problem.
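One way to watch the stuck-thread symptom on each box, assuming mod_status is enabled (the post doesn't say whether it is), is to read the `Scoreboard:` line from the machine-readable `/server-status?auto` page and tally worker states. This is a hypothetical helper sketch, not part of the original setup; mod_status reports busy workers as `W` (“sending reply”), which is presumably what “WORKING” refers to here.

```python
from collections import Counter

# Meanings of the Apache 2.2 mod_status scoreboard characters.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def parse_scoreboard(status_text: str) -> Counter:
    """Count worker states from the 'Scoreboard:' line of server-status?auto output.

    ASSUMPTION: status_text was fetched from a server with mod_status enabled.
    """
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(board)
    raise ValueError("no Scoreboard line found")


def busy_fraction(counts: Counter) -> float:
    """Fraction of occupied slots (everything except '.') that are not idle ('_')."""
    slots = sum(n for state, n in counts.items() if state != ".")
    busy = sum(n for state, n in counts.items() if state not in ("_", "."))
    return busy / slots if slots else 0.0


def summarize(counts: Counter) -> dict:
    """Map each scoreboard character to its human-readable meaning."""
    return {SCOREBOARD_KEY.get(ch, "unknown"): n for ch, n in counts.items()}
```

On a server in web3's state, you would expect the scoreboard to fill with `W` and the busy fraction to approach 1.0 as MaxClients is reached, while web1 and web2 should still show a healthy share of `_` slots.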

That night, we decided to bring two more web servers into the mix and set them up the same way as the two working servers, but they are experiencing this problem too.  So at the moment, web1 and web2 take 90% of all traffic and web3 takes 10% (any more and the connections pile up, etc.).  All three servers also have identical hardware specs (dual Xeon 5345, 12 GB RAM).

When the issue occurs, the following happens.  On all of our MySQL servers, www1 and www2 maintain an average of 50 connected threads, but www3 creates 500+.  Bandwidth usage is badly skewed as well.  This past week, www1 and www2 (together taking 90% of the load) averaged 35 Mbps in/out on our private network, while www3, taking only 10% of the requests, was pushing 75 Mbps (mostly between it and our NFS server).
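Normalizing these figures by each server's share of traffic shows how far out of line www3 is. This is a rough, hypothetical calculation; it assumes the 35 Mbps and 50-thread figures are per server, that web1 and web2 split their 90% evenly, and it treats “500+” as exactly 500, so the connection ratio is a lower bound.

```python
# Back-of-the-envelope check using the figures quoted in the post.
# ASSUMPTIONS (not stated in the post): 35 Mbps and 50 threads are per-server
# figures, web1/web2 take 45% each, and "500+" is treated as exactly 500.

NORMAL = {"bandwidth_mbps": 35.0, "mysql_threads": 50.0, "share": 0.45}
WEB3 = {"bandwidth_mbps": 75.0, "mysql_threads": 500.0, "share": 0.10}


def per_share(value: float, traffic_share: float) -> float:
    """Normalize a measurement by the fraction of requests a server receives."""
    return value / traffic_share


def relative_usage(metric: str) -> float:
    """How many times more of `metric` web3 uses per unit of traffic than a normal server."""
    web3 = per_share(WEB3[metric], WEB3["share"])
    normal = per_share(NORMAL[metric], NORMAL["share"])
    return web3 / normal


if __name__ == "__main__":
    for metric in ("bandwidth_mbps", "mysql_threads"):
        print(f"{metric}: web3 uses {relative_usage(metric):.1f}x per unit of traffic")
```

Under these assumptions, web3 moves roughly 9.6x as much data and holds at least 45x as many MySQL threads per unit of traffic. If the 35 Mbps and 50-thread figures are instead combined totals for web1 and web2, both ratios double.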

All three servers receive identical requests.  The configurations are exactly the same, and the code is identical.

We’re at a loss.  We’ve been over every aspect of our network and setup.  We simply cannot figure this one out.

Anyone care to lend some advice?  We’re open to anything at this point.

----

Graham Frank – Neoservers, LLC

Founder and Owner

Accredited Member of the Better Business Bureau

Ph: (608) 359-1593
