Re: Load balancing question

Hello,
 
Thanks for the tips, I implemented them but unfortunately it did not help.
 
In our stress test the cluster was brought to its knees but we still cannot figure out where the bottleneck is.
Things we checked using perfmon:
CPU% - average never above 70 on any of the servers
CPU Queue length - average never above 14
Pages Input / second - nominal
Avg. Disk Read / Write Queue - average never above 0.5
Network usage - never spikes above 25%
 
I believe we may be maxing out our bandwidth at 10mb.  We are upgrading to 30mb shortly.
 
The server now recovers fine afterwards, though, which is good news.
 
This is one worker and the lb:
 
worker.tomcat6.port=16009
worker.tomcat6.host=192.168.150.12
worker.tomcat6.type=ajp13
worker.tomcat6.reply_timeout=30000
worker.tomcat6.lbfactor=100
 
worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tomcat2, tomcat1,tomcat4,tomcat5,tomcat6
 
This is the status page under a fresh restart and 4 minutes of heavy load.
 
Worker Status for loadbalancer

Type  Sticky Sessions  Force Sticky Sessions  Retries  LB Method  Locking     Recover Wait Time  Max Reply Timeouts
lb    True             False                  2        Request    Optimistic  60                 0

Good  Degraded  Bad/Stopped  Busy  Max Busy  Next Maintenance
5     0         0            224   248       55/117

Balancer Members

Name     Type   Host                  Addr                  Act  State  D  F    M  V    Acc   Err  CE  RE  Wr    Rd    Busy  Max  Route    RR  Cd  Rs
tomcat2  ajp13  localhost:12009       127.0.0.1:12009       ACT  OK     0  50   2  602  3665  0    4   0   1.1M  52M   30    37   tomcat2          0/0
tomcat1  ajp13  localhost:11009       127.0.0.1:11009       ACT  OK     0  50   2  600  3594  0    8   0   1.1M  52M   30    42   tomcat1          0/0
tomcat4  ajp13  192.168.150.12:14009  192.168.150.12:14009  ACT  OK     0  100  1  602  7303  0    15  0   2.3M  103M  58    73   tomcat4          0/0
tomcat5  ajp13  192.168.150.13:15009  192.168.150.13:15009  ACT  OK     0  100  1  601  7173  0    18  0   2.2M  103M  54    73   tomcat5          0/0
tomcat6  ajp13  192.168.150.12:16009  192.168.150.12:16009  ACT  OK     0  100  1  601  7261  0    18  0   2.3M  101M  51    71   tomcat6          0/0
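
As a rough cross-check of the bandwidth guess, the Acc (requests) and Rd (data read from Tomcat) columns of the member table can be summed and divided by the test duration. This is only a sketch under stated assumptions: the counters started at zero at the restart, the load ran for about 240 seconds, "M" means 2^20 bytes, the link is 10 Mbit/s, and responses leave the site at roughly the size they arrive from Tomcat (no compression or caching).

```python
# Figures copied from the balancer member table in this message.
# name: (Acc = requests handled, Rd = "M" read from Tomcat, Busy = in-flight requests)
members = {
    "tomcat2": (3665, 52, 30),
    "tomcat1": (3594, 52, 30),
    "tomcat4": (7303, 103, 58),
    "tomcat5": (7173, 103, 54),
    "tomcat6": (7261, 101, 51),
}

TEST_SECONDS = 4 * 60        # assumption: "4 minutes of heavy load" taken literally
BYTES_PER_M = 1024 * 1024    # assumption: the status page reports binary megabytes

total_requests = sum(acc for acc, _, _ in members.values())
total_read_m = sum(rd for _, rd, _ in members.values())
total_busy = sum(busy for _, _, busy in members.values())

# Average request rate over the whole test.
requests_per_second = total_requests / TEST_SECONDS

# Average rate of response data coming back from the Tomcats, in Mbit/s.
read_mbit_per_second = total_read_m * BYTES_PER_M * 8 / TEST_SECONDS / 1_000_000

# Little's law: in-flight requests = arrival rate * time in system,
# so time in system is roughly busy / rate.
avg_seconds_per_request = total_busy / requests_per_second

print(f"requests/s:        {requests_per_second:.1f}")
print(f"Tomcat -> httpd:   {read_mbit_per_second:.1f} Mbit/s")
print(f"avg time/request:  {avg_seconds_per_request:.2f} s")
```

With these inputs the estimate comes out at about 120 requests per second and about 14 Mbit/s of response data, which is in the same range as a 10 Mbit/s link. That makes the bandwidth explanation plausible, although any of the assumptions above may be off.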
 

 
Any ideas?
 
Thanks in advance,
--James
----- Original Message -----
From: Bj
Sent: Monday, September 17, 2007 11:01 AM
Subject: Re: Load balancing question

I think you should use timeouts!
It seems that your requests take a long time to be processed by your Tomcats. If you reach the maximum number of connections (HTTP or AJP), then you have to wait for a Tomcat response to free a connection slot.
What does your jk_status page say? Are all your workers in an error state? How many busy connections do you have?

You can:
  - in httpd.conf:
        + if you're using keepalive, add a keepalive timeout. 5, 10 or 15 s may be enough.
        + if you're using mpm_winnt, increase the ThreadsPerChild value to increase the maximum number of available connections.

  - in workers.properties:
        + worker.yourworker.reply_timeout=30000. After 30 s without a response, the request will be tried on another worker or fail.
        + limit your connection_pool_size if you are in multi-threaded httpd mode. You may have one connection per thread, which can overload your Tomcats.

  - in your Tomcat: increase your AJP connector's maxThreads (200 by default). It's not very efficient to have too many threads, but it can prevent refused connections.
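
The sizing advice above comes down to comparing two numbers: how many AJP connections httpd can open to one Tomcat, and how many request threads that Tomcat has. Here is a minimal sketch of that comparison, assuming each pooled connection occupies one Tomcat thread. The function names and the example values (one httpd process, pool sizes of 250 and 150) are hypothetical and not taken from this setup; only the maxThreads default of 200 comes from the message.

```python
def max_ajp_connections(httpd_processes: int, connection_pool_size: int) -> int:
    """Upper bound on the connections httpd can hold open to ONE Tomcat worker."""
    return httpd_processes * connection_pool_size


def can_overrun(httpd_processes: int, connection_pool_size: int,
                tomcat_max_threads: int) -> bool:
    """True if httpd could open more connections than Tomcat has threads."""
    return max_ajp_connections(httpd_processes, connection_pool_size) > tomcat_max_threads


TOMCAT_MAX_THREADS = 200  # default mentioned in the message above

# Hypothetical case 1: one httpd process, pool as large as 250 httpd threads.
print(can_overrun(1, 250, TOMCAT_MAX_THREADS))

# Hypothetical case 2: the pool is limited to 150 connections per worker.
print(can_overrun(1, 150, TOMCAT_MAX_THREADS))
```

If the first case applies, either lowering connection_pool_size or raising maxThreads brings the two numbers back in line.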

--
Bj





http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html



On 9/17/07, James Sherwood <jsherwood@xxxxxxxxxxxxxxxx> wrote:
Hello,
 
CORRECTED (status page working now)
 
We upgraded to the latest mod_jk and these were the results:
 
1: All monitors were fine; there were no bottlenecks anywhere that we could find (CPUs, HDs and networks all seemed fine).
2: This time when we brought the servers to their knees, they recovered a short time after the test was completed.
3: We tried socket_keepalive=true for the workers, and the server did not recover afterwards.
4: The only problem we can find is that, after the test, the mod_jk log has about 20-30 lines like this:
[Mon Sep 17 08:03:49.906 2007] [7948:4868] [error] jk_ajp_common.c (2097): (tomcat5) Connecting to tomcat failed. Tomcat is probably not started or is listening on the wrong port
 
The lines differ only in the worker name: (tomcat5) can be any of the Tomcats in the load balancer.
 
It seems like Apache, Tomcat and mod_jk are reaching the maximum number of connections between each other, or something like that?
 
Any help would be GREATLY appreciated,
--James
----- Original Message -----
Sent: Monday, September 17, 2007 9:12 AM
Subject: Re: Load balancing question

Hello,
 
I cannot get my mod_jk status page to work.  Maybe it is because I am on Windows?
It seems that:
 
worker.list=jk-manage
worker.jk-manage.type=status
worker.jk-manage.mount=/admin/status/jk
only takes a Linux-style path for the mount?
 
We upgraded to the latest mod_jk and these were the results:
 
1: All monitors were fine; there were no bottlenecks anywhere that we could find (CPUs, HDs and networks all seemed fine).
2: This time when we brought the servers to their knees, they recovered a short time after the test was completed.
3: We tried socket_keepalive=true for the workers, and the server did not recover afterwards.
4: The only problem we can find is that, after the test, the mod_jk log has about 20-30 lines like this:
[Mon Sep 17 08:03:49.906 2007] [7948:4868] [error] jk_ajp_common.c (2097): (tomcat5) Connecting to tomcat failed. Tomcat is probably not started or is listening on the wrong port
 
The lines differ only in the worker name: (tomcat5) can be any of the Tomcats in the load balancer.
 
It seems like Apache, Tomcat and mod_jk are reaching the maximum number of connections between each other, or something like that?
 
Any help would be GREATLY appreciated,
--James
 
 
 
 
----- Original Message -----
From: Bj
Sent: Saturday, September 15, 2007 5:17 AM
Subject: Re: Load balancing question

What does your mod_jk status page say?
Try to monitor it during the load to see whether your workers are in an error or OK state, whether the max busy is reached, ...
Then look at your logs (mod_jk, Apache, Tomcat, webapp logs, Windows, ...).

As said before, you should check the number of open TCP connections. If you do not use the keepalive feature, you can have a bottleneck there (on the Apache and Tomcat servers). You can also get errors such as the maximum number of open files being reached.
Then look at the load average, system CPU, iowait, ...

You can also monitor your Tomcats through JMX (using jconsole or Mission Control) to check that garbage collection works fine and does not hang for too long.

Try deactivating the 2 Tomcat instances on your Apache server to see whether httpd is still available after the load test.

--
Bj


On 9/14/07, James Sherwood <jsherwood@xxxxxxxxxxxxxxxx> wrote:
Hello,

Everything is Windows Server 2003.

After the load we cannot load pages, either through Apache or by contacting
Tomcat directly.

I believe you are on the right path, though, about connections not getting
released. That's what I figure it is too, but I do not know how to fix it.
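
One way to check whether connections are being released on Windows is to save the output of `netstat -an` before, during and after the load and count the TCP rows per state. The sketch below assumes the usual column layout (state in the last column, foreign address just before it). The function name and the sample rows, including the 192.168.150.11 address, are made up for illustration.

```python
from collections import Counter
from typing import Optional


def count_tcp_states(netstat_output: str, remote_port: Optional[int] = None) -> Counter:
    """Count TCP rows per connection state in saved `netstat -an` output.

    If remote_port is given, only rows whose foreign address ends in that
    port are counted (for example an AJP port such as 14009).
    """
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP rows (UDP rows have no state).
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        state, foreign = fields[-1], fields[-2]
        if remote_port is not None and foreign.rsplit(":", 1)[-1] != str(remote_port):
            continue
        counts[state] += 1
    return counts


# Made-up sample in the Windows `netstat -an` layout.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    192.168.150.11:3051    192.168.150.12:14009   ESTABLISHED
  TCP    192.168.150.11:3052    192.168.150.12:14009   TIME_WAIT
  TCP    192.168.150.11:3053    192.168.150.13:15009   ESTABLISHED
  UDP    0.0.0.0:445            *:*
"""

print(dict(count_tcp_states(SAMPLE)))
print(dict(count_tcp_states(SAMPLE, remote_port=14009)))
```

If the count of connections to the AJP ports keeps growing during the test and does not drop afterwards, that would support the idea that connections are not being released.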

Thanks,
James


----- Original Message -----
From: "AFrieze" < AFrieze@xxxxxxxxxxxx >
To: <users@xxxxxxxxxxxxxxxx>
Sent: Friday, September 14, 2007 12:02 PM
Subject: Re: Load balancing question


>
>>
>> We also have the problem that once the load stops, the sites are still down
>> but Apache/Tomcats still seem to be running fine.  A restart of
>> either (not even both) fixes the sites.
> A guess
>
> Your Apache server is not releasing connections.  If you are running
> Linux, type "netstat -vat" into a terminal on your Apache machine, before
> and after you hit your server.  See if the connections are being released.
>
> You could also try typing "ps -e | grep httpd" to see how many Apache
> processes are running before/after.  Look in the Apache error log, etc.
> You might find a clue like "MaxClients reached".
>
> Question
> Are you able to reach all your Tomcats (through port 8080) independently
> of Apache and get requests served?  Can you reach Apache and get a
> statically served page?
>
> Cheers
> AFrieze
>
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP Server Project.
> See <URL: http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
>   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
> For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
>
>
> __________ NOD32 2529 (20070913) Information __________
>
> This message was checked by NOD32 antivirus system.
> http://www.eset.com
>
>

