mod_proxy_balancer never recovers from a worker error with stickysession

"Dale Ogilvie" <Dale.Ogilvie@xxxxxxxxxxxxx> · Thu, 31 May 2007 14:14:50 +1200

Hello,

I am running Apache 2.2.3 on RedHat EL 5. I am trying to use Apache to
load balance between two local instances of tomcat in order to utilize
the vast quantities of RAM on our production server.

My httpd setup looks like this:

<Proxy balancer://tomcat>
    BalancerMember ajp://localhost:8009 min=10 max=100 route=tomcat1 loadfactor=1 retry=120
    BalancerMember ajp://localhost:8010 min=10 max=100 route=tomcat2 loadfactor=1 retry=120
</Proxy>

<Location /balancer-manager>
    SetHandler balancer-manager
    Order deny,allow
    Deny from all
    Allow from .trimblecorp.net
</Location>

ProxyPass /dscgi/ds.py/ balancer://tomcat/docushare/dsweb/ stickysession=JSESSIONID nofailover=On
ProxyPass /docushare balancer://tomcat/docushare stickysession=JSESSIONID nofailover=On
ProxyPass /docushare/ balancer://tomcat/docushare/ stickysession=JSESSIONID nofailover=On

The problem is that if one of the workers goes into error status, any
client with a JSESSIONID referencing that route never receives a reply:
Apache *always* responds with a 503 (Service Temporarily Unavailable)
*until* another request is successful. I expected that with "retry=120"
the client would be able to use the errored-out worker again after 120
seconds, but this is *not* the case.
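For context, the route is carried in the cookie value itself: Tomcat, when a jvmRoute is configured, appends ".<jvmRoute>" to the session ID, and the balancer matches that suffix against each BalancerMember's route=. A minimal Python sketch of that lookup, assuming the route is everything after the first "." (the helper name route_from_session_id and the ROUTES table are hypothetical, not part of httpd):

```python
from typing import Optional


def route_from_session_id(session_value: str) -> Optional[str]:
    """Return the route suffix of a sticky-session value, or None.

    Hypothetical helper. Assumes the route is everything after the first
    '.', as in Tomcat-style IDs such as "<hex>.tomcat2".
    """
    _, sep, route = session_value.partition(".")
    # No dot, or nothing after it: this sketch treats that as "no route".
    if not sep or not route:
        return None
    return route


# Route names and backends taken from the BalancerMember lines above.
ROUTES = {"tomcat1": "ajp://localhost:8009", "tomcat2": "ajp://localhost:8010"}

cookie = "BC90C156669FDF0194657FF27EC3AF99.tomcat2"
print(ROUTES[route_from_session_id(cookie)])  # ajp://localhost:8010
```

With nofailover=On, a client holding a ".tomcat2" cookie can only ever be sent to that one member, which is why its error state matters so much here.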

Test case:

1. Start tomcats
2. Access /docushare; this succeeds and returns a JSESSIONID cookie
referencing the member, e.g.
JSESSIONID=BC90C156669FDF0194657FF27EC3AF99.tomcat2
3. Stop tomcats to simulate a backend failure
4. Access /docushare again in the same browser session; this fails with
a 503 error (as expected). balancer-manager shows tomcat1 is OK and
tomcat2 is Err.
error_log shows: All workers are in error state for route (tomcat2)
5. Start tomcats again
6. Wait for 120+ seconds to allow retry=120 to take effect
7. Access /docushare *using the session with the tomcat2 cookie*. I
expect success but get a 503 error. I can repeat this step ad nauseam
without ever getting a successful response.
error_log shows: All workers are in error state for route (tomcat2)
8. To resolve the issue, delete the JSESSIONID cookie from the client or
open a new browser and access /docushare. Either of these seems to
solve the problem for the "cookied" browser session.
9. Access /docushare; this succeeds, and balancer-manager shows both
tomcat1 and tomcat2 are now OK, even though the cookie returned to this
request is for *tomcat1*.
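Step 7 can also be repeated from a script instead of a browser. A minimal sketch using only Python's standard library, assuming the front end answers at http://localhost/docushare (adjust host and path) and that the cookie value is copied from the browser; build_sticky_request and probe are hypothetical helper names:

```python
import urllib.error
import urllib.request

# Assumed front-end URL; replace with the real host.
URL = "http://localhost/docushare"
# Cookie value from step 2 (route suffix .tomcat2).
JSESSIONID = "BC90C156669FDF0194657FF27EC3AF99.tomcat2"


def build_sticky_request(url: str, jsessionid: str) -> urllib.request.Request:
    """Build a GET request that carries the sticky JSESSIONID cookie."""
    return urllib.request.Request(url, headers={"Cookie": f"JSESSIONID={jsessionid}"})


def probe(url: str, jsessionid: str, timeout: float = 10.0) -> int:
    """Send one request with the sticky cookie and return the HTTP status."""
    req = build_sticky_request(url, jsessionid)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses, including the 503, are raised as HTTPError.
        return err.code
```

After waiting out retry=120 (step 6), calling print(probe(URL, JSESSIONID)) repeatedly and seeing 503 each time would reproduce the problem without a browser in the loop.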

So I would expect the balancer to retry the errored route successfully
"retry" seconds after the failure. Is this a bug, or have I
misunderstood something and/or misconfigured it?
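To make the expected semantics concrete, here is a toy Python model of how retry= would interact with a sticky route under nofailover=On, as described above. It is a sketch of the expected behaviour only, not httpd's actual code; Worker and find_route_worker are hypothetical names, and times are plain seconds:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    route: str
    retry: float             # seconds, like BalancerMember retry=120
    in_error: bool = False
    error_time: float = 0.0  # when the worker entered the error state

    def mark_error(self, now: float) -> None:
        self.in_error = True
        self.error_time = now

    def mark_ok(self) -> None:
        self.in_error = False

    def eligible(self, now: float) -> bool:
        """Healthy workers are always eligible; errored ones once `retry` has elapsed."""
        return not self.in_error or now - self.error_time >= self.retry


def find_route_worker(workers: List[Worker], route: str, now: float) -> Optional[Worker]:
    """Return the worker for a sticky route, or None (a 503 with nofailover=On)."""
    for w in workers:
        if w.route == route and w.eligible(now):
            return w
    return None


workers = [Worker("tomcat1", retry=120), Worker("tomcat2", retry=120)]
workers[1].mark_error(now=0.0)                             # step 4: tomcat2 goes to Err
print(find_route_worker(workers, "tomcat2", 60.0))         # None: still inside the retry window
print(find_route_worker(workers, "tomcat2", 121.0).route)  # tomcat2: retry allowed
```

In this model the step 7 request would reach tomcat2 again once 120 seconds have passed; the result reported in step 7 looks as if the sticky lookup never consulted the elapsed time.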

Regards

--
Dale Ogilvie
Senior Software Engineer
Trimble Navigation NZ Ltd
P O Box 8729
Riccarton
Christchurch
Ph:       +64 3 9635344
Fax:     +64 3 9635317

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.