Re: CARP Failover behavior - multiple parents chosen for URL

Chris Woodfield <rekoil@xxxxxxxxxxxxx> · Wed, 6 May 2009 20:29:38 -0400

On May 6, 2009, at 8:14 PM, Amos Jeffries wrote:

Hi,

I've noticed a behavior in CARP failover (on 2.7) that I was wondering
if someone could explain.

In my test environment, I have a non-caching squid configured with
multiple CARP parent caches: two servers, three instances per box
(listening on ports 1080/1081/1082, respectively), for a total of six
parents.

When I fail a squid instance and immediately afterwards run GETs to
URLs that were previously directed to that instance, I notice that the
request goes to a different squid, as expected, and I see the
following in the log for each request:

May  6 11:43:28 cdce-den002-001 squid[1557]: TCP connection to http-cache-1c.den002 (http-cache-1c.den002:1082) failed

And I notice that the request is being forwarded to a different, but
consistent, parent.

After ten of the above requests, I see this:

May  6 11:43:41 cdce-den002-001.den002 squid[1557]: Detected DEAD Parent: http-cache-1c.den002

So, I'm presuming that after ten failed requests, the peer is
considered DEAD. So far, so good.

The problem is this: during my test GETs, I noticed that immediately
after the "Detected DEAD Parent" message was generated, the parent
server that the request was being forwarded to changed, as if there's
an "interim" decision made until the peer is officially declared DEAD,
and then another hash decision made afterwards. So while the mapping is
consistent afterwards, it's apparent that during the failover the
parent server for the test URL changed twice, not once.

Can someone explain this behavior?

Do you have 'default' set on any of the parents?
It is entirely possible that multiple paths are selected as usable and
only the first taken.

No, my cache_peer config options are:

cache_peer http-cache-1a.den002 parent 1080 0 carp http11 idle=10
<repeat for each hostname>

During the period between death and detection the dead peer will still
be attempted, but failover happens to send the request to another
location. When death is detected the hashes are actually re-calculated.
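A toy model of that two-phase behaviour may help. It is a sketch under
several assumptions, not Squid's code: the hash functions follow the
CARP draft as I remember it, all parents have equal weight, the parent
names and FAILURES_BEFORE_DEAD are hypothetical (the value 10 comes
from the log above), and the failover list is assumed to be CARP's pick
followed by the other live parents in config order. Until the failed
parent is marked DEAD, CARP keeps choosing it and the request fails
over to the first other parent in config order; once it is DEAD, CARP
picks its own next-best parent, which need not be the same one.

```python
MASK32 = 0xFFFFFFFF
MIX = 0x62531965  # mixing constant from the CARP draft (as recalled)
FAILURES_BEFORE_DEAD = 10  # hypothetical name; value taken from the log in this thread

# Hypothetical parent names in config order, modelled on the log lines above.
PEER_NAMES = [
    "http-cache-1a.den002", "http-cache-1b.den002", "http-cache-1c.den002",
    "http-cache-2a.den002", "http-cache-2b.den002", "http-cache-2c.den002",
]


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit unsigned value left by n bits."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def url_hash(s: str) -> int:
    """Per-character rotate-and-add hash."""
    h = 0
    for byte in s.encode():
        h = (h + rotl32(h, 19) + byte) & MASK32
    return h


def peer_hash(name: str) -> int:
    """Same character loop over the parent name, plus one extra mixing step."""
    h = url_hash(name)
    h = (h + h * MIX) & MASK32
    return rotl32(h, 21)


def score(url: str, name: str) -> int:
    """Combined URL/parent score; equal weights assumed, so no load multiplier."""
    c = url_hash(url) ^ peer_hash(name)
    c = (c + c * MIX) & MASK32
    return rotl32(c, 21)


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.reachable = True               # does a TCP connect succeed?
        self.tcp_up = FAILURES_BEFORE_DEAD  # counts down on each failed connect

    def alive(self) -> bool:
        return self.tcp_up > 0


def forward(url: str, peers: list) -> str:
    """Forward one request and return the name of the parent that served it."""
    live = [p for p in peers if p.alive()]  # DEAD parents are dropped from the hash
    carp_choice = max(live, key=lambda p: (score(url, p.name), p.name))
    # Assumed failover list: CARP's pick first, then other live parents in config order.
    paths = [carp_choice] + [p for p in live if p is not carp_choice]
    for p in paths:
        if p.reachable:
            p.tcp_up = FAILURES_BEFORE_DEAD  # assumed: a success resets the count
            return p.name
        p.tcp_up -= 1
        if not p.alive():
            print(f"Detected DEAD Parent: {p.name}")
    raise RuntimeError("no parent reachable")


peers = [Peer(n) for n in PEER_NAMES]
url = "http://www.example.com/test.html"
ranked = sorted(peers, key=lambda p: (score(url, p.name), p.name), reverse=True)
ranked[0].reachable = False  # fail the instance that owns this URL
served = [forward(url, peers) for _ in range(12)]
print(served)
```

When the first other parent in config order and CARP's next-best differ
for a URL, that URL moves twice in this model, which matches the
behaviour described above.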

OK, correct me if I misread, but my understanding of the spec is that  
each parent cache gets its own hash value, each of which is then  
combined with the URL's hash to come up with a set of values. The  
parent cache corresponding with the highest result is the cache  
chosen. If that peer is unavailable, the next-best peer is selected,  
then the next, etc etc.
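That description is highest-random-weight hashing. A minimal sketch,
assuming the hash functions of the CARP v1.0 draft as I recall them
(32-bit rotate-left mixing with the constant 0x62531965) and equal
weights, so the per-parent load multiplier is the same for every parent
and drops out of the comparison. The parent names are hypothetical, and
Squid's own implementation may differ in detail.

```python
MASK32 = 0xFFFFFFFF
MIX = 0x62531965  # mixing constant from the CARP draft (as recalled)


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit unsigned value left by n bits."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def url_hash(url: str) -> int:
    """Per-character rotate-and-add hash of the URL."""
    h = 0
    for byte in url.encode():
        h = (h + rotl32(h, 19) + byte) & MASK32
    return h


def peer_hash(name: str) -> int:
    """Same character loop over the parent name, plus one extra mixing step."""
    h = url_hash(name)
    h = (h + h * MIX) & MASK32
    return rotl32(h, 21)


def combined_hash(u: int, p: int) -> int:
    """Combine a URL hash with a parent hash into that parent's score."""
    c = u ^ p
    c = (c + c * MIX) & MASK32
    return rotl32(c, 21)


def carp_rank(url: str, peers: list) -> list:
    """Parents best-first for this URL; the name breaks (unlikely) score ties."""
    u = url_hash(url)
    return sorted(peers, key=lambda p: (combined_hash(u, peer_hash(p)), p), reverse=True)


# Hypothetical parent names, modelled on the log lines in this thread.
PEERS = [
    "http-cache-1a.den002", "http-cache-1b.den002", "http-cache-1c.den002",
    "http-cache-2a.den002", "http-cache-2b.den002", "http-cache-2c.den002",
]

ranking = carp_rank("http://www.example.com/index.html", PEERS)
print("chosen:", ranking[0], "next-best:", ranking[1])
```

In this model, "if that peer is unavailable, the next-best peer is
selected" simply means trying ranking[1], then ranking[2], and so on.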

If that is correct, what hashes are re-calculated when a dead peer is
detected? And why would those hashes give different results than the
pre-dead-peer run of the algorithm?

And more importantly, will that recalculation result in URLs being
re-mapped that weren't originally pointed to the failed parent? I
thought avoiding such an arbitrary re-mapping was the whole point of
the CARP algorithm.
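A rough check of that question in a simple model: CARP-draft-style
hashes as I recall them, equal weights so every load multiplier is
identical, and hypothetical parent names. Under those assumptions,
recomputing the scores without the failed parent cannot move any other
URL: each URL still goes to its highest-scoring parent, and if that was
not the failed one, it is still the highest among the survivors. The
sketch checks this over a batch of made-up URLs. With unequal weight=
values the multipliers would depend on the set of parents, so this
simple argument would not carry over directly.

```python
MASK32 = 0xFFFFFFFF
MIX = 0x62531965  # mixing constant from the CARP draft (as recalled)

# Hypothetical parent names, modelled on the log lines in this thread.
PEERS = [
    "http-cache-1a.den002", "http-cache-1b.den002", "http-cache-1c.den002",
    "http-cache-2a.den002", "http-cache-2b.den002", "http-cache-2c.den002",
]


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit unsigned value left by n bits."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def url_hash(s: str) -> int:
    """Per-character rotate-and-add hash."""
    h = 0
    for byte in s.encode():
        h = (h + rotl32(h, 19) + byte) & MASK32
    return h


def peer_hash(name: str) -> int:
    h = url_hash(name)
    h = (h + h * MIX) & MASK32
    return rotl32(h, 21)


def combined_hash(u: int, p: int) -> int:
    c = u ^ p
    c = (c + c * MIX) & MASK32
    return rotl32(c, 21)


def owner(url: str, peers: list) -> str:
    """Parent with the highest score; equal weights, so no multiplier."""
    u = url_hash(url)
    return max(peers, key=lambda p: (combined_hash(u, peer_hash(p)), p))


def moved_urls(urls: list, peers: list, failed: str) -> dict:
    """Map each URL whose parent changes when `failed` is dropped to (before, after)."""
    survivors = [p for p in peers if p != failed]  # "re-calculated" table
    moves = {}
    for url in urls:
        before, after = owner(url, peers), owner(url, survivors)
        if before != after:
            moves[url] = (before, after)
    return moves


urls = [f"http://www.example.com/object/{i}" for i in range(2000)]
moves = moved_urls(urls, PEERS, "http-cache-1c.den002")
print(f"{len(moves)} of {len(urls)} URLs moved")
```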

-C

If anyone wants a task, it may be useful to see whether leaving dead
peers in the existing hash, and omitting the dead peers at selection
time instead of at connection time, is more responsive this way while
also reducing the double-change.

Again, I'm not clear on the difference between the two - educate me  
please :)
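One possible reading of the proposal, as a sketch rather than a patch:
keep every configured parent in the CARP ranking, skip parents marked
DEAD when walking it, and on a connection failure fall back to the next
parent in that same ranking instead of the next one in config order.
The model assumes CARP-draft-style hashes as I recall them, equal
weights, hypothetical parent names and a hypothetical
FAILURES_BEFORE_DEAD of 10. In this model the URL moves straight to
CARP's next-best parent on the first failure and stays there after the
DEAD detection, so there is only one change.

```python
MASK32 = 0xFFFFFFFF
MIX = 0x62531965  # mixing constant from the CARP draft (as recalled)
FAILURES_BEFORE_DEAD = 10  # hypothetical name; value taken from the log in this thread

# Hypothetical parent names in config order, modelled on the log lines above.
PEER_NAMES = [
    "http-cache-1a.den002", "http-cache-1b.den002", "http-cache-1c.den002",
    "http-cache-2a.den002", "http-cache-2b.den002", "http-cache-2c.den002",
]


def rotl32(x: int, n: int) -> int:
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def url_hash(s: str) -> int:
    h = 0
    for byte in s.encode():
        h = (h + rotl32(h, 19) + byte) & MASK32
    return h


def peer_hash(name: str) -> int:
    h = url_hash(name)
    h = (h + h * MIX) & MASK32
    return rotl32(h, 21)


def score(url: str, name: str) -> int:
    """Combined URL/parent score; equal weights assumed, so no load multiplier."""
    c = url_hash(url) ^ peer_hash(name)
    c = (c + c * MIX) & MASK32
    return rotl32(c, 21)


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.reachable = True               # does a TCP connect succeed?
        self.tcp_up = FAILURES_BEFORE_DEAD  # counts down on each failed connect

    def alive(self) -> bool:
        return self.tcp_up > 0


def forward_by_rank(url: str, peers: list) -> str:
    """Walk the full CARP ranking, skipping DEAD parents; return who served url."""
    # The ranking always covers every configured parent; nothing is rebuilt on death.
    ranked = sorted(peers, key=lambda p: (score(url, p.name), p.name), reverse=True)
    for p in ranked:
        if not p.alive():
            continue  # DEAD parents are omitted here, at selection time
        if p.reachable:
            p.tcp_up = FAILURES_BEFORE_DEAD  # assumed: a success resets the count
            return p.name
        p.tcp_up -= 1  # failed connect: fall through to the next-ranked parent
    raise RuntimeError("no parent reachable")


peers = [Peer(n) for n in PEER_NAMES]
url = "http://www.example.com/test.html"
ranked = sorted(peers, key=lambda p: (score(url, p.name), p.name), reverse=True)
ranked[0].reachable = False  # fail the instance that owns this URL
served = [forward_by_rank(url, peers) for _ in range(12)]
print(served)
```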

Amos