Have you considered setting squid up to know about both origins, so it
can fail over automatically?
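For example (a minimal sketch with hypothetical hostnames and ports, assuming an accelerator-style setup; adjust to your deployment), squid will normally try the next originserver parent once it marks the first one dead:

    http_port 3128 accel defaultsite=frontier.example.org
    cache_peer origin1.example.org parent 8000 0 no-query originserver name=primary
    cache_peer origin2.example.org parent 8000 0 no-query originserver name=secondary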
On 26/09/2008, at 5:04 AM, Dave Dykstra wrote:
I am running squid on over a thousand computers that are filtering data
coming out of one of the particle collision detectors on the Large
Hadron Collider. There are two origin servers, and the application
layer is designed to try the second server if the local squid returns a
5xx HTTP code (server error). I just recently found that before squid
2.7 this could never happen, because squid would simply return stale
data if the origin server was down (more precisely, I've been testing
with the server up but the listener process down, so squid gets
'connection refused'). In squid 2.7.STABLE4, if squid.conf has
'max_stale 0', or if the origin server sends 'Cache-Control:
must-revalidate', then squid will send a 504 Gateway Timeout error.

Unfortunately, this timeout error does not get cached, and it gets sent
upstream every time, no matter what negative_ttl is set to. These
squids are configured in a hierarchy where each feeds 4 others so the
load gets spread out, but because the error is not cached at all, if
the primary origin server is down, the squids near the top of the
hierarchy will get hammered with hundreds of requests for the dead
server before every request that succeeds from the second server.
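For reference, the relevant pieces of my squid.conf look roughly like
this (the negative_ttl value here is just an example of what I've been
experimenting with, not a recommendation):

    max_stale 0
    negative_ttl 5 minutes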
Any suggestions? Is the fact that negative_ttl doesn't work with
max_stale a bug, a missing feature, or an unfortunate interpretation of
the HTTP 1.1 spec?
By the way, I had hoped that sending 'Cache-Control: max-stale=0' on
the request would work the same as squid.conf's 'max_stale 0', but I
never see an error come back when the origin server is down; squid
returns stale data instead. I wonder whether that's intentional, a
bug, or a missing feature. I also note that the HTTP 1.1 spec says
there MUST be a 'Warning: 110 (Response is stale)' header attached
whenever stale data is returned, and I'm not seeing those.
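To illustrate, here is roughly how I've been testing (hypothetical
proxy port and URL), along with the kind of header I would expect a
stale response to carry per RFC 2616 section 14.46 but never see:

    $ curl -s -D - -o /dev/null -x localhost:3128 \
        -H 'Cache-Control: max-stale=0' http://origin1.example.org/data

    # expected on a stale response, per the spec, but absent:
    Warning: 110 squid/2.7.STABLE4 "Response is stale"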
- Dave
--
Mark Nottingham mnot@xxxxxxxxxxxxx