On Tue, 6 Feb 2007, Henrik Nordstrom wrote:
Tue 2007-02-06 at 01:00 +0000, John Line wrote:
Investigation showed that the problem was that the new Squid version was
caching the temporary redirects (HTTP status 302) sent by origin servers
to direct unauthenticated requests to our authentication server. When the
authentication server subsequently redirected the (now authenticated)
requests back to the originally-requested URLs, Squid served the
corresponding cached redirects instead of passing the requests through to
the origin servers.
Odd. Should not happen.
[snip]
I haven't yet filed a bug report, as further investigation showed the
situation was more complex than it seemed at first. I'd made the mistake
of taking at face value the response header "X-Cache: HIT from
omicron.wwwcache.cam.ac.uk"; Squid *was* attempting to re-validate the
cached redirect, but mistakenly concluding that it was still valid.
I'm not sure whether that is Squid's fault, or a problem with the
behaviour of the origin server and its authentication component. I'll try
describing the significant details of what I believe is actually
happening, in the hope that someone with a better understanding of the
HTTP RFC's rules about caching and re-validation will be able to say with
more certainty where the problem lies.
[This is based on Squid logging, the Firefox "Live HTTP Headers" extension
reporting the browser's view of events, and ethereal watching the HTTP
dialogue between Squid and the origin server.]
The significant events are:
* browser requests a URL for which authentication is needed (and it
doesn't have a cookie identifying it as already authenticated for that
origin server); Squid passes the request to the origin server, which
sends a 302 redirect, directing the browser to our authentication
server.
* Squid caches that redirect, as it has an Expires: header (though it is
"pre-expired" - Expires: timestamp identical to Date: timestamp - so,
by the rules described in the HTTP RFC, revalidation is needed before
the cached redirect can be served in response to subsequent requests).
* after interaction with the authentication server, the user's browser
ends up requesting the same URL that it originally requested, but it
is now authenticated (via a cookie).
* Squid apparently sees that the cached 302 redirect response for that
URL is expired (it always has been; the Expires: header was meant
to ensure even HTTP 1.0 caches wouldn't cache the redirect...).
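To make "pre-expired" concrete, here is a minimal Python sketch of the check a cache could apply to the stored redirect. The function name and the header values are hypothetical, made up for illustration; this is not Squid's code, only the Expires-minus-Date arithmetic described above.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime_seconds(headers):
    """Return Expires minus Date in seconds, or None if either is missing."""
    if "Date" not in headers or "Expires" not in headers:
        return None
    date = parsedate_to_datetime(headers["Date"])
    expires = parsedate_to_datetime(headers["Expires"])
    return (expires - date).total_seconds()


# Hypothetical headers for the 302: Expires is identical to Date.
redirect_headers = {
    "Date": "Tue, 06 Feb 2007 01:00:00 GMT",
    "Expires": "Tue, 06 Feb 2007 01:00:00 GMT",
}

lifetime = freshness_lifetime_seconds(redirect_headers)
# A lifetime of zero or less means the response was stale on arrival.
is_pre_expired = lifetime is not None and lifetime <= 0
print(is_pre_expired)
```

With a freshness lifetime of zero the response can be stored, but it can never be served without first going back to the origin server.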
It's the next bit of the interaction that leaves me uncertain where the
fault lies.
* Squid sends a request for the URL to the origin server, passing through
the authentication cookie and adding an "If-Modified-Since" header
quoting the timestamp from the Date: or Expires: header (can't tell
which, they are identical) of the cached redirect. Is it allowed to do
that? The redirect does not have a Last-Modified: header, so it must be
using one of the others. The request from the browser did NOT have an
If-Modified-Since header, so Squid must have added it.
* Because the user is now authenticated, the origin server does NOT send
a redirect. Instead, it sends a 304 Not Modified response because the
requested document is a static HTML page that was last modified a long
time (days, months, or years) before the timestamp quoted in the
If-Modified-Since header.
* Squid sends the cached redirect to the browser and logs the request as
TCP_REFRESH_HIT/302. The user is stuck, since re-confirming the
authentication just results in being sent back to Squid/the origin
server and being sent another copy of the redirect.
* Note that using squidclient to purge the cached redirect fixes the
problem temporarily, allowing the already-authenticated browser to
access the requested document and any other that's allowed by the user's
credentials, but only until the authentication cookie is invalidated
(e.g. by restarting the browser), after which the problem recurs (for
URLs that were being browsed without difficulty until re-authentication
was required).
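The sequence of events above can be reproduced with a toy model. This is only a sketch of the behaviour as I have described it, in Python, with no claim that it matches Squid's internals: ToyCache, origin(), the timestamps, the cookie value and the auth.example URL are all hypothetical, and the "REFRESH_HIT" label merely mimics the TCP_REFRESH_HIT log tag.

```python
from email.utils import parsedate_to_datetime

# Hypothetical timestamps, chosen only for illustration.
PAGE_LAST_MODIFIED = "Mon, 01 Jan 2007 00:00:00 GMT"
NOW = "Tue, 06 Feb 2007 01:00:00 GMT"


def origin(request_headers):
    """Toy origin server: redirect if unauthenticated, else page or 304."""
    if "Cookie" not in request_headers:
        # Pre-expired redirect to the (hypothetical) authentication server.
        return 302, {"Date": NOW, "Expires": NOW,
                     "Location": "https://auth.example/login"}
    ims = request_headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(PAGE_LAST_MODIFIED) <= parsedate_to_datetime(ims):
        # The static page is older than the quoted timestamp.
        return 304, {"Date": NOW}
    return 200, {"Date": NOW, "Last-Modified": PAGE_LAST_MODIFIED}


class ToyCache:
    """Single-URL cache that revalidates using the stored Date: header."""

    def __init__(self):
        self.entry = None  # (status, headers) of the stored response

    def fetch(self, request_headers):
        if self.entry is None:
            self.entry = origin(request_headers)
            return self.entry[0], "MISS"
        # Stored entry is stale: add If-Modified-Since taken from Date:.
        conditional = dict(request_headers)
        conditional["If-Modified-Since"] = self.entry[1]["Date"]
        status, headers = origin(conditional)
        if status == 304:
            # The 304 is taken as confirming the cached 302.
            return self.entry[0], "REFRESH_HIT"
        self.entry = (status, headers)
        return status, "REFRESH_MISS"


cache = ToyCache()
first = cache.fetch({})                             # unauthenticated
second = cache.fetch({"Cookie": "session=example"})  # authenticated
print(first, second)
```

The second request comes back as a 302 even though the origin server was describing the HTML page, which is the loop the user gets stuck in.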
That leaves me uncertain about where the blame lies.
It seems odd - and maybe wrong - that Squid is using If-Modified-Since
with a timestamp derived from the Date: or Expires: header - surely it
should only use a timestamp from a Last-Modified: header? Since the cached
redirect does NOT have a Last-Modified: header, that should mean that
Squid cannot attempt revalidation and should discard the cached redirect
and simply send a normal request to the origin server. That would
certainly have the right outcome (getting a copy of the requested
document, as long as the user was authenticated).
RFC 2616 says (in section "13.3 Validation Model"):
Note: a response that lacks a validator may still be cached, and
served from cache until it expires, unless this is explicitly
prohibited by a cache-control directive. However, a cache cannot
do a conditional retrieval if it does not have a validator for the
entity, which means it will not be refreshable after it expires.
That reads like it is agreeing with me, unless it is legitimate for Squid
to use the timestamp from another header (must be Date: or Expires:) with
If-Modified-Since.
Section 13.3.5 of the RFC says (in part) "Thus, comparisons of any other
headers (except Last-Modified, for compatibility with HTTP/1.0) are never
used for purposes of validating a cache entry.", which appears to confirm
that Last-Modified: is the only header which can be used in such
comparisons.
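My reading of those two passages, expressed as a Python sketch: build a conditional request only when the cached response carries a validator. The function name is hypothetical, and the header keys are assumed to use the canonical capitalisation. ETag/If-None-Match is the other validator RFC 2616 defines; it does not appear in the exchange above and is included only for completeness.

```python
def revalidation_request_headers(cached_headers):
    """Return conditional request headers, or None if there is no validator."""
    conditional = {}
    if "Last-Modified" in cached_headers:
        conditional["If-Modified-Since"] = cached_headers["Last-Modified"]
    if "ETag" in cached_headers:
        conditional["If-None-Match"] = cached_headers["ETag"]
    # No validator: the stale entry cannot be refreshed, so the cache
    # would have to discard it and send an ordinary request instead.
    return conditional or None


# The cached redirect has Date: and Expires: but no validator.
cached_redirect = {
    "Date": "Tue, 06 Feb 2007 01:00:00 GMT",
    "Expires": "Tue, 06 Feb 2007 01:00:00 GMT",
}
print(revalidation_request_headers(cached_redirect))
```

Under this reading the cached redirect would yield no conditional headers at all, so the authenticated request would reach the origin server as a plain GET.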
Is the problem actually Squid's fault? Or is it the origin server's fault
for responding with details about the HTML document when Squid was asking
about the redirect?
I don't see how the origin server could avoid doing that, though, since
the request from Squid does not and cannot distinguish those two cases
(i.e. it cannot ask "is this redirect what you would currently send for
this request?"). Although some requests include an authentication cookie
and others do not, and different outcomes are expected, Squid cannot be
expected to know the significance of the cookie to the origin server.
I'll file a bug report if responses seem to favour it being Squid's fault!
John
--
John Line - web & news development, University of Cambridge Computing Service