Schermuly-Koch, Achim wrote:
Hi amos,
thanks for your advice so far. I am still not sure which path to follow...
We are using squid as a reverse-proxy cache to speed up our website.
A large area of the website is public. But there is also a
personalized area. If a user logs into his personal site, we maintain
a session for the user (using standard tomcat features jsession-id
cookie with optional url-rewriting).
[...] the pages in the public area have a small caveat: If the user
was logged in the private area, we maintain the "logged-in" state and
reflect that state on public pages also (outputting "Welcome John
Doe" in a small box). Of course we must not cache these pages.
# Recognizes mysite
acl MYSITE url_regex ^http://[^.]*\.mysite\.de
# Don't cache pages, if user sends or gets a cookie
acl JSESSIONID1 req_header Cookie -i jsessionid
cache deny MYSITE JSESSIONID1
acl JSESSIONID2 rep_header Set-Cookie -i jsessionid
cache deny MYSITE JSESSIONID2
This seemed to work fine, until I did a JMeter test mixing requests
with and without session-id cookies. It seems that if I request an
already cached URL with a session cookie, the cached document is
flushed.
[...]
Of course, if Squid finds that it has a cached copy, it will erase it,
because the _URL_ is not to be cached. Content is not considered.
This is NOT the right way to do privacy caching. See below for why and
how to do it.
[...]
The biggest surprise of all is still hiding unseen by you:
Every other cache around the Internet that your visitors use may be
storing the private area pages!!
This is because you use a local configuration completely internal to
your Squid to determine what is cacheable and what is not.
The correct way to do this is to:
* have the web server which generates the pages add a header
("Cache-Control: private") to all pages which are in the private area of
the website. This tells every shared cache (your Squid included) not to
store the private info.
I agree with that. Do I have to configure the reverse-proxy *explicitly*
to avoid caching "Cache-Control: private" marked pages?
No, the proxy will avoid caching them by default.
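As a rough illustration of that default, here is a minimal Python sketch
of the check a shared cache makes before storing a reply. The function
names are hypothetical, and the logic is a simplification of the HTTP
caching rules (it only looks at "private" and "no-store"), not Squid's
actual code.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into lowercase directive names."""
    directives = set()
    for part in value.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return directives


def shared_cache_may_store(response_headers):
    """Return False when the reply forbids storage in a shared cache.

    Simplified: only the 'private' and 'no-store' directives are checked.
    """
    cc = ""
    for key, value in response_headers.items():
        # Header names are case-insensitive.
        if key.lower() == "cache-control":
            cc = value
    directives = parse_cache_control(cc)
    return not ({"private", "no-store"} & directives)
```

With this, a reply carrying "Cache-Control: private" is rejected for
storage without any extra ACL on the proxy side.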
A problem I foresee with that solution: if I set "Cache-Control:
private" on pages containing personalized content, they will evict
cached pages with the same URL but without personalized content
(remember: the page is rendered differently, depending on whether the
user is in a session).
Yes, this is a problem in some versions of Squid. A Squid with proper
ETag support does not have this problem, though the Squid-2 series
handles ETag better than Squid-3 at present.
* have the personal adjustments to the public pages done as small
includes so that the main body and content of the page can be cached
normally, but the small modifications are not.
For example I like including a small CSS/AJAX script which changes a
generic HTML div [..]
I have thought of that, too. But i would prefer not to touch
the application.
Okay, then you are stuck with CC:private and ETag to work with.
The HTTP way to achieve something similar is to add an "ETag:" header
with some hash of the page content in it, so that each unique copy of
the page is stored separately. The personalized pages get
"Cache-Control: private" added as well so that the whole response gets
discarded.
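A minimal Python sketch of that idea, assuming the origin server can
hash the rendered body before sending it. The helper names `make_etag`
and `response_headers` are hypothetical, and SHA-256 truncated to 16 hex
digits is just one possible choice of hash.

```python
import hashlib


def make_etag(body):
    """Build a quoted ETag value from a hash of the page body (bytes)."""
    digest = hashlib.sha256(body).hexdigest()[:16]
    return '"' + digest + '"'


def response_headers(body, personalized):
    """Headers the origin server would send for one rendering of a page."""
    headers = {"ETag": make_etag(body)}
    if personalized:
        # Tells shared caches not to store this rendering.
        headers["Cache-Control"] = "private"
    return headers
```

Because the ETag is derived from the content, the anonymous rendering
and each personalized rendering of the same URL end up with different
ETag values, and only the personalized ones carry "private".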
That sounds interesting... Are the following assumptions correct:
The ETag would be generated by the webserver. A public page (/index.jsp)
would have _one_ ETag if rendered without and a different unique ETag for
each request (to the same /index.jsp) with a session-cookie. The cache
for the publicly cached page would be left untouched, if the response
bears a "Cache-Control: private" header but with a different ETag. That
implies, the cache is flushed when the webserver responds, not when the
client requests.
Does the ETag have to be unique resource-wide, or is it also possible
to use the same ETag for different resources (since they have
different URLs)?
Is it another "very bad idea (tm)" to reuse the same ETag for each
personalized page? I would assume it doesn't matter, since they are
marked "private" anyway.
Theoretically you are right, it _should_ not matter. However, in
practice proxies seeing 'private' may discard all copies of objects at
the URL. Squid uses its limited ETag support to get around that issue,
so the ETag variants which are marked private always get discarded, even
if previously marked public, but the others are not discarded.
ETag is meant to identify a unique copy of each object at a URL, for
example the compressed vs non-compressed versions and the personalized
vs non-personalized versions.
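To make the per-ETag behaviour concrete, here is a toy Python model of a
cache keyed by (URL, ETag). The class `ToyCache` and the ETag values in
the usage below are hypothetical. It only mirrors the behaviour described
above and says nothing about how Squid stores objects internally.

```python
class ToyCache:
    """Toy shared cache keyed by (URL, ETag). Not Squid's storage model."""

    def __init__(self):
        self.store = {}

    def on_response(self, url, headers, body):
        """Handle one reply coming back from the origin server."""
        etag = headers.get("ETag")
        cc = headers.get("Cache-Control", "").lower()
        if "private" in cc:
            # Drop only the copy carrying this ETag, if one was stored
            # earlier. Other variants at the same URL stay in place.
            self.store.pop((url, etag), None)
            return
        self.store[(url, etag)] = body

    def variants(self, url):
        """Return the sorted ETags currently stored for a URL."""
        return sorted(e for (u, e) in self.store if u == url)
```

For example, storing a public /index.jsp and then receiving a private
reply with a different ETag leaves the public copy untouched:

```python
cache = ToyCache()
cache.on_response("/index.jsp", {"ETag": '"pub"'}, b"public page")
cache.on_response(
    "/index.jsp",
    {"ETag": '"john"', "Cache-Control": "private"},
    b"Welcome John Doe",
)
print(cache.variants("/index.jsp"))
```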
Amos
--
Please be using
Current Stable Squid 2.7.STABLE6 or 3.0.STABLE18
Current Beta Squid 3.1.0.13