On Wed, 06 Apr 2011 12:34:26 -0700, Linda Walsh wrote:
I was downloading some product documentation from the
documentation section on:
http://www.lsi.com/channel/products/jbods/sata_sas_jbods/630j/index.html
Specifically, I tried:
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54841
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54435
They all load smallish pdf's:
(from log monitor:)
+63.50 346ms; ln=473 (1.3K/7.4) TCP_MISS/200 <Athenae2 [HEAD
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54841
-
HIER_DIRECT/www.lsi.com application/pdf ]
+7.01 220ms; ln=462 (2.1K/65.9) TCP_MISS/200 <Athenae2 [HEAD
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54435
-
HIER_DIRECT/www.lsi.com application/pdf ]
+6.21 23914ms; ln=5051477(206.3K/795.4K) TCP_MISS/200 <Athenae2
[GET
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432
-
HIER_DIRECT/www.lsi.com application/pdf ]
----
The first two requests here are HTTP HEAD requests, which do not
retrieve any of the body to be cached, although the headers that come
back could be cached. The third is a GET, which might be cached, but
will be a MISS if the existing cache entry only has the HEAD details.
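To tell the HEAD and GET entries apart in bulk, something like the following could be run over access.log. It is only a sketch: it assumes Squid's native log field order (time, elapsed, client, result/status, bytes, method, URL, ...), the names AccessEntry, parse_access_line and fetched_body are made up for illustration, and the sample lines are modelled on the entries in this thread rather than copied from a real log.

```python
from typing import NamedTuple


class AccessEntry(NamedTuple):
    elapsed_ms: int
    result: str      # e.g. TCP_MISS
    status: int      # HTTP status code sent to the client
    size: int        # bytes delivered to the client
    method: str      # HEAD, GET, ...
    url: str


def parse_access_line(line: str) -> AccessEntry:
    """Parse the leading fields of one native-format access.log line."""
    fields = line.split()
    result, status = fields[3].split("/")
    return AccessEntry(
        elapsed_ms=int(fields[1]),
        result=result,
        status=int(status),
        size=int(fields[4]),
        method=fields[5],
        url=fields[6],
    )


def fetched_body(entry: AccessEntry) -> bool:
    """Only a successful GET brings back a body that could be stored."""
    return entry.method == "GET" and entry.status == 200


# Illustrative lines only (timestamps are placeholders).
SAMPLE_LINES = [
    "1302116600.765 108 192.168.3.140 TCP_MISS/200 468 HEAD "
    "http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432 "
    "- HIER_DIRECT/www.lsi.com application/pdf",
    "1302116700.000 23914 192.168.3.140 TCP_MISS/200 5051477 GET "
    "http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432 "
    "- HIER_DIRECT/www.lsi.com application/pdf",
]

if __name__ == "__main__":
    for raw in SAMPLE_LINES:
        entry = parse_access_line(raw)
        print(entry.method, entry.size, "body fetched:", fetched_body(entry))
```

Only the entries for which fetched_body is true are worth checking for a later TCP_HIT.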
Now I've tried several mods in my squid.conf file (how do you
get squid to display its version? I tried --version, but
no go) -- am running something like Squid 3.2.0.4 (at least
it's the last entry in the 'Changelog' on disk; it signs on
as "Head-BZR").
Thanks, that is about as good as you are likely to get. ('3.2.0.4 plus
some patches').
FYI: "squid -v" for the version and build info. But in your case that
would show the same 3.HEAD-BZR for version.
NP: It gets a bit difficult to track in the HEAD code. The daily
snapshots have a 3.HEAD-$date, but anything more live than that requires
--build-info parameter added with something to identify it.
Things I have tried:
1) commenting out:
'acl QUERY urlpath_regex cgi-bin \?'
'cache deny QUERY'
Good. Using that would absolutely prevent caching of those requests,
regardless of any other problems.
This change has made the cacheability go from NO to MAYBE. Other
factors (like the HEAD/GET difference) will still make the MAYBE go to a
definite decision.
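To see why the QUERY ACL caught these PDF URLs, the match can be approximated outside Squid. This is a rough sketch, assuming urlpath_regex is tested against everything after the hostname (path plus query string) and that the ACL matches when any one of its patterns matches; matches_query_acl is a hypothetical helper, not part of Squid.

```python
import re

# The two patterns from 'acl QUERY urlpath_regex cgi-bin \?'
QUERY_PATTERNS = [re.compile(r"cgi-bin"), re.compile(r"\?")]


def url_path(url: str) -> str:
    """Return the part of the URL after scheme and host (path + query)."""
    rest = url.split("://", 1)[-1]
    slash = rest.find("/")
    return rest[slash:] if slash != -1 else "/"


def matches_query_acl(url: str) -> bool:
    """True if any QUERY pattern matches the URL path."""
    path = url_path(url)
    return any(p.search(path) for p in QUERY_PATTERNS)


if __name__ == "__main__":
    for u in (
        "http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432",
        "http://www.lsi.com/channel/products/jbods/sata_sas_jbods/630j/index.html",
    ):
        print(matches_query_acl(u), u)
```

The AssetMgr.aspx URLs match because of the "?", so with 'cache deny QUERY' in place they could never be stored; the plain index.html page does not match.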
2) adding back:
'acl QUERY urlpath_regex cgi-bin \?'
'cache allow QUERY' ## Note changed it to 'allow'
Should not have any effect.
3) commenting out:
'hierarchy_stoplist cgi-bin ?'
Note -- didn't think I needed this, as I had no other
caches I was querying from, but a comment further on down
under 'nonhierarchical_direct', said,
"By default, squid will send any non-hierarchical
requests (matching hierarchy_stoplist or not cachable
request type) direct to origin servers. If you
set this to off, Squid will prefer to send these request
to parents."
I took the comment to indicate that if something was in the
hierarchy_stoplist, it would also prevent caching, thus my try
in disabling it
These only take effect when fetching from a peer. Removing
hierarchy_stoplist will allow matching peer-sourced replies to maybe be
cached here and maybe in the peer.
4) In my refresh patterns, I have entries for ftp and gopher
and one for ".": (which presumably would match everything else):
refresh_pattern . 0 20% 4320
To that line I have tried adding a bunch of keywords
(note, it's all 1 line in the squid.conf file, no backslashes):
refresh_pattern . 0 20% 4320 ignore-no-store \
ignore-no-cache ignore-private ignore-auth override-expire \
reload-into-ims
The only ones I haven't tried yet are 'refresh-ims',
'override-expire' and 'override-lastmod', but those shouldn't
be needed and might cause more headaches than they are worth.
Is there something I'm missing? This seems like it should be
'simple'.
You need to look at the actual PDF request and reply headers. That will
tell you what is going on and which (if any) of the overrides are
useful.
Your log below has those Cache-Control details.
Relevant log file entries are below (access, cache, store...)
The full entry (from access.log) from one of the above shows:
------------------------------------------------------------
1302116600.765 108 192.168.3.140 TCP_MISS/200 468 HEAD
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432
-
HIER_DIRECT/www.lsi.com application/pdf [Host:
www.lsi.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0;
en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16\r\nAccept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,application/json\r\nAccept-Language:
en,en-us;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset:
UTF-8,*\r\nKeep-Alive: 1800\r\nProxy-Connection: keep-alive\r\n]
Okay. No reason not to cache.
[HTTP/1.1 200 OK\r\nDate: Wed, 06 Apr 2011 19:03:16 GMT\r\nServer:
Microsoft-IIS/6.0\r\nX-Powered-By: ASP.NET\r\nX-AspNet-Version:
2.0.50727\r\nContent-Disposition: attachment;
filename=JBOD_Enclosures_Guide_080310.pdf\r\nSet-Cookie:
ASP.NET_SessionId=vgzglkahj1njarzzn4yooun3; path=/;
HttpOnly\r\nCache-Control: private\r\nContent-Type:
application/pdf\r\nContent-Length: 5051083\r\n\r]
Marked explicitly as "private", i.e. it cannot be cached by any middleware
proxy (such as Squid) which may send it to other users. It may be cached by
a personal cache such as the browser storage.
To me it looks like the website is sending an incorrect Cache-Control:
header. Although if you require a login to fetch that doc, then it could
be right.
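The check on that reply header can be sketched as below. This is a simplification and not Squid's actual logic: it only looks at the "private" and "no-store" directives, ignores the other things a real cache considers (request method, Authorization, expiry, Vary), and splits naively on commas. The function names are made up for illustration.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into {directive: argument}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def shared_cache_may_store(cache_control: str) -> bool:
    """A shared proxy should not store replies marked private or no-store."""
    directives = parse_cache_control(cache_control)
    return "private" not in directives and "no-store" not in directives


if __name__ == "__main__":
    # Value taken from the reply headers in the log above.
    print(shared_cache_may_store("private"))
    # A hypothetical value the site could send instead.
    print(shared_cache_may_store("public, max-age=3600"))
```

With the value from the log the answer is no, which matches the TCP_MISS on every GET.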
<snip other logs>
Amos