Re: squid cache prob: won't cache a 'pdf'

On Wed, 06 Apr 2011 12:34:26 -0700, Linda Walsh wrote:
I was downloading some product documentation from the
documentation section on:

http://www.lsi.com/channel/products/jbods/sata_sas_jbods/630j/index.html

Specifically, I tried:

http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54841
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54435

They all load smallish pdf's:
(from log monitor:)
   +63.50  346ms; ln=473  (1.3K/7.4) TCP_MISS/200 <Athenae2 [HEAD
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54841 -
HIER_DIRECT/www.lsi.com application/pdf ]
   +7.01   220ms; ln=462  (2.1K/65.9) TCP_MISS/200 <Athenae2 [HEAD
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54435 -
HIER_DIRECT/www.lsi.com application/pdf ]
   +6.21  23914ms; ln=5051477(206.3K/795.4K) TCP_MISS/200 <Athenae2
[GET
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432 -
HIER_DIRECT/www.lsi.com application/pdf ]

----

The first two requests here are HTTP HEAD requests, which retrieve only the response headers. There is no body to cache, although the headers that come back could be cached. The third is a GET, which might be cached, but it will be a MISS if the cache so far holds only the details from a HEAD.



Now I've tried several mods in my squid.conf file (how do you
get squid to display its version?  I tried --version, but
no go) -- am running something like Squid 3.2.0.4 (at least
it's the last entry in the 'Changelog' on disk; it signs on
as "Head-BZR").

Thanks, that is about as good as you are likely to get. ('3.2.0.4 plus some patches').

FYI: "squid -v" for the version and build info. But in your case that would show the same 3.HEAD-BZR for version.

NP: Versions get a bit difficult to track in the HEAD code. The daily snapshots have a 3.HEAD-$date version, but anything more live than that needs the --build-info parameter added with something to identify the build.



Things I have tried:
1) commenting out:
   'acl QUERY urlpath_regex cgi-bin \?'
   'cache deny QUERY'

Good. Those two lines would absolutely prevent caching of those requests, regardless of any other problems.

This change has moved the cacheability from NO to MAYBE. Other factors (like the HEAD/GET difference) will still turn the MAYBE into a definite decision.

2) adding back:
   'acl QUERY urlpath_regex cgi-bin \?'
   'cache allow QUERY'    ## Note changed it to 'allow'

Should not have any effect.
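
To see why the QUERY acl caught these downloads, here is a rough Python sketch of the matching. It assumes urlpath_regex is tested against the path plus the query string, and it uses Python's re module as a stand-in for Squid's regex engine, which behaves the same for these two simple patterns. The helper names (url_path, matches_query_acl) are made up for illustration and are not part of Squid.

```python
import re
from urllib.parse import urlsplit

# The two patterns from the quoted line: acl QUERY urlpath_regex cgi-bin \?
QUERY_PATTERNS = [re.compile(r"cgi-bin"), re.compile(r"\?")]


def url_path(url):
    """Return the path plus query string (assumed to be what urlpath_regex sees)."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def matches_query_acl(url):
    """True if any QUERY pattern is found anywhere in the URL path."""
    path = url_path(url)
    return any(pattern.search(path) for pattern in QUERY_PATTERNS)


pdf_url = "http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432"
page_url = "http://www.lsi.com/channel/products/jbods/sata_sas_jbods/630j/index.html"

print(matches_query_acl(pdf_url))   # the "?" in the URL matches the second pattern
print(matches_query_acl(page_url))  # no "cgi-bin" and no "?"
```

Every one of the AssetMgr.aspx URLs contains a "?", so with 'cache deny QUERY' active they all matched the acl and were refused storage before any header was looked at.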

3) commenting out:
   'hierarchy_stoplist cgi-bin ?'
  Note -- didn't think I needed this, as I had no other
caches I was querying from, but a comment further on down
under 'nonhierarchical_direct', said,

  "By default, squid will send any non-hierarchical
   requests (matching hierarchy_stoplist or not cachable
   request type) direct to origin servers.  If you
   set this to off, Squid will prefer to send these request
   to parents."

I took the comment to indicate that if something was in the
hierarchy_stoplist, it would also prevent caching, thus my try
in disabling it

These only come into effect when fetching from a peer. Removing hierarchy_stoplist allows matching replies sourced from a peer to possibly be cached here, and possibly in the peer as well.

4) In my refresh patterns, I have entries for ftp and gopher
and one for ".": (which presumably would match everything else):

   refresh_pattern .   0 20%   4320

To that line I have tried adding a bunch of keywords
(note, it's all 1 line in the squid.conf file, no backslashes):

   refresh_pattern .   0 20%   4320    ignore-no-store \
   ignore-no-cache ignore-private ignore-auth override-expire \
   reload-into-ims

The only ones I haven't tried yet are 'refresh-ims',
'override-expire' and 'override-lastmod', but those shouldn't
be needed and might cause more headaches than it is worth.

Is there something I'm missing?  This seems like it should be
'simple'.

You need to look at the actual PDF request and reply headers. That will tell you what is going on and which (if any) of the overrides are useful.

Your log below has those Cache-Control details.


Relevant log file entries are below (access, cache, store...)



The full entry (from access.log) from one of the above shows:
------------------------------------------------------------
1302116600.765    108 192.168.3.140 TCP_MISS/200 468 HEAD
http://www.lsi.com/DistributionSystem/User/AssetMgr.aspx?asset=54432 -
HIER_DIRECT/www.lsi.com application/pdf [Host:
www.lsi.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0;
en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16\r\nAccept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,application/json\r\nAccept-Language:
en,en-us;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset:
UTF-8,*\r\nKeep-Alive: 1800\r\nProxy-Connection: keep-alive\r\n]

Okay. No reason not to cache.

[HTTP/1.1 200 OK\r\nDate: Wed, 06 Apr 2011 19:03:16 GMT\r\nServer:
Microsoft-IIS/6.0\r\nX-Powered-By: ASP.NET\r\nX-AspNet-Version:
2.0.50727\r\nContent-Disposition: attachment;
filename=JBOD_Enclosures_Guide_080310.pdf\r\nSet-Cookie:
ASP.NET_SessionId=vgzglkahj1njarzzn4yooun3; path=/;
HttpOnly\r\nCache-Control: private\r\nContent-Type:
application/pdf\r\nContent-Length: 5051083\r\n\r]

Marked explicitly as "private", meaning it cannot be cached by any middleware proxy (such as Squid) which may send it on to other users. It may be cached by a personal cache such as the browser's own storage.
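
As a rough illustration of that check, the Python sketch below pulls the Cache-Control directives out of a reply logged with literal \r\n separators, as in the entry above, and asks whether a shared cache may store it. It is a deliberate simplification that only looks at "private" and "no-store"; Squid's real decision involves many more factors. The function names and the shortened REPLY sample are made up for illustration.

```python
# Shortened copy of the logged reply; the separators are the literal
# four characters \r\n, exactly as they appear in the access.log entry.
REPLY = (
    r"HTTP/1.1 200 OK\r\n"
    r"Server: Microsoft-IIS/6.0\r\n"
    r"Cache-Control: private\r\n"
    r"Content-Type: application/pdf\r\n"
    r"Content-Length: 5051083\r\n"
)


def parse_logged_headers(blob):
    """Split a logged header block into (status_line, {lowercased name: value})."""
    lines = [line for line in blob.split(r"\r\n") if line.strip()]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def cache_control_directives(headers):
    """Return the set of Cache-Control directive names, lowercased."""
    raw = headers.get("cache-control", "")
    return {part.split("=", 1)[0].strip().lower()
            for part in raw.split(",") if part.strip()}


def shared_cache_may_store(headers):
    """Simplified check: a shared cache must not store private or no-store replies."""
    return not ({"private", "no-store"} & cache_control_directives(headers))


status_line, reply_headers = parse_logged_headers(REPLY)
print(status_line)
print(cache_control_directives(reply_headers))
print(shared_cache_may_store(reply_headers))
```

For the reply logged here the only directive is "private", so the check says no. That is the default behaviour the ignore-private option on refresh_pattern is meant to override.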


To me it looks like the website is sending an incorrect Cache-Control: header, although if you need to log in to fetch that doc, it could be right.

<snip other logs>

Amos


