Re: refresh_pattern dynamic content doubts?

On 20/05/2012 4:52 p.m., Beto Moreno wrote:
Hi.

I have read in the docs that Squid's default setup uses the old way
to handle dynamic content:

case A
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

And the new way of doing this uses the following settings:
case B
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern .            0 20% 4320

Some sites I have seen use things like:
case C
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern -i \.index.(html|htm)$ 1440 90% 40320
refresh_pattern -i \.(html|htm|css|js)$ 1440 90% 40320
refresh_pattern .            0 20% 4320

In your experience, is the old way no longer the right way to do this?

There is no right/wrong here.

The HTTP/1.0 specification is clear that dynamic content created by CGI scripts is *very likely* unsafe to cache *unless* the script emits Cache-Control headers.

The "old" way was to simply not cache anything which came from a dynamic script generator.

The refresh_pattern rules are only used for objects which have no cache-control (i.e. the unsafe requests), and "-i (/cgi-bin/|\?) 0 0% 0" is a heuristic rule crafted specifically to match the "dynamic content" criteria and prevent that unsafe content from being cached.

The new way permits caching whenever the dynamic responses created by modern script languages send cache-controls. All the modern dynamic websites are cacheable (their script engines emit cache-control) and use "?" in their URLs, so the old way would prevent caching them, leaving ISPs with <20% cache HIT ratios. Moving to the new rule gains a few % in HIT ratio without much risk.
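The contrast between the two approaches can be sketched in Python. This is only a rough model under stated assumptions: the function names are hypothetical, headers are a plain dict with exact-case keys, and the check for an explicit lifetime is far simpler than what Squid really does (it ignores no-store, private, max-age=0 and so on).

```python
import re

# Heuristic from the thread: URLs containing /cgi-bin/ or a query string.
# The "-i" flag in squid.conf corresponds to case-insensitive matching.
DYNAMIC = re.compile(r"(/cgi-bin/|\?)", re.IGNORECASE)


def has_explicit_freshness(headers: dict) -> bool:
    """True when the server stated a lifetime (deliberately simplified)."""
    cache_control = headers.get("Cache-Control", "")
    return "max-age" in cache_control or "Expires" in headers


def old_way_allows_hit(url: str, headers: dict) -> bool:
    """Case A: 'cache deny QUERY' rejects matching URLs whatever the headers say."""
    return DYNAMIC.search(url) is None


def new_way_allows_hit(url: str, headers: dict) -> bool:
    """Case B: server headers win; the 0 0% 0 rule only covers header-less replies."""
    if has_explicit_freshness(headers):
        return True
    # No lifetime given: a URL matching the 0 0% 0 rule is stale immediately,
    # so it can never be answered as a fresh HIT.
    return DYNAMIC.search(url) is None


if __name__ == "__main__":
    url = "http://example.com/page.php?id=7"
    with_headers = {"Cache-Control": "max-age=3600"}
    print(old_way_allows_hit(url, with_headers))  # old way ignores the headers
    print(new_way_allows_hit(url, with_headers))  # new way honours them
    print(new_way_allows_hit(url, {}))            # no headers: heuristic applies
```

The only case where the two functions disagree is a query-string or cgi-bin URL whose response carries explicit cache-control, which is the traffic the new rule is meant to recover.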

What is the difference between case B and case C?
Which is better?

There is no "better". Everything in refresh_pattern is relative to the specific traffic pattern going through a specific proxy.

You can tune it perfectly for today's traffic, and then a new website that uses entirely different patterns becomes popular tomorrow. Or the popular website you are trying to cache changes its headers.


For dynamic content, are these the only settings we have? (I don't care about
YouTube or streaming.)

The thing to understand is that to Squid there is no distinction between "dynamic" and "static" content. It is all just content. *Individual* objects have headers (or not) which indicate their *individual* cacheability.

The "refresh_pattern" directive is a blunt-object regex pattern applied universally to all requests to estimate cacheability time for objects which have no specific mention of lifetime sent by the server. The "cache" directive is a sledgehammer to prevent caching of particular ACL-matching requests.
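As an illustration of how a list of refresh_pattern lines gets applied, here is a small Python sketch using the case C rules quoted earlier. It assumes that rules are tried in the order listed and that the first matching regex is used, as the squid.conf documentation describes. Python's re module stands in for the regex engine Squid uses, and RefreshRule, rule and select_rule are hypothetical names chosen for this example.

```python
import re
from typing import List, NamedTuple, Optional


class RefreshRule(NamedTuple):
    pattern: re.Pattern
    min_minutes: int
    percent: int
    max_minutes: int


def rule(regex: str, min_minutes: int, percent: int, max_minutes: int,
         ignore_case: bool = False) -> RefreshRule:
    """Build one rule; ignore_case mirrors the -i option in squid.conf."""
    flags = re.IGNORECASE if ignore_case else 0
    return RefreshRule(re.compile(regex, flags), min_minutes, percent, max_minutes)


# The case C rules from the question, in the same order.
CASE_C = [
    rule(r"(/cgi-bin/|\?)", 0, 0, 0, ignore_case=True),
    rule(r"\.index.(html|htm)$", 1440, 90, 40320, ignore_case=True),
    rule(r"\.(html|htm|css|js)$", 1440, 90, 40320, ignore_case=True),
    rule(r".", 0, 20, 4320),
]


def select_rule(url: str, rules: List[RefreshRule]) -> Optional[RefreshRule]:
    """Return the first rule whose regex matches anywhere in the URL."""
    for candidate in rules:
        if candidate.pattern.search(url):
            return candidate
    return None


if __name__ == "__main__":
    for url in ("http://example.com/search?q=squid",
                "http://example.com/index.html",
                "http://example.com/style.css",
                "http://example.com/photo.png"):
        chosen = select_rule(url, CASE_C)
        print(url, "->", chosen.pattern.pattern)
```

In this sketch the second case C pattern requires a literal dot before "index", so a URL ending in /index.html falls through to the third rule instead. Order matters too: a URL such as style.css?v=2 is caught by the first rule because of its query string.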



Is there a formula for setting up min/max percent?

No. They are the *input* values to a formula for calculating expiry time. They are how long *you want* to store any object which matches the regex.
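To make the role of the three numbers concrete, here is a simplified Python sketch of the freshness check they feed into. It follows the order given in the squid.conf documentation for refresh_pattern as best understood here (explicit expiry first, then max, then the last-modified factor, then min). The function name and parameters are hypothetical, all times are in minutes, and override options and many other details of the real algorithm are left out.

```python
from typing import Optional


def is_fresh(age: float, min_minutes: float, percent: float, max_minutes: float,
             lm_age_at_fetch: Optional[float] = None,
             expires_in: Optional[float] = None) -> bool:
    """Approximate freshness decision for a cached object.

    age             -- minutes since the object was stored or last validated
    lm_age_at_fetch -- minutes between Last-Modified and the fetch, if known
    expires_in      -- minutes until a server-supplied expiry, if one was sent
    """
    # A lifetime stated by the server takes priority over the heuristics.
    if expires_in is not None:
        return expires_in > 0
    # Older than max: stale regardless of anything else.
    if age > max_minutes:
        return False
    # With Last-Modified, compare age to a percentage of how old the object was.
    if lm_age_at_fetch is not None and lm_age_at_fetch > 0:
        lm_factor = age / lm_age_at_fetch
        return lm_factor < percent / 100.0
    # Otherwise fall back to the configured minimum.
    return age < min_minutes


if __name__ == "__main__":
    # Rule ". 0 20% 4320" with an object last modified 10 days before the fetch.
    ten_days = 10 * 24 * 60
    for age in (1000, 3000, 5000):
        print(age, is_fresh(age, 0, 20, 4320, lm_age_at_fetch=ten_days))
```

With the "0 0% 0" rule every branch comes out stale, which is why that line keeps header-less dynamic responses from being served out of the cache.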


Amos

