
Re: refresh pattern questions


On 15/07/2013 6:31 a.m., Joshua B. wrote:
> I have some questions related to refresh pattern options.

> First, since "no-cache" now seems ineffective with HTTP/1.1, what would be a possible way to force an object to cache under both HTTP/1.0 and HTTP/1.1? If it's not possible, then are there any plans to implement it in a future version of Squid?

You are talking about "ignore-no-cache"? I'm not sure you understand exactly what it did and what the new Squid does instead.

Simply put:
There is *no* HTTP/1.0 equivalent for "no-cache" on responses. The best one can do is set an Expires header.
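
For illustration only (the dates are placeholders): a response that wants HTTP/1.1 caches to revalidate and wants HTTP/1.0 caches to treat their copy as immediately stale might carry both headers, with Expires set no later than Date:

  HTTP/1.1 200 OK
  Date: Mon, 15 Jul 2013 06:31:00 GMT
  Cache-Control: no-cache
  Expires: Mon, 15 Jul 2013 06:31:00 GMT
  Content-Type: image/jpeg

Expires equal to (or earlier than) Date lets the object be stored but marks it stale straight away, which is as close as HTTP/1.0 gets to "revalidate before reuse".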

Squid-2.6 to 3.1 had some small HTTP/1.1 support but were unable to perform the tricky revalidation required for handling "no-cache" responses properly, so they treated no-cache as if it were "no-store" and prevented caching of those responses.

==> "ignore-no-cache" used to flip that behaviour and cause those responses to be stored. This resulted in a great many objects being cached for long periods and re-sent to clients from outdated cached copies, which could cause big UX problems (thus the warning when it was used).
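
For reference, that old behaviour was switched on per-pattern in squid.conf roughly like this (the pattern and times are only an example; the option no longer exists in 3.2+):

  # Squid-2.6 .. 3.1 only
  refresh_pattern -i \.jpg$ 0 20% 4320 ignore-no-cache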


Squid-3.2 and later have far better HTTP/1.1 support, *including* the ability to revalidate "no-cache" responses properly. So these versions of Squid *do* store responses with "no-cache" by default. They then send an IMS (If-Modified-Since) request to the server to verify the HIT is up to date, resolving all those UX problems. ==> the useful effect of "ignore-no-cache" no longer needs any config option, and the bad side-effects ... do you really want them?
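
Roughly, the revalidation exchange for a stored "no-cache" object looks like this (URL and dates are illustrative):

  Squid -> origin server:
    GET /image.jpg HTTP/1.1
    Host: website.com
    If-Modified-Since: Mon, 15 Jul 2013 06:31:00 GMT

  origin server -> Squid:
    HTTP/1.1 304 Not Modified

  Squid -> client: the cached body, typically logged as TCP_REFRESH_UNMODIFIED rather than a plain TCP_HIT.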


** If you have a server and want the "old" behaviour for no-cache responses, you should already have been using "no-store" instead.

** If you have a server and want the "old" behaviour that "ignore-no-cache" produced, you should not have been sending "no-cache" on responses to begin with.
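
In other words, the fix belongs in the response headers the origin sends. Something along these lines (illustrative values):

  # never store this object in any cache:
  Cache-Control: no-store

  # store it, but revalidate with the origin before every reuse:
  Cache-Control: no-cache

  # store it and reuse it freely for an hour:
  Cache-Control: max-age=3600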


> Secondly, why is there a limit of 1 year on an "override" method? A lot of websites make it such a pain to cache, and some even go as far as (literally) setting the date of their files back to the early 1900s. Doing this makes it feel impossible to cache the object, especially with Squid's own limitation.

To prevent 32-bit overflow in the numerics inside Squid. Going much further out, the number inverts and you end up with objects being evicted from the cache instead of stored. The whole refresh_pattern calculation needs to be upgraded to 64-bit, and the override-* and ignore-* options reviewed as to what they do versus what the HTTP/1.1 spec allows to happen by default (as was just done for no-cache).
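
As an example of where that cap bites, a forced-caching rule cannot usefully go past one year (the min/max columns are in minutes, so one year is 525600). The pattern and percentages here are purely illustrative, not a recommendation:

  refresh_pattern -i \.jpg$ 1440 50% 525600 override-expire override-lastmod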

Have you ever wondered why those websites go to such extreme lengths? Why they care so much about their clients getting recently updated content?

> With all this said, IS there an effective way to cache content when the server doesn't want you to? So there would be, like, a GUARANTEED "tcp_hit" in the log. Even with a ? in the URL of the image, so Squid would consider anything with a ? after it the same image. For example: website.com/image.jpg?1234567890
> It's the exact same image (I've examined all the entries in the logs that look like this), but they're making it hard to cache with the ? in the URL, so I'd like to know if there's a way around this?

1) Remove any squid.conf "QUERY" ACL and related "cache deny" settings which Squid-2.6 and earlier required. That includes the hierarchy_stoplist patterns. These are the usual cause of dynamic content not caching in Squid-2.7+. (The exact lines to look for are shown after this list.)

2) Try out the upcoming 3.4 (3.HEAD right now) Store-ID feature for de-duplicating cache content (a rough helper sketch follows below). In older versions you can also re-write the URL to strip the numerics. In some ways that is safer, as the backend then becomes aware of the alteration and smart ones can take special action to prevent any massive problems if you accidentally collide with a security system (see below).
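
For 1), the lines to hunt down and delete are the old defaults, which looked like this:

  acl QUERY urlpath_regex cgi-bin \?
  cache deny QUERY
  hierarchy_stoplist cgi-bin ?

The modern default which replaces them is simply:

  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

For 2), as a rough illustration of the Store-ID idea only (not a drop-in helper - check the store_id_program documentation for the exact helper protocol and reply syntax on your Squid version), a helper could map every website.com/image.jpg?<digits> URL onto one internal cache key:

  #!/usr/bin/env python
  # Store-ID helper sketch. Assumes the concurrent helper line format
  # "<channel-ID> <URL> ..."; adjust to whatever your Squid version sends.
  import re, sys

  pattern = re.compile(r'^(http://website\.com/image\.jpg)\?\d+$')

  for line in sys.stdin:
      parts = line.split()
      if len(parts) < 2:
          continue
      channel, url = parts[0], parts[1]
      m = pattern.match(url)
      if m:
          # store/look up this URL under the bare image URL
          sys.stdout.write("%s OK store-id=%s\n" % (channel, m.group(1)))
      else:
          sys.stdout.write("%s ERR\n" % channel)
      sys.stdout.flush()

wired into squid.conf along the lines of:

  store_id_program /usr/local/bin/storeid_strip_query.py
  store_id_children 5

The URL-rewrite approach for older Squid looks much the same, except the helper returns a rewritten URL which is then sent to the origin server, so the backend actually sees the change.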

How do you know that "website.com/image.jpg?1234567890" is not ...
 ... a part of some captcha-style security system?
 ... the background image for a login button which contains the users name?
 ... an image-written bank account number?
 ... an image containing some other private details?
 ... a script with dynamic references to other URLs?

To be sure you don't make that type of mistake, with all the many, many ways of using URLs, you would have to audit *every single link on every single website which your regex pattern matches* ... or do the easy thing and let HTTP caching controls work as they are supposed to work. Send an annoyed email to the site in question requesting that they fix their URL scheme, highlighting that they get *free* bandwidth in exchange for the fix. Sites do change - Facebook is a good case study to point at: as they scaled up they had to fix their cacheability and HTTP/1.1 compliance to stop their costs exploding.

Amos




