Re: Can Squid cache literally all HTTP responses for testing?

On 17/11/2012 9:54 a.m., Kevin Nardi wrote:
Hi squid-users,

I'm an experienced web developer who is using Squid for the first
time. For internal testing, we need a stable cache of a certain list
of sites (which we do not own) that we use in our tests. I know Squid
isn't built to do this, but I thought for sure it would be possible to
configure it to cache literally all HTTP responses and then use those
for all requests.

Squid caches objects both literally (the full object plus metadata) and temporally (time-oriented). HTTP itself is stateless, and it allows both entity variation and a high rate of change among those representation variants. This means each object in the cache is just one instance from a *set* of response objects which are *all* represented by the one URL.

If you use a proxy cache like Squid as the data source for this type of testing, you will get false test results.
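
To make that point concrete, here is a minimal Python sketch (not Squid's actual store logic; the URL and function name are invented for illustration) of why one URL maps to a *set* of cached objects: a correct cache key has to include whichever request headers the server names in its Vary response header.

===================================================
# Minimal sketch: one URL, several cached variants.
# The cache key must cover the Vary-selected request headers,
# otherwise client A can be served the variant stored for client B.

def variant_cache_key(url, vary_header, request_headers):
    """Build a cache key from the URL plus the Vary-selected request headers."""
    if vary_header.strip() == "*":
        return None  # "Vary: *" makes the response effectively uncacheable
    selected = []
    for name in vary_header.split(","):
        name = name.strip().lower()
        if name:
            selected.append((name, request_headers.get(name, "")))
    return (url, tuple(sorted(selected)))

# Two clients ask for the same URL but advertise different encodings,
# so a correct cache must store (and serve) two distinct objects.
key_a = variant_cache_key("http://example.com/page", "Accept-Encoding",
                          {"accept-encoding": "gzip"})
key_b = variant_cache_key("http://example.com/page", "Accept-Encoding",
                          {"accept-encoding": "identity"})
print(key_a != key_b)  # True: same URL, two different cache entries
===================================================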

You need a web service set up to present the expected answer for each of your requests. For testing Squid we use Co-Advisor for HTTP compliance testing, and custom server scripts to respond with fixed output on certain requests. Polygraph is also in the mix there sometimes for throwing traffic load like you want through the system, but it is more oriented at testing server systems than client ones, AFAIK.
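
As a rough sketch of that "fixed output" idea (the port, paths and bodies below are made-up placeholders, not anything we actually run), a tiny origin server like this gives your tests stable, repeatable responses without depending on a cache of third-party sites:

===================================================
# Tiny fixed-response origin server for testing (Python standard library).
# Every GET for a known path returns the same canned body, so test runs
# always see identical content.
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED = {
    "/search": b"<html><body>stable search page</body></html>",
    "/login":  b"<html><body>stable login page</body></html>",
}

class FixedResponseHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = CANNED.get(self.path)
        if body is None:
            self.send_error(404, "no canned response for this path")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # 8081 is an arbitrary example port
    HTTPServer(("127.0.0.1", 8081), FixedResponseHandler).serve_forever()
===================================================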


  Here is my very simple Squid 3.1 config that is
intended to do that:


===================================================
offline_mode on

refresh_pattern . 525600 100% 525600 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private ignore-auth
vary_ignore_expire on
minimum_expiry_time 99 years
minimum_object_size 0 bytes
maximum_object_size 1024 MB

cache_effective_user _myusername

http_access allow all

coredump_dir /usr/local/squid/var/logs

strip_query_terms off

url_rewrite_access allow all
url_rewrite_program /usr/local/squid/similar_url_rewrite.rb
url_rewrite_concurrency 10
url_rewrite_children 10

cache_dir ufs /usr/local/squid/caches/gm 5000 16 256
http_port 8082
pid_filename /usr/local/squid/var/run/gm.pid

access_log /usr/local/squid/var/logs/access-gm.log
cache_log /usr/local/squid/var/logs/cache-gm.log
===================================================


As you can see, I am intelligently rewriting URLs to always match URLs
that I know should be in the cache because I've hit them before. I
find that my hit rate is still only about 56%, and that is mostly 304
IMS hits.

URL != object.

Also, Squid is only rated for about a 50% HIT rate on forward-proxy HTTP traffic, and often achieves a lot less. Getting above that is a rather good outcome (when ignoring response accuracy).

  I have been unable to find sufficient documentation or debug
logging to explain why Squid would still not cache some requests.

HTTP uses *a lot* more than the URL to determine the suitable response representation. All of those headers which you use refresh_pattern to ignore are how Squid distinguishes object X from object Y when both are at the same URL. The ignore-no-store and ignore-private options are particularly dangerous for you, since no-store and private are an explicit *removal* of permission for those responses to be stored, even temporarily, on disk. Those responses are private and clearly marked as such by the owner; storing them is actually illegal in most of the world.
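
For illustration only, a heavily simplified Python sketch of a shared-cache storability check (this is nothing like Squid's full algorithm; the real HTTP caching rules have many more cases) shows what those two directives mean:

===================================================
# Simplified storability check for a *shared* cache.
# no-store and private are explicit withdrawals of permission to store;
# "ignore-no-store" / "ignore-private" tell Squid to disregard exactly that.

def may_store_in_shared_cache(response_headers):
    cc = response_headers.get("cache-control", "").lower()
    directives = {d.strip().split("=")[0] for d in cc.split(",") if d.strip()}
    if "no-store" in directives:
        return False   # must not be written to any cache storage
    if "private" in directives:
        return False   # intended for one user, not a shared proxy cache
    return True        # (a real cache then checks expiry, Vary, auth, ...)

print(may_store_in_shared_cache({"cache-control": "private, max-age=600"}))  # False
print(may_store_in_shared_cache({"cache-control": "public, max-age=600"}))   # True
===================================================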

What are you testing? (the client software)

And what are the site profiles for your test sites?
 - static / dynamic content proportions?
 - personalization amount and types?
 - Vary: headers?
 - highly variable dynamic content, and at what update rate/frequency?
 - is the server performing refresh properly (304 versus useless 200 responses)? (see the sketch below)
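
A quick hand-rolled way to check that last one (Python standard library only; the host and path are placeholders for one of your own test sites): fetch a page, then repeat the request with the validators it returned and see whether the origin answers 304 or resends a full 200.

===================================================
# Check whether an origin server handles conditional GET (refresh) properly.
import http.client

HOST, PATH = "example.com", "/"   # placeholders: point these at a test site

conn = http.client.HTTPConnection(HOST)
conn.request("GET", PATH)
first = conn.getresponse()
etag = first.getheader("ETag")
last_modified = first.getheader("Last-Modified")
first.read()  # finish the first response so the connection can be reused

# If the server sent no validators at all, the repeat request below is
# unconditional and a 200 is expected (and proper refresh is impossible).
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified

conn.request("GET", PATH, headers=headers)
second = conn.getresponse()
print("conditional GET answered with", second.status)  # 304 = refresh works; 200 = full resend
second.read()
conn.close()
===================================================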

Amos

