Re: Can Squid cache literally all HTTP responses for testing?

On 17/11/2012 9:54 a.m., Kevin Nardi wrote:
Hi squid-users,

I'm an experienced web developer who is using Squid for the first
time. For internal testing, we need a stable cache of a certain list
of sites (which we do not own) that we use in our tests. I know Squid
isn't built to do this, but I thought for sure it would be possible to
configure it to cache literally all HTTP responses and then use those
for all requests.

Squid caches objects both literally (the full object plus metadata) and temporally (time-oriented). HTTP itself is stateless, and it allows both entity variation and a high rate of change among those representation variants. This means each object in the cache is just one instance from a *set* of response objects which are *all* represented by the one URL.

If you use a proxy cache like Squid as the data source for this type of testing, you will get false test results.
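
To make that point concrete, here is a minimal Python sketch (not Squid's actual store logic; the URL and function name are invented for illustration) of why one URL maps to a *set* of cached objects: a correct cache key has to include whichever request headers the server names in its Vary response header.

===================================================
# Minimal sketch: one URL, several cached variants.
# The cache key must cover the Vary-selected request headers,
# otherwise client A can be served the variant stored for client B.

def variant_cache_key(url, vary_header, request_headers):
    """Build a cache key from the URL plus the Vary-selected request headers."""
    if vary_header.strip() == "*":
        return None  # "Vary: *" makes the response effectively uncacheable
    selected = []
    for name in vary_header.split(","):
        name = name.strip().lower()
        if name:
            selected.append((name, request_headers.get(name, "")))
    return (url, tuple(sorted(selected)))

# Two clients ask for the same URL but advertise different encodings,
# so a correct cache must store (and serve) two distinct objects.
key_a = variant_cache_key("http://example.com/page", "Accept-Encoding",
                          {"accept-encoding": "gzip"})
key_b = variant_cache_key("http://example.com/page", "Accept-Encoding",
                          {"accept-encoding": "identity"})
print(key_a != key_b)  # True: same URL, two different cache entries
===================================================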

You need a web service set up to present the expected answer for each of your requests. For testing Squid we use Co-Advisor for HTTP compliance testing, and custom server scripts to respond with fixed output on certain requests. Polygraph is also in the mix there sometimes for throwing traffic load like you want through the system, but it is more oriented at testing server systems than client ones, AFAIK.
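
As a rough sketch of that "fixed output" idea (the port, paths and bodies below are made-up placeholders, not anything we actually run), a tiny origin server like this gives your tests stable, repeatable responses without depending on a cache of third-party sites:

===================================================
# Tiny fixed-response origin server for testing (Python standard library).
# Every GET for a known path returns the same canned body, so test runs
# always see identical content.
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED = {
    "/search": b"<html><body>stable search page</body></html>",
    "/login":  b"<html><body>stable login page</body></html>",
}

class FixedResponseHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = CANNED.get(self.path)
        if body is None:
            self.send_error(404, "no canned response for this path")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # 8081 is an arbitrary example port
    HTTPServer(("127.0.0.1", 8081), FixedResponseHandler).serve_forever()
===================================================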


  Here is my very simple Squid 3.1 config that is
intended to do that:


===================================================
offline_mode on

refresh_pattern . 525600 100% 525600 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private ignore-auth
vary_ignore_expire on
minimum_expiry_time 99 years
minimum_object_size 0 bytes
maximum_object_size 1024 MB

cache_effective_user _myusername

http_access allow all

coredump_dir /usr/local/squid/var/logs

strip_query_terms off

url_rewrite_access allow all
url_rewrite_program /usr/local/squid/similar_url_rewrite.rb
url_rewrite_concurrency 10
url_rewrite_children 10

cache_dir ufs /usr/local/squid/caches/gm 5000 16 256
http_port 8082
pid_filename /usr/local/squid/var/run/gm.pid

access_log /usr/local/squid/var/logs/access-gm.log
cache_log /usr/local/squid/var/logs/cache-gm.log
===================================================


As you can see, I am intelligently rewriting URLs to always match URLs
that I know should be in the cache because I've hit them before. I
find that my hit rate is still only about 56%, and that is mostly 304
IMS hits.

URL != object.

Also, Squid is only rated for about a 50% HIT rate on forward-proxy HTTP traffic, and often achieves a lot less. Getting above that is a rather good outcome (when ignoring response accuracy).

  I have been unable to find sufficient documentation or debug
logging to explain why Squid would still not cache some requests.

HTTP uses *a lot* more than the URL to determine the suitable response representation. All of those headers which you use refresh_pattern to ignore are how Squid distinguishes object X from object Y when both are at the same URL. The ignore-no-store and ignore-private options are particularly dangerous for you, since no-store and private are an explicit *removal* of permission for those responses to be stored, even temporarily, on disk. Those responses are private and clearly marked as such by the owner; storing them is actually illegal in most of the world.
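
For illustration only, a heavily simplified Python sketch of a shared-cache storability check (this is nothing like Squid's full algorithm; the real HTTP caching rules have many more cases) shows what those two directives mean:

===================================================
# Simplified storability check for a *shared* cache.
# no-store and private are explicit withdrawals of permission to store;
# "ignore-no-store" / "ignore-private" tell Squid to disregard exactly that.

def may_store_in_shared_cache(response_headers):
    cc = response_headers.get("cache-control", "").lower()
    directives = {d.strip().split("=")[0] for d in cc.split(",") if d.strip()}
    if "no-store" in directives:
        return False   # must not be written to any cache storage
    if "private" in directives:
        return False   # intended for one user, not a shared proxy cache
    return True        # (a real cache then checks expiry, Vary, auth, ...)

print(may_store_in_shared_cache({"cache-control": "private, max-age=600"}))  # False
print(may_store_in_shared_cache({"cache-control": "public, max-age=600"}))   # True
===================================================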

What are you testing? (the client software)

And what are the site profiles for your test sites?
 - static / dynamic content proportions?
 - personalization amount and types?
 - Vary: headers?
 - highly variable dynamic content, and at what update rate/frequency?
 - is the server performing refresh properly (304 versus useless 200 responses)? (see the sketch below)
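
A quick hand-rolled way to check that last one (Python standard library only; the host and path are placeholders for one of your own test sites): fetch a page, then repeat the request with the validators it returned and see whether the origin answers 304 or resends a full 200.

===================================================
# Check whether an origin server handles conditional GET (refresh) properly.
import http.client

HOST, PATH = "example.com", "/"   # placeholders: point these at a test site

conn = http.client.HTTPConnection(HOST)
conn.request("GET", PATH)
first = conn.getresponse()
etag = first.getheader("ETag")
last_modified = first.getheader("Last-Modified")
first.read()  # finish the first response so the connection can be reused

# If the server sent no validators at all, the repeat request below is
# unconditional and a 200 is expected (and proper refresh is impossible).
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified

conn.request("GET", PATH, headers=headers)
second = conn.getresponse()
print("conditional GET answered with", second.status)  # 304 = refresh works; 200 = full resend
second.read()
conn.close()
===================================================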

Amos

