Re: index key generation mechanism?

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Tue, 23 Aug 2011 00:40:41 +1200

On 22/08/11 18:28, Raymond Wang wrote:
hi all:

For the file "somejs.js", there are two URLs referring to it, for
example: url1 is "http://www.a.com/somejs.js" and url2 is
"http://www.a.com/somejs.js".

By default, Squid uses these URLs to generate a key that serves as the
index key for writing the somejs.js file to memory and reading it back.

My question is: can I influence the index key generation so that
Squid stores the somejs.js file in memory as a single object? For
example, we could trim both "http://www.a.com/somejs.js" and
"http://www.a.com/somejs.js" to "somejs.js", so that the key would be
"somejs.js", i.e. the file name (or some variant of it) is used as the
index key. This way, the two different URLs (whose files have the same
content) would be saved as only one object in Squid.

Is it possible?

Possible, yes. Easy, no.

The key Squid uses is the public URL which the client is asking for.

YouTube is a well-known website which behaves like you describe. It is a 
serious nightmare for a great many network admins.
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube

The experimental storeurl_rewrite feature in squid-2.7 does exactly what 
you describe.

Considering that 1) you must already have a list of patterns for 
matching, and 2) you thus have a known location of at least one 
instance, the safer, friendlier, and HTTP-compliant method is simply to 
set up a url_rewrite_program helper which tests URLs against your 
patterns and emits "303:$new_url" when it finds a match on a GET request.

 ** by safer and friendlier, I mean that instead of potentially 
poisoning caches all over the world with broken or corrupt data (see 
recent T-mobile problems) all you do is break any load balancing on the 
websites in question.
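
A minimal sketch of such a helper, in Python, might look like the following. It assumes the classic redirector input format (URL, client ip/fqdn, user, method, one request per line, no concurrency channel ID). The pattern table, the canonical URL, and the function names are made up for illustration and would need to be replaced with your own list.

```python
#!/usr/bin/env python3
"""Sketch of a url_rewrite_program helper that answers with a 303 redirect."""
import re
import sys

# Hypothetical pattern table: (regex matching duplicate URLs, canonical URL).
# Replace with the real patterns and the known location of one instance.
PATTERNS = [
    (re.compile(r"^http://[^/]+/(?:.*/)?somejs\.js$"),
     "http://www.a.com/somejs.js"),
]


def rewrite(line):
    """Return the helper reply for one request line from Squid.

    An empty reply tells Squid to leave the URL alone.
    """
    fields = line.split()
    if len(fields) < 4:
        return ""  # malformed line: do not touch the request
    url, method = fields[0], fields[3]
    if method != "GET":
        return ""  # only redirect GET requests
    for pattern, canonical in PATTERNS:
        # Skip the canonical URL itself to avoid a redirect loop.
        if url != canonical and pattern.match(url):
            return "303:" + canonical
    return ""


def main():
    for line in sys.stdin:
        sys.stdout.write(rewrite(line) + "\n")
        sys.stdout.flush()  # Squid waits for each reply, so never buffer


if __name__ == "__main__":
    main()
```

It would be hooked in with a squid.conf line such as "url_rewrite_program /usr/local/bin/dedupe_redirect.py" (the path is only an example). Clients are then sent to the one canonical URL, so the cache key is never altered and only that copy ends up being stored.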

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.14
  Beta testers wanted for 3.2.0.10