On 14/06/2012 8:53 p.m., Jack Bates wrote:
On 18/05/12 05:55 AM, Eliezer Croitoru wrote:
On 18/05/2012 10:33, Jack Bates wrote:
Are there any resources in Squid core or in the Squid community to help
cache duplicate files? Squid is very useful for building content
distribution networks, but how does Squid handle duplicate files from
content distribution networks when it is used as a forward proxy?
This is important to us because many download sites present users with a simple download button that doesn't always send them to the same mirror. Some users are redirected to mirrors that are already cached while other users are redirected to mirrors that aren't. We use a caching proxy in a rural village here in Rwanda to improve internet access, but users often can't predict whether a download will take seconds or hours, which is frustrating.
How does Squid handle files distributed from mirrors? Do you know of any resources concerning forward proxies and download mirrors?
Squid 2.7 has the store_url_rewrite feature, which does what you need. SourceForge is one nice example of CDN-based file downloads served from mirrors.
You can also always use the cache_peer option to chain to a main Squid running a more up-to-date version, and use the older version only for what you need, such as a specific domain.
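For reference, a minimal sketch of such a rewrite helper, assuming Squid 2.7's storeurl_rewrite_program helper interface (one URL per input line, one reply per line) and a SourceForge-style mirror hostname pattern. The ".squid.internal" canonical host is an illustrative convention, not something Squid mandates:

```python
#!/usr/bin/env python
# Sketch of a Squid 2.7 storeurl_rewrite_program helper.
# The mirror pattern and canonical host below are assumptions for
# illustration; adapt them to the mirrors your users actually hit.
import re
import sys

# Collapse any SourceForge mirror host (e.g. surfnet.dl.sourceforge.net)
# into one canonical store URL so all mirrors share a single cache entry.
MIRROR = re.compile(r'^http://[a-z0-9.-]+\.dl\.sourceforge\.net/')
CANONICAL = 'http://dl.sourceforge.net.squid.internal/'

def store_url(url):
    """Return the canonical store URL, or the URL unchanged."""
    return MIRROR.sub(CANONICAL, url)

def main():
    for line in sys.stdin:
        # Squid passes "URL <other tokens>"; the URL is the first field.
        url = line.split()[0]
        sys.stdout.write(store_url(url) + '\n')
        sys.stdout.flush()  # reply must be unbuffered, one line per request

if __name__ == '__main__':
    main()
```

Hooked up with something like "storeurl_rewrite_program /usr/local/bin/store-url.py" in squid.conf (path hypothetical), every mirror of the same file would then hit the same cache entry.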
Thanks very much for pointing out the store_url_rewrite option, Eliezer. Does it require the proxy administrator to manually configure the list of download mirrors?
Does anyone in the Squid community have thoughts on exploiting
Metalink [1] to address caching duplicate files from content
distribution networks?
The approach I am pursuing is to exploit RFC 6249, Metalink/HTTP:
Mirrors and Hashes. Given a response with a "Location: ..." header and
at least one "Link: <...>; rel=duplicate" header, the proxy looks up
the URLs in these headers in the cache. If the "Location: ..." URL
isn't already cached but a "Link: <...>; rel=duplicate" URL is, then
the proxy rewrites the "Location: ..." header with the cached URL.
This should redirect clients to a mirror that is already cached.
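The rewrite step described above can be sketched roughly as follows. Here cached() is a hypothetical stand-in for the proxy's cache lookup, headers map each name to a list of values, and for simplicity each Link header is assumed to carry a single link (a real implementation would also split comma-separated Link values per RFC 5988):

```python
import re

# Matches the URL inside a 'Link: <url>; rel=duplicate' header value.
DUP = re.compile(r'<([^>]+)>\s*;\s*rel\s*=\s*"?duplicate"?')

def pick_location(headers, cached):
    """Given redirect response headers (name -> list of values), return
    the Location to send: the original URL if it is already cached,
    otherwise the first rel=duplicate Link URL that is cached."""
    location = headers.get('Location', [None])[0]
    if location is None or cached(location):
        return location
    for link in headers.get('Link', []):
        m = DUP.search(link)
        if m and cached(m.group(1)):
            return m.group(1)  # rewrite the redirect to the cached mirror
    return location  # nothing cached; leave the redirect alone
```

For example, if mirror1 returns a redirect whose rel=duplicate link points at a mirror already in the cache, the client is sent to the cached mirror instead.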
Thoughts?
Well, since our very own Henrik Nordstrom is one of the authors, I'd say there have been thoughts about it in the Squid community :-)
Another idea is to exploit RFC 3230, Instance Digests in HTTP. Given a response with a "Location: ..." header and a "Digest: ..." header, if the "Location: ..." URL isn't already cached then the proxy checks the cache for content with a matching digest, and rewrites the "Location: ..." header with the cached URL if found.
I am working on a proof-of-concept plugin for Apache Traffic Server as part of the Google Summer of Code. The code is up on GitHub [2].
If this is a reasonable approach, would it be difficult to build
something similar for Squid?
Please contact Alex Rousskov at measurement-factory.com; he was organising a project to develop Digest handling and de-duplication a while back.
Amos