On 18/05/12 05:55 AM, Eliezer Croitoru wrote:
On 18/05/2012 10:33, Jack Bates wrote:
Are there any resources in Squid core or in the Squid community to help
cache duplicate files? Squid is very useful for building content
distribution networks, but how does Squid handle duplicate files from
content distribution networks when it is used as a forward proxy?
This is important to us because many download sites present users with a
simple download button that doesn't always send them to the same mirror.
Some users are redirected to mirrors that are already cached while other
users are redirected to mirrors that aren't. We use a caching proxy in a
rural village here in Rwanda to improve internet access, but users often
can't predict whether a download will take seconds or hours, which is
frustrating.
How does Squid handle files distributed from mirrors? Do you know of any
resources concerning forward proxies and download mirrors?
Squid 2.7 has the store_url_rewrite option, which does what you need.
SourceForge is a good example of a site that serves its downloads from
CDN-style mirrors. You can also use the cache_peer option to combine two
versions: run a more recent Squid as the main proxy and send requests to
the older version only for what you need, such as specific domains.
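To make the store URL rewrite idea concrete, here is a minimal sketch of
a helper that maps every mirror of one site onto a single cache key. It
assumes the helper receives one request per line on stdin with the URL
as the first field and answers with the URL to store the object under;
check the documentation for your Squid version before relying on that.
The mirror pattern, the ".squid.internal" host and the function names
are made up for illustration, and the pattern has to be maintained by
hand.

```python
import re

# Hypothetical pattern: treat any "<mirror>.dl.sourceforge.net" host as a
# mirror of the same file tree. Adjust it to the mirrors you actually see.
MIRROR_RE = re.compile(
    r"^http://[a-z0-9-]+\.dl\.sourceforge\.net/(.*)$", re.IGNORECASE
)

# Made-up internal host name, used only as the cache key and never fetched.
CANONICAL_PREFIX = "http://dl.sourceforge.net.squid.internal/"


def canonical_store_url(url):
    """Return the shared cache key for a mirror URL, or the URL unchanged."""
    match = MIRROR_RE.match(url)
    if match:
        return CANONICAL_PREFIX + match.group(1)
    return url


def serve(infile, outfile):
    """Answer one request line at a time (assumed helper protocol)."""
    for line in infile:
        fields = line.split()
        url = fields[0] if fields else ""
        outfile.write(canonical_store_url(url) + "\n")
        outfile.flush()  # the proxy waits for each answer, so do not buffer
```

A real helper would end with serve(sys.stdin, sys.stdout) and be wired
into squid.conf; the functions are kept separate here so the mapping can
be tested on its own.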
Thanks very much for pointing out the store_url_rewrite option, Eliezer.
Does it require the proxy administrator to manually configure the list
of download mirrors?
Does anyone in the Squid community have thoughts on exploiting Metalink
[1] to address caching duplicate files from content distribution networks?
The approach I am pursuing is to exploit RFC 6249, Metalink/HTTP:
Mirrors and Hashes. Given a response with a "Location: ..." header and
at least one "Link: <...>; rel=duplicate" header, the proxy looks up the
URLs in these headers in the cache. If the "Location: ..." URL isn't
already cached but a "Link: <...>; rel=duplicate" URL is, then the proxy
rewrites the "Location: ..." header with the cached URL. This should
redirect clients to a mirror that is already cached.
Thoughts?
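The header logic described above can be sketched roughly as follows.
This is only an illustration, not the code from the plugin: the function
names are hypothetical, and is_cached stands in for whatever cache
lookup the proxy provides.

```python
import re


def duplicate_urls(link_headers):
    """Collect the URLs of all rel=duplicate links from Link header values."""
    urls = []
    for value in link_headers:
        # One header value may carry several "<url>; params" entries.
        for match in re.finditer(r"<([^>]+)>([^<]*)", value):
            url, params = match.group(1), match.group(2)
            rel = re.search(
                r';\s*rel\s*=\s*("[^"]*"|[^\s;,]+)', params, re.IGNORECASE
            )
            if rel and "duplicate" in rel.group(1).strip('"').lower().split():
                urls.append(url)
    return urls


def rewrite_location(location, link_headers, is_cached):
    """Prefer a cached duplicate when the Location URL is not cached."""
    if is_cached(location):
        return location
    for url in duplicate_urls(link_headers):
        if is_cached(url):
            return url
    return location  # nothing cached: leave the redirect alone


# Usage with a set standing in for the cache index (example URLs only).
cache = {"http://mirror-b.example/file.iso"}
headers = [
    "<http://mirror-b.example/file.iso>; rel=duplicate; pri=1",
    "<http://mirror-c.example/file.iso>; rel=duplicate; pri=2",
]
new_location = rewrite_location(
    "http://mirror-a.example/file.iso", headers, cache.__contains__
)
```

In this example new_location is the mirror-b URL, because it is the only
duplicate present in the stand-in cache.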
Another idea is to exploit RFC 3230, Instance Digests in HTTP. Given a
response with a "Location: ..." header and a "Digest: ..." header, if
the "Location: ..." URL isn't already cached then the proxy checks the
cache for content with a matching digest and, if it finds any, rewrites
the "Location: ..." header with the URL of that cached content.
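A rough sketch of the digest variant, under the assumption that the
proxy keeps an index from (algorithm, digest) to the URL of a cached
object. That index, the is_cached callback and the function names are
hypothetical; a real cache would have to build the index as objects are
stored.

```python
import base64
import hashlib


def parse_digest(header):
    """Parse a Digest header value into {algorithm: encoded digest}."""
    digests = {}
    for item in header.split(","):
        # Split on the first "=" only: base64 values may end in "=" padding.
        algorithm, sep, value = item.strip().partition("=")
        if sep and value:
            digests[algorithm.lower()] = value
    return digests


def rewrite_location_by_digest(location, digest_header, is_cached, url_by_digest):
    """Redirect to cached content with the same digest, if there is any."""
    if is_cached(location):
        return location
    for algorithm, value in parse_digest(digest_header).items():
        url = url_by_digest.get((algorithm, value))
        if url is not None and is_cached(url):
            return url
    return location  # no match: leave the redirect alone


# Usage with stand-in data: one cached object and its SHA-256 digest.
body = b"example file contents"
sha256 = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
cache = {"http://mirror-b.example/file.iso"}
index = {("sha-256", sha256): "http://mirror-b.example/file.iso"}
new_location = rewrite_location_by_digest(
    "http://mirror-a.example/file.iso",
    "SHA-256=" + sha256,
    cache.__contains__,
    index,
)
```

Compared with the "Link: <...>; rel=duplicate" approach, this one does
not depend on the origin listing its mirrors, but it does need the cache
to be searchable by digest.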
I am working on a proof of concept plugin for Apache Traffic Server as
part of the Google Summer of Code. The code is up on GitHub [2].
If this is a reasonable approach, would it be difficult to build
something similar for Squid?
[1] http://metalinker.org/
[2] https://github.com/jablko/dedup