
Re: Duplicate files, content distribution networks

On 14/06/2012 8:53 p.m., Jack Bates wrote:
On 18/05/12 05:55 AM, Eliezer Croitoru wrote:
On 18/05/2012 10:33, Jack Bates wrote:
Are there any resources in Squid core or in the Squid community to help cache duplicate files? Squid is very useful for building content distribution networks, but how does Squid handle duplicate files from content distribution networks when it is used as a forward proxy?

This is important to us because many download sites present users with a simple download button that doesn't always send them to the same mirror. Some users are redirected to mirrors that are already cached, while others are redirected to mirrors that aren't. We use a caching proxy in a rural village here in Rwanda to improve internet access, but users often can't predict whether a download will take seconds or hours, which is frustrating.

How does Squid handle files distributed from mirrors? Do you know of any resources concerning forward proxies and download mirrors?
Squid 2.7 has the store_url_rewrite option, which does what you need. SourceForge is one nice example of a CDN that serves file downloads from mirrors. You can also use the cache_peer option to run a newer Squid as the main, more up-to-date proxy and forward only what you need, such as specific domains, to the older version.
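
To make that concrete: store_url_rewrite hands each request URL to an external helper, which prints back the key the object should be stored under, so different mirror URLs can share one cache entry. Here is a minimal helper sketch, assuming the standard Squid 2.7 helper protocol (wired up with storeurl_rewrite_program in squid.conf); the SourceForge pattern and the .squid.internal store key are illustrative assumptions, not a complete mirror list:

#!/usr/bin/env python
# Sketch of a Squid 2.7 storeurl_rewrite helper: reads one request line
# per URL on stdin, prints the store key (or a blank line for no rewrite).
import re
import sys

# Illustrative pattern: collapse any *.dl.sourceforge.net mirror host
# to a single internal store key.
MIRROR = re.compile(r'^http://[^/]+\.dl\.sourceforge\.net/(.*)$')

for line in sys.stdin:
    fields = line.split()
    if not fields:
        continue
    m = MIRROR.match(fields[0])  # first field is the requested URL
    if m:
        # Same store key no matter which mirror served the file, so one
        # cached copy satisfies them all.
        sys.stdout.write('http://dl.sourceforge.net.squid.internal/%s\n'
                         % m.group(1))
    else:
        sys.stdout.write('\n')  # blank line tells Squid: leave URL alone
    sys.stdout.flush()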

Thanks very much for pointing out the store_url_rewrite option, Eliezer. Does it require the proxy administrator to manually configure the list of download mirrors?

Does anyone in the Squid community have thoughts on exploiting Metalink [1] to address caching duplicate files from content distribution networks?

The approach I am pursuing is to exploit RFC 6249, Metalink/HTTP: Mirrors and Hashes. Given a response with a "Location: ..." header and at least one "Link: <...>; rel=duplicate" header, the proxy looks up the URLs in these headers in the cache. If the "Location: ..." URL isn't already cached but a "Link: <...>; rel=duplicate" URL is, then the proxy rewrites the "Location: ..." header with the cached URL. This should redirect clients to a mirror that is already cached.
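
In rough pseudocode the check might look like this; the header accessors and the is_cached() probe are hypothetical stand-ins for whatever cache interface the proxy actually exposes:

import re

def rewrite_location(headers, is_cached):
    """Point the Location header at an already-cached mirror (RFC 6249).

    headers   -- dict of response header name -> list of values
    is_cached -- callable(url) -> bool, a hypothetical cache probe
    """
    location = headers.get("Location", [None])[0]
    if location is None or is_cached(location):
        return headers  # no redirect, or the target is already cached

    # Scan "Link: <url>; rel=duplicate" headers (the RFC 6249 mirror list).
    # For simplicity this assumes one link per header value.
    for value in headers.get("Link", []):
        m = re.match(r'\s*<([^>]+)>\s*;.*\brel\s*=\s*"?duplicate"?', value)
        if m and is_cached(m.group(1)):
            headers["Location"] = [m.group(1)]  # send client to cached mirror
            break
    return headers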

Thoughts?

Well, since our very own Henrik Nordstrom is one of the authors, I'd say there were thoughts about it in the Squid community :-)



Another idea is to exploit RFC 3230, Instance Digests in HTTP. Given a response with a "Location: ..." header and a "Digest: ..." header, if the "Location: ..." URL isn't already cached, then the proxy checks the cache for content with a matching digest and rewrites the "Location: ..." header with the cached URL if one is found.
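
Sketched the same way; here digest_index, mapping a Digest header value to a cached URL, is an assumption, since the proxy would have to maintain such an index as it stores responses:

import base64
import hashlib

def rewrite_by_digest(headers, is_cached, digest_index):
    """Rewrite Location via an instance digest (RFC 3230).

    digest_index -- hypothetical dict: Digest header value -> cached URL
    """
    location = headers.get("Location", [None])[0]
    digest = headers.get("Digest", [None])[0]  # e.g. "SHA-256=<base64>"
    if location is None or digest is None or is_cached(location):
        return headers
    cached_url = digest_index.get(digest)
    if cached_url is not None:
        headers["Location"] = [cached_url]  # point client at the cached copy
    return headers

def index_response(url, body, digest_index):
    """Record a stored response so later redirects can be matched by digest."""
    value = "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode()
    digest_index[value] = url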

I am working on a proof-of-concept plugin for Apache Traffic Server as part of the Google Summer of Code. The code is up on GitHub [2].

If this is a reasonable approach, would it be difficult to build something similar for Squid?

Please contact Alex Rousskov at measurement-factory.com; he was organising a project to develop Digest handling and de-duplication like this a while back.

Amos

