Re: HTTP Cache General Question

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Fri, 09 Oct 2009 18:26:47 +1300

CC'ing squid-dev so the other developers can get a look and comment

Mark Schall wrote:
Thank you for the information.

One more question:

We're researching whether it is possible to cache P2P data in an
HTTP cache (purely research).  We have assumed that if we sent an HTTP
request to one peer's IP address (1.2.3.4) with a URI in the header
that does not correlate with that IP address, the web cache would
store the response based on the URI in the header.  That way, if we
sent a request to a different peer (5.6.7.8) with the same URI in the
header, we'd get back the cached data.

I know this is a big assumption, and we would change our approach if it
is not true, but it seems logical to be able to work this way.  Do you
know if Squid works in this way?

I've given this a small amount of thought over the last few years, and
bounced the idea off Adrian after hours at a conference last year.

We came to the conclusion that it would be a very difficult thing to do 
in Squid as the code currently stands.

It is theoretically possible and relatively easy to add a P2P port and
an engine to handle the requests arriving. The objects could also be
cached like any others; the URI can be provided as you say, or even
created as needed directly out of P2P meta data in the .torrent case.

The major blocker is that Squid cache storage does not yet support
partial ranges of objects. This is already a big problem for HTTP and
becomes a critical issue if P2P downloads are added. It means,
essentially, that the segments of a P2P file cannot be fetched in
parallel from multiple sources, which breaks the best benefit P2P would
bring to Squid. The P2P files could still be fetched linearly by Squid,
however.
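
To make the range problem concrete, here is a minimal sketch of the bookkeeping a range-aware store would need: tracking which byte ranges of an object have arrived, in any order, and knowing when the object is whole. The function names and the half-open (start, end) representation are assumptions for illustration; nothing like this exists in Squid's store as described above.

```python
def add_range(ranges, start, end):
    """Add the half-open byte range [start, end) to a list of ranges.

    Returns a new sorted list in which overlapping or touching ranges
    have been merged into one.
    """
    merged = []
    for s, e in sorted(ranges + [(start, end)]):
        if merged and s <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def is_complete(ranges, total_size):
    """True once the stored ranges cover the whole object."""
    return ranges == [(0, total_size)]


# Segments arriving out of order, as they would from multiple peers.
have = []
have = add_range(have, 100, 200)
have = add_range(have, 0, 50)
have = add_range(have, 50, 100)
```

A store that can only hold complete objects has no equivalent of the intermediate states here, which is why it is limited to fetching linearly from one source.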

A lesser issue is the sheer size of P2P objects and traffic.
Caches are already filled with a lot of content from HTTP alone; adding
P2P requests to that would put a major burden on storage space. This is
more an obstacle for the admin, however.

Beyond that there are a lot of small pieces of work to make Squid capable
of contacting P2P servers and peers, intercepting seed file requests, etc.

We also think it'd be possible for the cache to take the HTTP header,
check to see if the URI is in the cache, and if not send the header to
the domain in the header.

This is how proxies already work at a fundamental level.
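
A minimal sketch of that lookup-then-forward flow, assuming a plain in-memory dict as the store and a caller-supplied fetch function standing in for the real network code. The class and parameter names are hypothetical. Note that the IP address the client originally connected to plays no part in either the lookup or the forwarding decision.

```python
from urllib.parse import urlsplit


class TinyCache:
    """Toy URI-keyed cache; not how Squid is implemented."""

    def __init__(self, fetch_from_origin):
        self._store = {}                 # URL -> response body
        self._fetch = fetch_from_origin  # callable(host, url) -> body

    def get(self, host_header, request_target):
        # Build the absolute URL from the request line and Host header.
        if "://" in request_target:
            url = request_target
        else:
            url = "http://" + host_header + request_target

        if url in self._store:
            return self._store[url], "HIT"

        # Miss: forward to the host named in the URL, then store the reply.
        origin = urlsplit(url).hostname
        body = self._fetch(origin, url)
        self._store[url] = body
        return body, "MISS"
```

With a stub such as `lambda host, url: b"data"` passed in as the fetch function, the first `get("example.com", "/file.bin")` is a MISS that calls the stub, and the second is a HIT served from the store.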

Thank you again

Mark Schall
Michigan State University
CSE Graduate Student

On Wed, Oct 7, 2009 at 11:17 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
On Wed, 7 Oct 2009 11:24:29 -0400, Mark Schall <schallm2@xxxxxxx> wrote:
Hi,

My name is Mark Schall.  I am a Master's student at Michigan State
University.  I am working with a group, trying to work with HTTP
caches.  We were wondering whether, in general, HTTP caches work by
caching data based on the IP addresses or by the URI of the HTTP

In general? I won't dare to guess. There are too many ways to do it and
too many different software caches using those ways.

Squid in particular stores them by hash. Older versions used a hash of
the URL. Newer 2.x versions use a hash of the URL plus some Vary:
headers and the like.
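
As a rough illustration of the idea, here is a sketch of a key built from the method, the URL, and the request header values named in a response's Vary: header. The choice of SHA-256, the field separators, and the function name are assumptions made for the example and do not reflect Squid's actual key format.

```python
import hashlib


def cache_key(method, url, request_headers=None, vary=()):
    """Hypothetical cache key: hash of method, URL and Vary'd headers."""
    headers = {k.lower(): v for k, v in (request_headers or {}).items()}
    h = hashlib.sha256()
    h.update(method.upper().encode())
    h.update(b"\0")
    h.update(url.encode())
    # Only request headers listed in Vary influence the key.
    for name in sorted(n.lower() for n in vary):
        h.update(b"\0")
        h.update(name.encode())
        h.update(b"=")
        h.update(headers.get(name, "").encode())
    return h.hexdigest()
```

The destination IP address is not an input at all, so two requests for the same URL map to the same key no matter which server address they were sent to.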

request.  It seems that using IP addresses would be the most secure
means of caching, but the URI seems logical for multiple server
websites.

I assume by 'secure' you mean 'secure against data leaks'. There is nothing
inherently secure about caching in the first place. The cache admin always
has access to the cached data in intermediary traffic.

What security there is in caching is built on trust between the cache admin
and the website admin. The website admin trusts that the cache admin will
obey the Cache-Control (CC) headers. The cache admin trusts that the website
admin will set the headers correctly (e.g. private) to protect sensitive
information and also to indicate how often objects get replaced etc.
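
A minimal sketch of the cache admin's side of that trust, assuming only the "private" and "no-store" directives matter. This is a deliberate simplification with a hypothetical function name; the full rules a shared cache must follow are in the HTTP specification and involve many more directives and headers.

```python
def is_shared_cacheable(cache_control):
    """Return False if a shared cache must not store the response.

    Simplified: looks only for the 'private' and 'no-store' directives
    in the value of a Cache-Control response header.
    """
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return not ({"private", "no-store"} & directives)
```

For example, a response carrying `Cache-Control: private, max-age=60` would be refused by this check, while `Cache-Control: public, max-age=3600` would be accepted.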

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
  Current Beta Squid 3.1.0.14