Re: General Squid setup

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Tue, 28 Aug 2012 22:20:30 +1200

On 25/08/2012 8:41 a.m., Farkas H wrote:
Hi list,

I'm a little confused about the various configuration options of
Squid. I have the following setup:
Internet clients <-> remote Web server [WS] <-> different remote Web
servers [R1], ..., [Rn]
[WS] processes the data; [R1], ..., [Rn] provide the data

The clients send requests via http-post to [WS].
[WS] translates the requests and retrieves the required data from
[R1], ..., [Rn] via http-get. [WS] processes the data and sends the
responses to the clients.

The (requests of [WS] and) the responses of [R1], ..., [Rn] should be
cached (within the [WS] environment).
The number of web servers [R1], ..., [Rn] is relatively small. This
should lead to many cache hits.

Cache HITs are related to the range of the URL space, not to the server 
count. For example, Wikipedia has a great many servers all serving the 
same content; they get a HIT ratio up near 100% at times, since the 
client-requested URLs are all for the one website and usually for a few 
"trending" articles.

But since these are "delivery" operations which are being cached and 
served from cache ... the server will never receive the HITs, and so will 
never be able to update its state on their receipt. The result can be 
very broken, very client-visible behaviour unintended by the site 
designer(s).

I have two suggestions for discussion:
(1) normal Squid cache; [WS] acts as a kind of client; [WS] is the
only client of Squid Proxy; the requests of [WS] would have to be
redirected programmatically to Squid Proxy,
(2) reverse proxy (with httpd-accelerator mode).

Are these options suitable? Which (other) squid setup would you recommend?
Is (1) possible without programming?
Which configuration (from http://wiki.squid-cache.org/ConfigExamples)
should be chosen for (1) or (2)?
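
On whether (1) needs programming: many HTTP client libraries pick up the 
proxy from the http_proxy environment variable, so it may be enough to set 
that for the [WS] process. Whether this applies depends on what [WS] is 
written in, which is not stated here. A minimal sketch, assuming a Python 
client using urllib and a Squid listening on 127.0.0.1:3128 (both are 
assumptions for illustration only):

```python
import os
import urllib.request

# Hypothetical Squid address; normally set outside the program, e.g. in
# the environment of the service that runs [WS].
os.environ["http_proxy"] = "http://127.0.0.1:3128"

# urllib reads proxy settings from the environment.
proxies = urllib.request.getproxies()

# The default opener uses these settings already; building one explicitly
# just makes the routing visible.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

# opener.open("http://r1.example/data") would now be sent via the proxy.
print(proxies["http"])
```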

Do you own those websites, or are you providing CDN services to their 
owners? Choose (2) - it will pass the requests through unchanged.

Are you the ISP for those clients? Choose (1), but...

Are you aware of the difference between HTTP POST and GET semantics, and 
how that determines very different caching, security, and failure 
recovery models?
Why are you re-writing these critical semantics in a relay?
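
For reference, the method properties in question can be written down as a 
small table. This is a sketch following the definitions in the HTTP 
specification (RFC 7231 section 4.2); the relay_properties() helper is a 
made-up name for illustration, not part of Squid or any library.

```python
# Method classes as defined by the HTTP specification.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def relay_properties(method):
    """Return the properties a cache or relay may rely on for a method."""
    m = method.upper()
    return {
        # Safe: the client does not ask for any state change on the server.
        "safe": m in SAFE_METHODS,
        # Idempotent: may be retried automatically after a failure.
        "idempotent": m in IDEMPOTENT_METHODS,
        # Responses may be stored without explicit freshness information.
        "cacheable_by_default": m in {"GET", "HEAD"},
    }


print(relay_properties("GET"))
print(relay_properties("POST"))
```

A relay that receives POST and emits GET is telling the cache and the 
origin that a request is safe, retryable, and cacheable when the client 
never said so.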

Amos