On 25/08/2012 8:41 a.m., Farkas H wrote:
Hi list,
I'm a little confused about the various configuration options of
Squid. I have the following setup:
Internet clients <-> remote Web server [WS] <-> different remote Web
servers [R1], ..., [Rn]
[WS] processes the data; [R1], ..., [Rn] provide the data
The clients send requests via http-post to [WS].
[WS] translates the requests and retrieves the required data from
[R1], ..., [Rn] via http-get. [WS] processes the data and sends the
responses to the clients.
The (requests of [WS] and) the responses of [R1], ..., [Rn] should be
cached (inside [WS] surrounding).
The number of web servers [R1], ..., [Rn] is relatively small. This
should lead to many cache hits.
The cache HIT ratio depends on the size of the URL space, not on the
server count. For example, Wikipedia has a great many servers all serving
the same content, and it sometimes gets a HIT ratio near 100% because the
client-requested URLs are all for the one website and usually for a few
"trending" articles.
But since these are "delivery" operations which are being cached and
served from cache, the server will never receive the HITs and will never
be able to update its state according to their receipt. This can result
in very broken, very client-visible behaviours unintended by the
site designer(s).
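
To illustrate the point about URL space versus server count, here is a
minimal sketch in Python. It replays two made-up request streams through
a tiny LRU cache keyed by URL. The host name r1.example, the workload
sizes, and the cache capacity are all assumptions for illustration; this
is not how Squid implements its cache.

```python
from collections import OrderedDict
from itertools import cycle, islice


def hit_ratio(urls, capacity):
    """Replay a request stream through a tiny LRU cache keyed by URL."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for url in urls:
        total += 1
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


# Hypothetical workloads; both talk to a single backend host.
# Small URL space: 10 distinct URLs requested over and over.
small_space = list(
    islice(cycle(f"http://r1.example/item/{i}" for i in range(10)), 1000)
)
# Large URL space: 1000 distinct URLs, each requested exactly once.
large_space = [f"http://r1.example/item/{i}" for i in range(1000)]

small_ratio = hit_ratio(small_space, capacity=100)
large_ratio = hit_ratio(large_space, capacity=100)
```

Both streams use one server, yet only the repeated-URL stream produces
HITs. The number of backends does not enter into it.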
I have two suggestions for discussion:
(1) normal Squid cache; [WS] acts as a kind of client; [WS] is the
only client of Squid Proxy; the requests of [WS] would have to be
redirected programmatically to Squid Proxy,
(2) reverse proxy (with httpd-accelerator mode).
Are these options suitable? Which (other) squid setup would you recommend?
Is (1) possible without programming?
Which configuration (from http://wiki.squid-cache.org/ConfigExamples)
should be chosen for (1) or (2)?
Do you own those websites, or are you providing CDN services to their
owners? Then choose (2); it will pass the requests through unchanged.
Are you the ISP for those clients? Then choose (1), but...
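
For (1), a rough sketch of what "redirecting programmatically" could look
like if [WS] happened to be written in Python. The proxy address
127.0.0.1:3128 (3128 being Squid's default port) and the backend URL are
assumptions. Python's urllib also picks up the http_proxy environment
variable by default, so with a client library that honours that variable
no code change may be needed at all.

```python
import urllib.request

# Assumption: Squid runs on the same host as [WS] on its default port.
SQUID_PROXY = "http://127.0.0.1:3128"


def build_proxied_opener(proxy_url=SQUID_PROXY):
    """Return an opener that sends plain-HTTP requests via the given proxy."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy_handler)


opener = build_proxied_opener()
# Example use (hypothetical backend URL, not executed here):
# with opener.open("http://r1.example/data?id=42", timeout=10) as resp:
#     body = resp.read()
```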
Are you aware of the difference between HTTP POST and GET semantics, and
how that determines very different caching, security, and failure
recovery models?
Why are you re-writing these critical semantics in a relay?
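
As a reminder of what differs, here is a small sketch of the method
properties as the HTTP specification defines them. The table and helper
names are made up for illustration; a real cache such as Squid weighs
many more factors (response headers, status codes, configuration).

```python
# Method properties per the HTTP specification (RFC 7231, section 4.2).
# POST responses can be cached only when they carry explicit freshness
# information, which is rare in practice, so it is marked False here.
METHOD_SEMANTICS = {
    "GET": {"safe": True, "idempotent": True, "cacheable_by_default": True},
    "HEAD": {"safe": True, "idempotent": True, "cacheable_by_default": True},
    "POST": {"safe": False, "idempotent": False, "cacheable_by_default": False},
}


def may_serve_from_cache(method):
    """True if a cache may normally answer this method without the server."""
    return METHOD_SEMANTICS[method.upper()]["cacheable_by_default"]


def may_retry_automatically(method):
    """True if a relay can repeat the request after a failure without risk
    of applying a state change twice."""
    return METHOD_SEMANTICS[method.upper()]["idempotent"]
```

A relay that turns a client POST into backend GETs moves the request from
the last row of that table to the first, which changes what caches and
retry logic along the path are allowed to do with it.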
Amos