Re: General Squid setup

Farkas H <farkas.dus@xxxxxxxxx> · Wed, 29 Aug 2012 16:13:12 +0200

Hi Amos,
thanks for your response.
My part is the web server in the middle [WS], which provides
data-processing services. Users send it http-post requests that carry
embedded http-get requests. I don't want to change this for the moment.

The web server sends the embedded http-get requests to remote servers
(not mine), receives the requested data, processes the data and
returns the result.
I want to cache the data from the remote servers. I think this means
routing the outgoing http-get requests of the web server through Squid.
I would say Squid should sit behind the web server (between [WS] and the
remote servers) and not in front of it like a reverse proxy, but I'm not
a specialist. What is your opinion? Is there a way to do this (without
coding)?
I appreciate any advice.
Thanks, Farkas
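
A minimal sketch of what "behind the web server" could look like from
the [WS] side, assuming [WS] issues its http-get requests from Python and
a Squid forward proxy listens on 127.0.0.1:3128 (3128 is Squid's default
port; the address is an assumption, not something from this thread). The
helper names make_proxied_opener and fetch are hypothetical. Many HTTP
client libraries also honour the http_proxy environment variable, which
may allow the same redirection without code changes, depending on the
client [WS] uses.

```python
import urllib.request

# Assumed address of the Squid forward proxy; adjust to the real setup.
SQUID_PROXY = "http://127.0.0.1:3128"


def make_proxied_opener(proxy_url=SQUID_PROXY):
    """Return an opener that sends plain-http requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, opener=None, timeout=10):
    """GET a URL via the proxy; the proxy decides whether to use its cache."""
    opener = opener or make_proxied_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

With this, [WS] would call fetch() for each embedded http-get URL instead
of contacting [R1], ..., [Rn] directly.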

On 28 August 2012 12:20, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 25/08/2012 8:41 a.m., Farkas H wrote:
>>
>> Hi list,
>>
>> I'm a little confused about the various configuration options of
>> Squid. I have the following setup:
>> Internet clients <-> remote Web server [WS] <-> different remote Web
>> servers [R1], ..., [Rn]
>> [WS] processes the data; [R1], ..., [Rn] provide the data
>>
>> The clients send requests via http-post to [WS].
>> [WS] translates the requests and retrieves the required data from
>> [R1], ..., [Rn] via http-get. [WS] processes the data and sends the
>> responses to the clients.
>>
>> The (requests of [WS] and the) responses of [R1], ..., [Rn] should be
>> cached (within the [WS] environment).
>> The number of web servers [R1], ..., [Rn] is relatively small. This
>> should lead to many cache hits.
>
>
> Cache HITs are related to the range of the URL space, not the server
> count. For example, Wikipedia has a great many servers all serving the
> same content; they get a HIT ratio up near 100% sometimes, since the
> client-requested URLs are all for the one website and usually for some
> "trending" articles.
>
> But since these are "delivery" operations which are being cached and served
> from cache ... the server will never receive the HITs and will never be
> able to update its state according to their receipt. This can result in very
> broken, very client-visible behaviours unintended by the site designer(s).
>
>
>> I have two suggestions for discussion:
>> (1) normal Squid cache; [WS] acts as a kind of client; [WS] is the
>> only client of Squid Proxy; the requests of [WS] would have to be
>> redirected programmatically to Squid Proxy,
>> (2) reverse proxy (with httpd-accelerator mode).
>>
>> Are these options suitable? Which (other) squid setup would you recommend?
>> Is (1) possible without programming?
>> Which configuration (from http://wiki.squid-cache.org/ConfigExamples)
>> should be chosen for (1) or (2)?
>
>
> Do you own those websites, or are you providing CDN services to their
> owners? Choose (2) - it will pass the requests through unchanged.
>
> Are you the ISP for those clients? Choose (1), but...
>
>
> Are you aware of the difference between HTTP POST and GET semantics, and how
> that determines very different caching, security, and failure recovery
> models?
> Why are you re-writing these critical semantics in a relay?
>
> Amos
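
To illustrate the two points quoted above (HITs depend on repeated URLs
rather than on the number of servers, and POST is not treated like GET),
here is a toy model of a URL-keyed cache. It is a simplified sketch and
not how Squid works internally: it ignores Cache-Control, Vary, expiry
and everything else that real HTTP caching depends on. The class
ToyCache, the function fake_origin and the example URLs are hypothetical.

```python
class ToyCache:
    """Toy URL-keyed cache: stores GET responses only, never POST."""

    def __init__(self, origin):
        self.origin = origin  # callable(method, url) -> response body
        self.store = {}       # url -> cached body
        self.hits = 0
        self.misses = 0

    def request(self, method, url):
        if method != "GET":
            # In this toy model, non-GET requests always go to the origin.
            return self.origin(method, url)
        if url in self.store:
            self.hits += 1
            return self.store[url]
        self.misses += 1
        body = self.origin(method, url)
        self.store[url] = body
        return body


origin_calls = []


def fake_origin(method, url):
    """Stand-in for [R1], ..., [Rn]; records each request that reaches it."""
    origin_calls.append((method, url))
    return f"data for {url}"


cache = ToyCache(fake_origin)

# The same URL requested three times: one miss, then two hits.
for _ in range(3):
    cache.request("GET", "http://r1.example/data?id=1")

# Three different URLs on the same server: all misses.
for i in range(2, 5):
    cache.request("GET", f"http://r1.example/data?id={i}")

# POSTs reach the origin every time.
cache.request("POST", "http://r1.example/submit")
cache.request("POST", "http://r1.example/submit")
```

In this model a single server still produces mostly misses when every
request uses a different URL, and the origin only sees the first GET for
a repeated URL, which is the state-update concern raised above.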