Re: General Squid setup

Farkas H <farkas.dus@xxxxxxxxx> · Thu, 6 Sep 2012 23:39:39 +0200

Hi Amos,

thanks for your response.
I modified the web application. Now we have the following infrastructure.
client --> http-Post [embedded http-Get] --> Server / web application --> http-Get --> Squid --> Servers
(response path: Servers --> Squid --> Server / web application --> client)
Advantage: the Server / web application doesn't have to request data
from the remote servers if that data is already in the Squid cache.
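A minimal sketch of the [WS] side of this setup, assuming [WS] is written in Python (the thread does not say) and that Squid runs as a forward proxy at the hypothetical address `squid.internal:3128` (3128 is Squid's default `http_port`). Only the outbound GETs to the remote servers go through this opener; the clients' POSTs still reach [WS] directly.

```python
import urllib.request
from typing import Optional

# Hypothetical Squid address; 3128 is Squid's default http_port.
SQUID_PROXY = "http://squid.internal:3128"


def build_proxied_opener(proxy_url: str = SQUID_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests through the given proxy."""
    # Only the "http" scheme is mapped, so https:// URLs would bypass the proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_squid(url: str,
                    opener: Optional[urllib.request.OpenerDirector] = None,
                    timeout: float = 10.0) -> bytes:
    """GET `url` from a remote server [Rn]; Squid may answer it from its cache."""
    opener = opener or build_proxied_opener()
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Building the opener once and reusing it keeps the proxy choice in one place; setting the `http_proxy` environment variable is an alternative that needs no code change, since urllib's default opener honours it.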

Additionally, I want to cache the responses to the http-Post requests.
client --> http-Post --> Squid --> Server / web application /
processing the response (-> Squid -> client)

The idea: we modify the header of the http-Post request to make it unique.
Squid should add to the request the information whether it has a stored
response to the modified request (true or false), and forward that
information to the destination server. There are two possibilities.
(1) No response to the modified request is stored in Squid (new request).
(2) A response to the modified request is stored in Squid, but we don't
know yet whether the data is still fresh.
In both(!) cases, (1) and (2), the request should be forwarded to the
destination server.
Is that possible with Squid?
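One way to compute the "unique" request identity described above is sketched below. It is an assumption, not something Squid provides: `X-Request-Key` is a made-up header name, and the key is a SHA-256 hash over method, URL and body. Note that a standard HTTP cache builds its primary cache key from the request method and target URI, so neither the POST body nor a custom header by itself makes POSTs distinguishable to Squid.

```python
import hashlib

# Hypothetical header name; neither HTTP nor Squid defines it.
KEY_HEADER = "X-Request-Key"


def request_key(method: str, url: str, body: bytes) -> str:
    """Deterministic key: identical method, URL and body give the same key."""
    digest = hashlib.sha256()
    for part in (method.upper().encode("ascii"), url.encode("utf-8"), body):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def tag_request(headers: dict, method: str, url: str, body: bytes) -> dict:
    """Return a copy of `headers` with the hypothetical key header added."""
    tagged = dict(headers)
    tagged[KEY_HEADER] = request_key(method, url, body)
    return tagged
```

Because the key depends only on the request contents, it could also be carried in the URL of a GET, which is what an HTTP cache does key on; that is close to the embedded-GET design described earlier.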

After checking the freshness, ... of the data and / or processing the
response data, the web application returns one of three results.
(1.1) The result data. Squid should store the data under the modified
request and forward it to the client.
(2.1) The stored data is not fresh. Squid should replace the stored
data and forward it to the client.
(2.2) The stored data is fresh, so the web application did not produce a
new response. Only the information that the data is still fresh is
returned, and Squid should forward the stored data to the client.
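The outcomes above map closely onto standard HTTP revalidation: a cache holding a copy forwards the request with a validator such as `If-None-Match`, and the origin answers either `200 OK` with new data or `304 Not Modified`. A request carrying `Cache-Control: no-cache` makes a cache revalidate with the origin even when its copy is fresh, which matches "forward in both cases". The toy model below is plain Python, not Squid code; it assumes the origin always sends an `ETag`, and `RevalidatingCache` and `demo_origin` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# origin(url, request_headers) -> (status, response_headers, body)
Origin = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


@dataclass
class Stored:
    etag: str
    body: bytes


class RevalidatingCache:
    """Toy URL-keyed cache that forwards every request, as if sent with no-cache."""

    def __init__(self, origin: Origin):
        self.origin = origin
        self.store: Dict[str, Stored] = {}

    def get(self, url: str) -> Tuple[str, bytes]:
        cached = self.store.get(url)
        headers: Dict[str, str] = {}
        if cached is not None:
            # A copy exists but may be stale: tell the origin which version we hold.
            headers["If-None-Match"] = cached.etag
        status, resp_headers, body = self.origin(url, headers)
        if status == 304 and cached is not None:
            # Still fresh: the origin sent no body, so serve the stored copy.
            return "revalidated", cached.body
        if status == 200:
            # New or replacement data: store it (assumes an ETag is always sent).
            self.store[url] = Stored(resp_headers["ETag"], body)
            return "fetched", body
        raise RuntimeError(f"unexpected status {status} for {url}")


def demo_origin(data: Dict[str, bytes]) -> Origin:
    """Hypothetical web application whose ETag is a hash of the current body."""
    def origin(url: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
        body = data[url]
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        if headers.get("If-None-Match") == etag:
            return 304, {"ETag": etag}, b""
        return 200, {"ETag": etag}, body
    return origin
```

With `data = {"/r1": b"v1"}`, the first `get("/r1")` fetches and stores, the second is answered by a 304 and served from the store, and after `data["/r1"]` changes the next call fetches the replacement.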

I wonder whether the idea could work, and whether it's a question of
configuration or of coding in Squid.
Thanks so much for any advice.

Cheers,
Farkas

On 28 August 2012 12:20, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 25/08/2012 8:41 a.m., Farkas H wrote:
>>
>> Hi list,
>>
>> I'm a little confused about the various configuration options of
>> Squid. I have the following setup:
>> Internet clients <-> remote Web server [WS] <-> different remote Web
>> servers [R1], ..., [Rn]
>> [WS] processes the data; [R1], ..., [Rn] provide the data
>>
>> The clients send requests via http-post to [WS].
>> [WS] translates the requests and retrieves the required data from
>> [R1], ..., [Rn] via http-get. [WS] processes the data and sends the
>> responses to the clients.
>>
>> The (requests of [WS] and) the responses of [R1], ..., [Rn] should be
>> cached (within the [WS] environment).
>> The number of web servers [R1], ..., [Rn] is relatively small. This
>> should lead to many cache hits.
>
>
> Cache HITs are related to the range of the URL space, not to the server
> count. For example, Wikipedia has a great many servers all serving the same
> content; they sometimes get a HIT ratio near 100%, since the client-requested
> URLs are all for the one website and usually for some "trending" articles.
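To illustrate the point about URL space, here is a toy count assuming an unbounded cache with no expiry, where every response is stored under its URL. The number of servers never enters the calculation; only how often the same URL repeats does.

```python
from typing import Iterable


def hit_ratio(requests: Iterable[str]) -> float:
    """Fraction of requests served from an unbounded, never-expiring URL-keyed cache."""
    seen = set()
    hits = total = 0
    for url in requests:
        total += 1
        if url in seen:
            hits += 1  # same URL requested before: a cache HIT
        else:
            seen.add(url)  # first request for this URL: a MISS, then stored
    return hits / total if total else 0.0
```

One server with 100 distinct URLs requested once each yields no HITs at all, while 100 requests spread over just two URLs yield 98 HITs.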
>
> But since these are "delivery" operations that are being cached and served
> from cache, the server will never receive the HITs and will never be able to
> update its state according to their receipt. This can result in badly
> broken, very client-visible behaviours unintended by the site designer(s).
>
>
>> I have two suggestions for discussion:
>> (1) normal Squid cache; [WS] acts as a kind of client; [WS] is the
>> only client of Squid Proxy; the requests of [WS] would have to be
>> redirected programmatically to Squid Proxy,
>> (2) reverse proxy (with httpd-accelerator mode).
>>
>> Are these options suitable? Which (other) squid setup would you recommend?
>> Is (1) possible without programming?
>> Which configuration (from http://wiki.squid-cache.org/ConfigExamples)
>> should be chosen for (1) or (2)?
>
>
> Do you own those websites, or are you providing CDN services to their
> owners? Choose (2); it will pass the requests through unchanged.
>
> Are you the ISP for those clients? Choose (1), but...
>
>
> Are you aware of the difference between HTTP POST and GET semantics, and of
> how it determines very different caching, security, and failure-recovery
> models?
> Why are you rewriting these critical semantics in a relay?
>
> Amos
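The POST versus GET distinction raised above can be summarised as a small lookup, following RFC 7231 (superseded by RFC 9110, which keeps the same method definitions): GET, HEAD, OPTIONS and TRACE are safe; those plus PUT and DELETE are idempotent; responses to POST are cacheable only with explicit freshness information, and POST caching is not widely implemented. A proxy also must not automatically retry a non-idempotent request such as POST (RFC 7230, section 6.3.1). The function names are illustrative only.

```python
# Method properties as defined in RFC 7231, section 4.2 (carried over into RFC 9110).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def proxy_may_auto_retry(method: str) -> bool:
    """A proxy must not automatically retry non-idempotent requests (RFC 7230, 6.3.1)."""
    return method.upper() in IDEMPOTENT_METHODS


def method_allows_storing(method: str, has_explicit_freshness: bool) -> bool:
    """Whether the request method permits storing the response at all.

    Necessary, not sufficient: status code and Cache-Control still decide.
    """
    m = method.upper()
    if m in ("GET", "HEAD"):
        return True
    if m == "POST":
        # POST responses are cacheable only with explicit freshness information.
        return has_explicit_freshness
    return False
```

This is why rewriting a client's POST into a GET in a relay changes more than caching: it also changes whether a failed request may be silently replayed.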