On 28.02.2012 10:36, Francis Fauteux wrote:
> We are using a collection of Squid instances (version 2.7.stable9) as
> caching proxies behind a single gateway IP, processing requests from a
> large number of users.
By gateway IP do you mean a NAT gateway? Or each Squid instance setting
its outgoing IP to the same value?
If this is a NAT gateway, could it be simple NAT table pair/tuple
limitations?
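For the second case, the relevant squid.conf directive is tcp_outgoing_address; a minimal sketch (the address below is a documentation placeholder, not yours):

```
# squid.conf: pin this instance's outgoing connections to one
# fixed source IP (192.0.2.10 is a placeholder example)
tcp_outgoing_address 192.0.2.10
```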
> We've observed that a number of websites throttle our usage when
> requests targeting a given domain are processed by two or more
> instances. This does not occur when all requests to this domain are
> processed by a single instance.
> We are trying to find the root cause for this behaviour, and the fact
> that it does not occur with a single Squid instance may help us
> diagnose. From the origin server's perspective, only two changes are
> visible between using a single instance and using two or more:
>
> * The value injected in the 'Via' header differs between Squid
> instances. The web server may not expect requests coming from a single
> IP to contain different values for the HTTP 'Via' header. This is
> something we can investigate ourselves, but input would be welcome.
If the web server is in fact doing such checks it is in violation of
the HTTP specification. HTTP is message-based in the same way that TCP
is packet-based: which route the message/packet took is mostly
irrelevant. They could be checking it for security access control, but
that should not have side effects like this. Your multiple instances
could even share the same TCP connection to the server and expect it to
work (er, Squid does pipelining multiplexing).
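If you want to test whether the differing Via values are the trigger, Squid has a `via` directive to stop advertising them. This is a diagnostic sketch only; an HTTP/1.1 proxy normally SHOULD add Via:

```
# squid.conf: omit the Via header from forwarded requests/replies
# (diagnosis only; HTTP/1.1 proxies are expected to add Via)
via off
```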
There is no way the server can rely on a specific one of these chaining
scenarios:

  client->A->server
  client->B->server
  client->A->B->server
and speaking of those scenarios, it seems more likely to me that the
third scenario is happening to you. Each layer of proxying adds
latency, so messages doing the A->B hop could appear slower
(throttled?) than when it's not present. The CARP design is
specifically tuned to make such multi-hop layering efficient, but
generic peer clusters doing it can slow things down.
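For reference, a CARP parent layer is configured with the carp option on cache_peer lines; a rough sketch on the frontend, assuming two hypothetical backend hostnames:

```
# squid.conf on the frontend (hostnames are hypothetical):
# hash each request URL to one backend via CARP
cache_peer backend1.example.com parent 3128 0 carp
cache_peer backend2.example.com parent 3128 0 carp
# always forward through a peer rather than going direct
never_direct allow all
```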
> * If each Squid limits the number of connections to a given server,
> using several instances may cause the origin server to see a number of
> connections which exceeds what they expect to see from a single IP.
> This is the question for this forum: does Squid actually limit the
> number of per-server connections? Is this number configurable (either
> in squid.conf or by rebuilding)?
The default is not to limit. You can configure a limit on clients if
you wish. If a limit is relevant here, it would be in the form of a
client connection limit enforced at the server end.
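The client-side limiting Squid does support is the maxconn ACL; a sketch with an arbitrary example threshold:

```
# squid.conf: deny any client IP holding more than 50 concurrent
# connections (50 is an arbitrary example value)
acl toomany maxconn 50
http_access deny toomany
```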
> Note that each affected website resolves to a single IP; Squid
> instances are not receiving different IPs from DNS servers.
Amos