On 28.02.2012 10:36, Francis Fauteux wrote:
> We are using a collection of Squid instances (version 2.7.stable9) as
> caching proxies behind a single gateway IP, processing requests from a
> large number of users.
By gateway IP do you mean a NAT gateway? Or each Squid instance setting
its outgoing IP to the same value?
If this is a NAT gateway, could it be simple NAT table pair/tuple
limitations?
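For the second case, the relevant squid.conf directive is tcp_outgoing_address; a minimal sketch (the address below is a documentation placeholder, not yours):

```
# squid.conf: pin this instance's outgoing connections to one
# fixed source IP (192.0.2.10 is a placeholder example)
tcp_outgoing_address 192.0.2.10
```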
> We've observed that a number of websites throttle our usage when
> requests targeting a given domain are processed by two or more
> instances. This does not occur when all requests to this domain are
> processed by a single instance.
> We are trying to find the root cause for this behaviour, and the fact
> that it does not occur with a single Squid instance may help us
> diagnose. From the origin server's perspective, only two changes are
> visible between using a single instance and using two or more:
>
> * The value injected in the 'Via' header differs between Squid
> instances. The web server may not expect requests coming from a single
> IP to contain different values for the HTTP 'Via' header. This is
> something we can investigate ourselves, but input would be welcome.
If the web server is in fact doing such checks it is in violation of
the HTTP specification. HTTP is message-based in the same way that TCP
is packet-based: which route the message/packet took is mostly
irrelevant. They could be checking it for security access control, but
that should not have side effects like this. Your multiple instances
could even share the same TCP connection to the server and expect it to
work (er, Squid does pipelining multiplexing).
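If you want to test whether the differing Via values are the trigger, Squid has a `via` directive to stop advertising them. This is a diagnostic sketch only; an HTTP/1.1 proxy normally SHOULD add Via:

```
# squid.conf: omit the Via header from forwarded requests/replies
# (diagnosis only; HTTP/1.1 proxies are expected to add Via)
via off
```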
There is no way the server can rely on a specific one of these chaining
scenarios:

  client->A->server
  client->B->server
  client->A->B->server
and speaking of those scenarios, it seems more likely to me that the
third scenario is happening to you. Each layer of proxying adds
latency, so messages doing the A->B hop could appear slower
(throttled?) than when it's not present. The CARP design is
specifically tuned to make such multi-hop layering efficient, but
generic peer clusters doing it can slow things down.
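For reference, a CARP parent layer is configured with the carp option on cache_peer lines; a rough sketch on the frontend, assuming two hypothetical backend hostnames:

```
# squid.conf on the frontend (hostnames are hypothetical):
# hash each request URL to one backend via CARP
cache_peer backend1.example.com parent 3128 0 carp
cache_peer backend2.example.com parent 3128 0 carp
# always forward through a peer rather than going direct
never_direct allow all
```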
> * If each Squid limits the number of connections to a given server,
> using several instances may cause the origin server to see a number of
> connections which exceeds what they expect to see from a single IP.
> This is the question for this forum: does Squid actually limit the
> number of per-server connections? Is this number configurable (either
> in squid.conf or by rebuilding)?
The default is not to limit. You can configure a limit on clients if
you wish. If a limit is relevant here, it would be in the form of a
client connection limit enforced at the server end.
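The client-side limiting Squid does support is the maxconn ACL; a sketch with an arbitrary example threshold:

```
# squid.conf: deny any client IP holding more than 50 concurrent
# connections (50 is an arbitrary example value)
acl toomany maxconn 50
http_access deny toomany
```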
> Note that each affected website resolves to a single IP; Squid
> instances are not receiving different IPs from DNS servers.
Amos