TCP_HIT/504 problem with small Squid cluster

Robert Knepp <rknepp79@xxxxxxxxx> · Tue, 20 Oct 2009 20:54:40 -0400

Hi - first time poster so be gentle.

Some general info regarding my setup:

0) Running Squid 2.7 in reverse proxy mode
1) Each Squid is configured to use its local webserver on 127.0.0.1
as the origin server and the other servers in the farm as siblings
2) This Squid cache is transparent to the end-user (although I do pass
along a select few cache controls such as if-none-match).
3) It is protected behind local AUTH applications which perform
complex access checks before passing the request on to Squid
4) All documents will be requested and cached as
[http://127.0.0.1/URL] so Squid is really only serving a single domain

************************************************************************************************
Transparent Proxy Cluster

                               [user agent]
                                    |
                                    v
                             [Load Balancer]
                                    |
        ----------------------------------------------------------
        |                  |                  |                  |
        v                  v                  v                  v
   [WEB1-AUTH]        [WEB2-AUTH]        [WEB3-AUTH]        [WEB4-AUTH]
        |                  |                  |                  |
        v                  v                  v                  v
    [SQUID1]   (icp)   [SQUID2]   (icp)   [SQUID3]   (icp)   [SQUID4]
        |                  |                  |                  |
        v                  v                  v                  v
   [WEB1-ORIG]        [WEB2-ORIG]        [WEB3-ORIG]        [WEB4-ORIG]

************************************************************************************************

Here is a simplified squid.conf from the first server (all others have
the same settings except the sibling list is shifted).

#------
http_port 3128 act-as-origin accel vhost http11
icp_port 3130
cache_dir ufs /cache/data 2048 16 256
cache_mem 8 GB
request_timeout 5 seconds
persistent_request_timeout 5 seconds
refresh_pattern .       0       20%     4320
negative_ttl 0

acl all src all
acl localhost src 127.0.0.1/xx
acl localnet src 127.0.0.1/xx
acl localnet src xxxxxxxxxxxxx
acl Safe_ports port 3128
acl Safe_ports port 80
http_access allow localhost
http_access deny !Safe_ports
http_access allow localnet
http_access deny all
icp_access allow localnet
icp_access deny all

## Origin server
cache_peer 127.0.0.1 parent 80 0 name=localweb max-conn=250 no-query no-netdb-exchange originserver http11
cache_peer_access localweb allow localnet
cache_peer_access localweb deny all
## Sibling Caches
#   cache_peer [IP_OF_SIBLING_1] sibling 3128 3130 proxy-only
cache_peer [IP_OF_SIBLING_2] sibling 3128 3130 proxy-only
cache_peer [IP_OF_SIBLING_3] sibling 3128 3130 proxy-only
cache_peer [IP_OF_SIBLING_4] sibling 3128 3130 proxy-only

************************************************************************************************
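
Since the only per-server difference is which sibling line is commented out, the sibling block could be generated instead of hand-edited. This is only a rough Python sketch: the function name sibling_lines and the 192.0.2.x addresses are placeholders made up for illustration, while the ports and the proxy-only option are copied from the config above.

```python
# Sketch: build the "Sibling Caches" block for each server in the farm.
# FARM holds placeholder documentation addresses, not real server IPs.
FARM = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]


def sibling_lines(all_ips, own_ip, http_port=3128, icp_port=3130):
    """Return one cache_peer line per server, commenting out our own entry."""
    lines = []
    for ip in all_ips:
        line = f"cache_peer {ip} sibling {http_port} {icp_port} proxy-only"
        if ip == own_ip:
            # A Squid should not list itself as a sibling, so keep the
            # line only as a comment, as in the hand-written config.
            line = "#   " + line
        lines.append(line)
    return lines


if __name__ == "__main__":
    for own in FARM:
        print(f"## Sibling Caches for {own}")
        print("\n".join(sibling_lines(FARM, own)))
        print()
```

Each server would then take the block printed under its own address.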

So......  I have a 'few' questions regarding my setup and how I might
be able to improve on it.

- Does the ICP sibling setup make sense or will it limit the number
of servers in the cluster? Or should this be redesigned to work with
multiple parent caches instead of siblings? Or perhaps multicast ICP?
Or I could try digests?

- Would using 'icp_hit_stale' and 'allow-miss' improve hit-ratios
between the shards? Is there a way to force a given Squid server to be
the ONLY server storing a cached document (stale, fresh, or
otherwise)?

- I have been using this basic setup for about a month now and I am
getting strange squid access.log entries when the load goes up:

2009-04-04 11:13:47 504 GET "http://127.0.0.1:3128/[URL]" TCP_HIT NONE 3018 0 "127.0.0.1" "127.0.0.1:3128" "-" "-"

  The gateway timeout lines appear during high load and are usually
(but not always) close to UDP_HIT entries on the same URI. In most
cases like this the document gets returned to the user with a status
of 200. This confused me, since I thought TCP_HIT meant that a cached
object was found locally and is being served.
  Or maybe it is related to a false UDP_HIT? Could this be network
related? Or can a slow response from the origin server cause this?

- Is 8 GB cache memory (out of 32GB in each box) going to cause
problems for Squid? And what happens if it fills up quickly?
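
To see how often the TCP_HIT/504 combination shows up per URL, the access.log could be tallied with something like the Python sketch below. It is a sketch under assumptions: the field positions (status third, URL fifth, Squid result code sixth) are inferred from the single log entry quoted above, the function name count_hit_504 is made up, and the second sample line is invented purely as a contrasting case.

```python
import shlex
from collections import Counter

# First line is the entry quoted above; the second is a made-up 200 for contrast.
SAMPLE = [
    '2009-04-04 11:13:47 504 GET "http://127.0.0.1:3128/[URL]" TCP_HIT NONE '
    '3018 0 "127.0.0.1" "127.0.0.1:3128" "-" "-"',
    '2009-04-04 11:13:48 200 GET "http://127.0.0.1:3128/[URL]" TCP_HIT NONE '
    '3018 0 "127.0.0.1" "127.0.0.1:3128" "-" "-"',
]


def count_hit_504(lines):
    """Count log entries per URL that pair HTTP status 504 with TCP_HIT."""
    counts = Counter()
    for line in lines:
        try:
            # shlex keeps each quoted field together as one token.
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: skip the malformed line
        if len(tokens) < 6:
            continue  # too short to match the assumed layout
        status, url, result = tokens[2], tokens[4], tokens[5]
        if status == "504" and result == "TCP_HIT":
            counts[url] += 1
    return counts


if __name__ == "__main__":
    for url, n in count_hit_504(SAMPLE).most_common():
        print(n, url)
```

Against a real log, the list would be replaced by an open file handle, for example `count_hit_504(open("access.log"))`, with the path adjusted to wherever the log actually lives.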

Anyway, I just wanted to throw out these questions for the experts.
I'll likely be trying some of these changes just to see the effects.

Feel free to chime in on any of this stuff.
Thanks in advance,
Rob