Re: TCP_MISS/504 after UDP_HIT - from sibling squid

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Fri, 08 Oct 2010 15:57:09 +1300

On 08/10/10 05:42, Adrian Dascalu wrote:
Hi to all squid users!

I'm new to this list so please hold the big guns.

The problem you outline is discussed last in this reply. I've taken the 
opportunity to comment on the config improvements possible all the way down.

Here's my setup:

1.	Using Squid squid-2.6.STABLE6-5.el5_1.3 (pinned at this version since all newer ones will eventually stop responding with 100%cpu. But this could be the subject of another post on this list)
2.	2 servers in a heartbeat cluster. 192.168.2.1-2 are the IPs used for the internal communication in the cluster.
3.	The requests come to Apache server who passes them to squid on the localhost.

Squid is designed to be used the other way around.
The only reason I'm aware of for placing Apache out front is to map URLs 
to Zope's weird virtual hosting URI space. You appear to be using squirm 
to do this instead.
 Is there another reason I'm not aware of?

4.	The squids are configured to use the other squid as sibling and webserver instances from both servers as parents. ICP is used in all cases (the webservers will always reply MISS but the fastest to reply to ICP is probably the less busy and closest)

My squid config looks like this:

********************************************************************
cache_effective_user squid
cache_effective_group squid
http_port 192.168.2.2:3128 transparent
http_port 127.0.0.1:3128 transparent

Are you receiving regular ISP-type traffic from internal PCs at this Squid?
The rest of your config indicates only some administrative channel. As 
such you can drop the "transparent" security hole (and slow NAT 
lookups!) and use "accel" etc instead.

NP: "accel" automatically turns on "never_direct deny all"
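
A minimal sketch of that change, assuming this Squid only ever receives requests handed over by Apache or the sibling (untested against this exact 2.6 build):

```
# reverse-proxy (accelerator) mode instead of interception
http_port 192.168.2.2:3128 accel vhost
http_port 127.0.0.1:3128 accel vhost
```

The "vhost" option makes Squid take the domain name from the Host: header of each request.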

icp_port 3130
udp_incoming_address 192.168.2.2
cache_dir ufs /var/spool/squid 20000 16 256
cache_mgr webadmin@xxxxxxxxxxxxxxxxxxx
visible_hostname host1.subdomain.domain.xx
log_icp_queries on
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
cache_store_log none

Remove the first of those lines. It's overridden by the second.

emulate_httpd_log off

This is the default and a deprecated option. I think you can remove it 
from the config.

cache_mem 512 MB

NP: the bigger you can make this the faster Squid's hits will go (within 
reason). I see from the lines below that you are already aware of the 
squid-2.x size limit on individual objects held in memory.

maximum_object_size 100 MB              # max cached object size
maximum_object_size_in_memory 1 MB # max cached-in-memory object size
acl all src 0.0.0.0/0.0.0.0

acl all src all

acl localhost src 127.0.0.1/32
acl localnet src 192.168.2.0/24
acl ssl_ports port 443 563
acl safe_ports port 81 80 443
acl zope_servers src 127.0.0.1
acl zope_servers src XXX.XXX.XXX.181
acl zope_servers src XXX.XXX.XXX.134
acl zope_servers src XXX.XXX.XXX.155
acl zope_servers src 192.168.2.0/24
acl manager proto cache_object
acl connect method connect
acl accelerated_protocols proto http
acl accelerated_hosts dst 127.0.0.0/8
acl accelerated_hosts dst XXX.XXX.XXX.181/32
acl accelerated_hosts dst XXX.XXX.XXX.155/32

You call these two accelerated hosts but I see no cache_peer entries 
allowing Squid to pass requests to them.
You don't even use this ACL so I say remove it to make things clearer.

acl accelerated_ports myport 3128

Another unused ACL.

acl purge method PURGE
http_access allow zope_servers purge
http_access deny purge
http_reply_access allow all
acl webdav method PROPFIND TRACE PURGE PROPPATCH MKCOL COPY MOVE LOCK UNLOCK
never_direct allow all
http_access allow manager localnet
http_access allow manager localhost
http_access deny manager
http_access deny connect !ssl_ports
icp_access allow localhost
icp_access allow localnet
http_access allow all

Not great. I'm sure you have an index or registry somewhere of your 
served domains. If it's large, use an external ACL to hook into it and do 
the lookups in real time.
 This costs a small number of external lookups (most results get cached, 
so they cost nothing) and saves the larger amount of processing otherwise 
spent on invalid domains and attack requests.

Or, when Squid is isolated from the general Internet as yours is, use a 
"src" ACL to enumerate the machines/ranges allowed to pass requests in to 
this Squid.
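
As a sketch of the "src" variant, assuming the localhost and localnet ACLs already defined above cover every machine that should be allowed in, the final "http_access allow all" could become:

```
# reuse the existing src ACLs; everything else is refused
http_access allow localhost
http_access allow localnet
http_access deny all
```

The manager, purge and connect rules stay above these lines, since http_access is evaluated top-down and the first match wins.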

cache_peer 192.168.2.1 sibling 3128 3130 name=theothersquid
cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange round-robin no-digest name=11
cache_peer 192.168.2.1 parent 8990 3990 no-netdb-exchange round-robin no-digest name=12
<snip>
cache_peer 192.168.2.2 parent 9008 4008 no-netdb-exchange round-robin no-digest name=211
cache_peer 192.168.2.2 parent 9010 4010 no-netdb-exchange round-robin no-digest name=212

"round-robin" or ICP. With 2.6 you can pick only one.

3.0+ is needed for "weighted-round-robin background-ping" where the ICP 
lag times are used to select fastest respondents more often. This also 
measures the HTTP lag times and ICMP pinger tests. So ICP is not 
strictly required.
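
To illustrate the choice, here is a hedged sketch for one of the parents (the same change would apply to each parent line). The "no-query" option and the 0 ICP port are my additions for the round-robin case:

```
# Option A (2.6): plain round-robin, no ICP queries sent to the parent
cache_peer 192.168.2.1 parent 8988 0 no-query no-netdb-exchange round-robin no-digest name=11

# Option B (2.6): ICP-based selection, round-robin removed
#cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange no-digest name=11

# Option C (3.0+ only): lag-weighted round-robin
#cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange weighted-round-robin background-ping no-digest name=11
```

Only one of the three lines can be active for a given peer name.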

redirect_program /var/XXDIR/bin/squirm
redirect_children 20
redirect_rewrites_host_header off
acl static_content urlpath_regex -i \.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|js|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$
acl static_content urlpath_regex (.*)misc_/ExternalEditor/edit_icon$
acl static_content urlpath_regex (.*)p_/(.*)

Remove the (.*) prefix and suffix from the above patterns. An unanchored 
regex matches anywhere in the URL path, so they are implied unless the 
^ and $ anchors are used.
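
With the redundant groups removed, the two patterns above would look something like this (they should match the same URLs as before):

```
acl static_content urlpath_regex misc_/ExternalEditor/edit_icon$
acl static_content urlpath_regex p_/
```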

"no_cache" is an obsolete and confusing name. Remove the "no_" part from 
all these lines...

no_cache allow static_content
acl post_requests method POST
no_cache deny post_requests

POST requests are not cacheable due to how they work in HTTP. Move the 
denial to the top of your cache tests.

NP: I'm not too sure about 2.6, but you may find POST requests and 
others like it are never even checked for the "cache" access controls.
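
A sketch of the start of the list after the rename and with the POST denial moved first, assuming the static_content ACL is defined as above:

```
acl post_requests method POST
cache deny post_requests
cache allow static_content
```

The remaining "no_cache" lines further down get the same rename to "cache".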

acl QUERY urlpath_regex \?
acl CGIBIN urlpath_regex cgi-bin
no_cache allow QUERY
no_cache deny CGIBIN

The QUERY and CGIBIN bits you may want to reconsider. We now recommend 
allowing them to cache, with a refresh_pattern placed immediately before 
the "." pattern to expire the broken ones:
  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

You will need this pattern anyway since you cache the \? pages.

If you want to keep the QUERY pattern as an allow, it can be merged in as 
one of the static_content patterns. It might be good to give 
static_content a slightly different name after that.

external_acl_type is_cacheable_type children=5 %{Cookie:__ac} %{Cookie:;__ac} %{Authorization} %{If-None-Match} /var/XXDIR/bin/squidAcl.py
acl is_cacheable external is_cacheable_type
no_cache allow is_cacheable

What exactly is that helper doing if I may ask?

no_cache deny all

Hmm, you wanted performance. That's usually gained by increasing the 
amount cached, which reduces both the network distance to the client and 
the server load.

If this was done to prevent drive-by attacks poisoning the cache the 
conversion to proper reverse-proxy "accel" config will fix that.

If this was done because of the web servers' output, you may gain by 
inverting the approach to match the intended use of "cache": cache 
everything, with explicit denial of badness where known.
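
A rough sketch of the inverted form. The known_bad ACL and its pattern are hypothetical placeholders for whatever URLs turn out to misbehave when cached:

```
# hypothetical ACL: list only the URLs known to break when cached
acl known_bad urlpath_regex /path/known/to/break

cache deny post_requests
cache deny known_bad
cache allow all
```

Whether the is_cacheable helper is still needed with this layout depends on what it actually checks.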

negative_ttl 0
refresh_pattern . 0 50% 999999 ignore-reload
refresh_pattern -i /getFile$ 60 90% 3600

The "." refresh_pattern will match *everything*. Your custom patterns 
need to be placed above it to have any effect.

Also, very large numbers in the min/max fields will 32-bit wrap when 
multiplied up to a timestamp and end up doing the opposite of what you 
want. It's not good in general to cache for more than a year, so they 
should be set to 525600 (minutes, i.e. 365 days) or less.

ie:
  refresh_pattern -i /getFile$ 60 90% 3600
  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0  ignore-reload
  refresh_pattern . 0 50% 525600 ignore-reload

NP: If you can upgrade to 3.1+ you gain the "accel ignore-cc" option 
combo on http_port which overrides all the possible client-sent 
controls, not just the reload.
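
For reference, a sketch of what that would look like on a 3.1+ install (not valid on 2.6; the address is taken from the config above):

```
# Squid 3.1+ only: accelerator port that ignores client Cache-Control
http_port 192.168.2.2:3128 accel vhost ignore-cc
```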

shutdown_lifetime 1 seconds
pipeline_prefetch on

*******************************************************************

The other squid will have a very similar config, just replace 192.168.2.1 with 192.168.2.2 and vice-versa.

The main problem I'm facing is that every time the squid on the "passive" member responds with UDP_HIT the following line will be a TCP_MISS/504. Like this:

1286468808.210      0 192.168.2.1 UDP_HIT/000 168 ICP_QUERY http://127.0.0.1:3128/path/to/object - NONE/- -
1286468808.721      4 192.168.2.1 TCP_MISS/504 1915 GET http://127.0.0.1:3128/path/to/object - NONE/- text/html

Are these log lines from 192.168.2.1 or 192.168.2.2?

If they are recorded on 192.168.2.1 they show a loop as it fetches from 
itself and fails badly. The thing about loops is that they can hold up a 
lot of resources for a long time before stopping and being logged.

If they are recorded on 192.168.2.2, I expect they are just showing ICP 
false positives. ICP is known to be limited in the things it can match 
on, i.e. just the URL. Vary headers are a big problem when matching. You 
could disable the use of ICP entirely and use the round-robin.

You will need a newer Squid to get better accuracy than ICP. One which 
supports HTCP and has more HTTP/1.1 compliant caching behaviour. HTCP 
will also let you use the nifty recursive HTCP CLR instead of HTTP PURGE.
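
A hedged sketch of an HTCP sibling link on a newer Squid, assuming it was built with HTCP support and that 4827 (the usual HTCP port) is free on both machines:

```
# newer Squid only; replaces the ICP-based sibling line
htcp_port 4827
htcp_access allow localnet
htcp_access deny all
cache_peer 192.168.2.1 sibling 3128 4827 htcp name=theothersquid
```

The other Squid would need the mirror-image lines pointing back at 192.168.2.2.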

Also note how Squid is informing the web server that its domain name is 
"127.0.0.1:3128". This is due to the lack of the "accel vhost" options on 
http_port.

I've searched this list and internet in general for ideas of what I'm doing wrong and came up empty.

I'm open to any suggestion for improvement in this setup. Performance is my main goal.

Many thanks,
Adrian

HTH
Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.8
  Beta testers wanted for 3.2.0.2