RE: TCP_MISS/504 after UDP_HIT - from sibling squid

Adrian Dascalu <Adrian.Dascalu@xxxxxxxxxxxxx> · Fri, 8 Oct 2010 11:57:59 +0200

> On 08/10/10 05:42, Adrian Dascalu wrote:
> > Hi to all squid users!
> >
> > I'm new to this list so please hold the big guns.
> 
> The problem you outline is discussed last in this reply. I've taken the
> opportunity to comment on the config improvements possible all the way
> down.

Thank you. I was secretly hoping you would do that :)

> 
> >
> > Here's my setup:
> >
> > 1.	Using Squid squid-2.6.STABLE6-5.el5_1.3 (pinned at this version since all newer ones will eventually stop responding with 100% CPU. But this could be the subject of another post on this list)
> > 2.	2 servers in a heartbeat cluster. 192.168.2.1-2 are the IPs used for the internal communication in the cluster.
> > 3.	The requests come to an Apache server, which passes them to squid on the localhost.
> 
> Squid is designed to be used the other way around.
> The only reason I'm aware of for placing Apache out front is to map URLs
> to Zope's weird virtual hosting URI space. You appear to be using squirm
> to do this instead.
>   Is there another reason I'm not aware of?

You have guessed the main reason correctly (Zope URI space). There is also a large set of rewrite rules related to content reorganization(s). I introduced squirm with the intention of getting rid of the Apache in front. The people maintaining the (huge) set of rewrite rules in Apache have had problems understanding why they cannot simply copy their regular expressions from Apache to squirm (lazy vs. greedy matching). So squirm will be taken out.

> 
> 
> > 4.	The squids are configured to use the other squid as sibling and webserver instances from both servers as parents. ICP is used in all cases (the webservers will always reply MISS but the fastest to reply to ICP is probably the less busy and closest)
> >
> > My squid config looks like this:
> >
> > ********************************************************************
> > cache_effective_user squid
> > cache_effective_group squid
> > http_port 192.168.2.2:3128 transparent
> > http_port 127.0.0.1:3128 transparent
> 
> Are you receiving regular ISP-type traffic from internal PCs at this
> Squid?
> The rest of your config indicates only some administrative channel. As
> such you can drop the "transparent" security hole (and slow NAT
> lookups!) and use "accel" etc instead.
> 
> NP: "accel" automatically turns on "never_direct deny all"
> 

I will use accel. Thank you.

> > icp_port 3130
> > udp_incoming_address 192.168.2.2
> > cache_dir ufs /var/spool/squid 20000 16 256
> > cache_mgr webadmin@xxxxxxxxxxxxxxxxxxx
> > visible_hostname host1.subdomain.domain.xx
> > log_icp_queries on
> > cache_access_log /var/log/squid/access.log
> > cache_log /var/log/squid/cache.log
> > cache_store_log /var/log/squid/store.log
> > cache_store_log none
> 
> Remove the first of those lines. It's overridden by the second.
> 
> > emulate_httpd_log off
> 
> This is the default and a deprecated option. I think you can remove it
> from the config.
> 

Will do. The first cache_store_log directive being overridden by the second ... I should have seen this! Thanks

> > cache_mem 512 MB
> 
> NP: the bigger you can make this the faster Squids hits will go (within
> reason). The squid-2.x individual object in memory MB limit I see you
> are already aware of below.
> 
> > maximum_object_size 100 MB              # max cached object size
> > maximum_object_size_in_memory 1 MB # max cached-in-memory object size
> > acl all src 0.0.0.0/0.0.0.0
> 
> acl all src all
> 
> > acl localhost src 127.0.0.1/32
> > acl localnet src 192.168.2.0/24
> > acl ssl_ports port 443 563
> > acl safe_ports port 81 80 443
> > acl zope_servers src 127.0.0.1
> > acl zope_servers src XXX.XXX.XXX.181
> > acl zope_servers src XXX.XXX.XXX.134
> > acl zope_servers src XXX.XXX.XXX.155
> > acl zope_servers src 192.168.2.0/24
> > acl manager proto cache_object
> > acl connect method connect
> > acl accelerated_protocols proto http
> > acl accelerated_hosts dst 127.0.0.0/8
> > acl accelerated_hosts dst XXX.XXX.XXX.181/32
> > acl accelerated_hosts dst XXX.XXX.XXX.155/32
> 
> You call these two accelerated hosts but I see no cache_peer entries
> allowing Squid to pass requests to them.
> You don't even use this ACL so I say remove it to make things clearer.
> 
> > acl accelerated_ports myport 3128
> 
> another unused ACL.
> 
> > acl purge method PURGE
> > http_access allow zope_servers purge
> > http_access deny purge
> > http_reply_access allow all
> > acl webdav method PROPFIND TRACE PURGE PROPPATCH MKCOL COPY MOVE LOCK UNLOCK
> > never_direct allow all
> > http_access allow manager localnet
> > http_access allow manager localhost
> > http_access deny manager
> > http_access deny connect !ssl_ports
> > icp_access allow localhost
> > icp_access allow localnet
> > http_access allow all
> 
> Not great. I'm sure you have an index or registry somewhere of your
> served domains. If it's large, use an external ACL to hook in and do
> lookups in real time.
>   This will trade a small amount of external lookups (most get cached
> for zero cost) for a large(er) amount of processing invalid domains and
> attack requests.
> 
> Or, when isolated away from the general Internet like you have, use a
> "src" ACL to enumerate the machines/ranges allowed to pass requests in
> to this Squid.

Again, spot on. "Allow all" has been there temporarily (hehe!) since the last major setup change. In the current setup squid only listens on localhost and on the interface to the other cluster member, so there is no (easy) way for anything other than the Apache server on localhost or the other squid to pass requests ...
I'll restrict this to localhost and localnet (as defined above).
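
Something along these lines, replacing the final "http_access allow all" (a sketch only, untested; it reuses the localhost and localnet ACLs already defined in the config above):

```conf
# Sketch: allow only the local Apache and the cluster network, refuse the rest
http_access allow localhost
http_access allow localnet
http_access deny all
```

The purge and manager rules would stay above these lines so they are still evaluated first.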

> 
> 
> > cache_peer 192.168.2.1 sibling 3128 3130 name=theothersquid
> > cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange round-robin no-digest name=11
> > cache_peer 192.168.2.1 parent 8990 3990 no-netdb-exchange round-robin no-digest name=12
> <snip>
> > cache_peer 192.168.2.2 parent 9008 4008 no-netdb-exchange round-robin no-digest name=211
> > cache_peer 192.168.2.2 parent 9010 4010 no-netdb-exchange round-robin no-digest name=212
> 
> "round-robin" or ICP. With 2.6 you can pick only one.
> 
> 3.0+ is needed for "weighted-round-robin background-ping" where the ICP
> lag times are used to select fastest respondents more often. This also
> measures the HTTP lag times and ICMP pinger tests. So ICP is not
> strictly required.
> 

My understanding was that in case of an ICP timeout, round-robin would be used to select a parent.
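
If I read your comment correctly, on 2.6 I would have to choose one of the two forms below for each parent. This is a sketch only, using the first parent from my config; "no-query" together with an ICP port of 0 is my assumption of how to switch ICP off for a peer.

```conf
# Option A (sketch): keep ICP, drop round-robin
cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange no-digest name=11

# Option B (sketch): drop ICP for this peer, keep round-robin
cache_peer 192.168.2.1 parent 8988 0 no-query no-netdb-exchange round-robin no-digest name=11
```

Only one of the two lines would be used for a given parent, since both carry the same name.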

> 
> > redirect_program /var/XXDIR/bin/squirm
> > redirect_children 20
> > redirect_rewrites_host_header off
> > acl static_content urlpath_regex -i \.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|js|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$
> > acl static_content urlpath_regex (.*)misc_/ExternalEditor/edit_icon$
> > acl static_content urlpath_regex (.*)p_/(.*)
> 
> Remove the (.*) prefix and trailer from the above patterns. Regex
> assumes they are there unless the ^ and $ anchors are used.
> 

noted
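
So the two path ACLs would shrink to something like this (a sketch, untested; since an unanchored regex already matches anywhere in the URL path, "p_/" should match exactly what "(.*)p_/(.*)" did):

```conf
# Sketch: same patterns as before, minus the redundant (.*) wrappers
acl static_content urlpath_regex misc_/ExternalEditor/edit_icon$
acl static_content urlpath_regex p_/
```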

> 
> "no_cache" is an obsolete and confusing name. Remove the "no_" part
> from all these lines...
> 
> > no_cache allow static_content
> > acl post_requests method POST
> > no_cache deny post_requests
> 

Changed to "cache allow static_content". 

> POST requests are not cachable due to how they work in HTTP. Move
> denial to the top of your cache tests.
> 
> NP: I'm not too sure about 2.6, but you may find POST requests and
> others like it are never even checked for the "cache" access controls.
> 
> > acl QUERY urlpath_regex \?
> > acl CGIBIN urlpath_regex cgi-bin
> > no_cache allow QUERY
> > no_cache deny CGIBIN
> 
> 
> The QUERY and CGIBIN bits you may want to re-consider. We now recommend
> allowing them to cache, with a refresh_pattern used to expire the
> broken ones placed immediately before the "." pattern:
>    refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> 
> You will need this pattern anyway since you cache the \? pages.
> 
> The QUERY pattern if you want to keep it as allow can be merged as one
> of the static_content patterns. Might be good to call static_content
> slightly different after that.
> 

I'll consider your suggestions and clean-up in here too.
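
As a first pass, the renamed rules with the POST denial moved to the top might look like this. It is only a sketch of the reordering: the ACL names are the ones from my config, and the QUERY/CGIBIN lines are left as they were until I decide what to do with them.

```conf
# Sketch: "no_cache" renamed to "cache", POST denial first
cache deny post_requests
cache allow static_content
cache allow QUERY
cache deny CGIBIN
```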

> > external_acl_type is_cacheable_type children=5 %{Cookie:__ac} %{Cookie:;__ac} %{Authorization} %{If-None-Match} /var/XXDIR/bin/squidAcl.py
> > acl is_cacheable external is_cacheable_type
> > no_cache allow is_cacheable
> 
> What exactly is that helper doing if I may ask?
> 

It disables caching for logged-in users (based on cookies).

> > no_cache deny all
> 
> Hmm, you wanted performance. That's usually gained by increasing the
> amount cached and thus reducing network distance to client and server
> load.
> 
> If this was done to prevent drive-by attacks poisoning the cache the
> conversion to proper reverse-proxy "accel" config will fix that.
> 
> If this was done due to the web servers' output you may gain by
> inverting the approach here to what is the intended use of "cache":
> caching everything but allowing explicit denial of badness where known.

After switching to "accel" config I will consider inverting the approach. 

> 
> 
> > negative_ttl 0
> > refresh_pattern . 0 50% 999999 ignore-reload
> > refresh_pattern -i /getFile$ 60 90% 3600
> 
> The "." refresh_pattern will match *everything*. Your custom patterns
> need to be placed above it to have any effect.
> 
> Also very large numbers in the min/max will 32-bit wrap when multiplied
> up to a timestamp and end up doing the opposite of what you want. It's
> not good in general to cache for more than a year so they should be set
> to 525600 or less.
> 
> ie:
>    refresh_pattern -i /getFile$ 60 90% 3600
>    refresh_pattern -i (/cgi-bin/|\?) 0 0% 0  ignore-reload
>    refresh_pattern . 0 50% 525600 ignore-reload
> 
> NP: If you can upgrade to 3.1+ you gain the "accel ignore-cc" option
> combo on http_port which overrides all the possible client-sent
> controls, not just the reload.
> 

Upgrading is something I really want to do. Unfortunately, every time I try to upgrade (even to a minor version of 2.6) squid takes 100% CPU and stops responding after a while (could be 5 min, could be 5 h). I intend to try again after taking squirm out of the equation.

> 
> > shutdown_lifetime 1 seconds
> > pipeline_prefetch on
> >
> > *******************************************************************
> >
> > The other squid will have a very similar config, just replace 192.168.2.1 with 192.168.2.2 and vice-versa.
> >
> > The main problem I'm facing is that every time the squid on the "passive" member responds with UDP_HIT the following line will be a TCP_MISS/504. Like this:
> >
> > 1286468808.210      0 192.168.2.1 UDP_HIT/000 168 ICP_QUERY http://127.0.0.1:3128/path/to/object - NONE/- -
> > 1286468808.721      4 192.168.2.1 TCP_MISS/504 1915 GET http://127.0.0.1:3128/path/to/object - NONE/- text/html
> 
> Are these logs lines from 192.168.2.1 or 192.168.2.2?

They are from 192.168.2.2

> 
> If they are recorded on 192.168.2.1 they show a loop as it fetches from
> itself and fails badly. The thing about loops is that they can hold up
> a lot of resources for a long time before stopping and being logged.
> 
> If they are recorded on 192.168.2.2, I expect they are just showing ICP
> false positives. ICP is known to be limited in the things it can match
> on, i.e. just the URL. Vary headers are a big problem when matching. You
> could disable the use of ICP entirely and use the round-robin.

ICP has worked best for this setup. The parent webservers are slow and single-threaded. It was important to avoid passing requests to a busy server, and ICP did this better than anything else we've tried.

> 
> You will need a newer Squid to get better accuracy than ICP. One which
> supports HTCP and has more HTTP/1.1 compliant caching behaviour. HTCP
> will also let you use the nifty recursive HTCP CLR instead of HTTP
> PURGE.
> 

HTCP CLR would be niiiice, we could use that :)
However, Zope has no HTCP implementation that I am aware of (ICP is implemented just for speed of response, but it is there).

> Also note how Squid is informing the web server that its domain name
> is "127.0.0.1:3128". This is due to lack of the "accel vhost" options
> on http_port.
> 

Actually, we intended to have the same host in the URL so that the URLs are the same in the two squids. I don't really know how to use accel vhost, but I will read some more on the subject.
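
As a starting point, I assume the two listeners would end up looking something like this (a sketch, untested on my 2.6 build; as I understand it, "vhost" should make Squid build the URL from the Host header instead of from its own address and port):

```conf
# Sketch: "transparent" replaced by reverse-proxy options on both listeners
http_port 192.168.2.2:3128 accel vhost
http_port 127.0.0.1:3128 accel vhost
```

If the Host header differs between the two paths into the squids, the cached URLs would differ too, so I will have to check that before rolling this out.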

> >
> > I've searched this list and internet in general for ideas of what I'm doing wrong and came up empty.
> >
> > I'm open to any suggestion for improvement in this setup. Performance is my main goal.
> >
> > Many thanks,
> > Adrian
> >
> 
> HTH
> Amos
> --
> Please be using
>    Current Stable Squid 2.7.STABLE9 or 3.1.8
>    Beta testers wanted for 3.2.0.2

Thank you Amos for your time and valuable expertise. 

Best regards,
Adrian