
Re: detecting dead parent problem


 



On 7/05/2013 3:16 a.m., Rietzler, Markus (RZF, SG 324 / <RIETZLER_SOFTWARE>) wrote:
we have a setup with one squid (user-proxy) that connects to 4 parent proxies.

cache_peer proxy-inter1 parent 8083 0 sourcehash no-query no-digest no-netdb-exchange connection-auth=off
cache_peer proxy-inter2 parent 8083 0 sourcehash no-query no-digest no-netdb-exchange connection-auth=off
cache_peer proxy-inter3 parent 8083 0 sourcehash no-query no-digest no-netdb-exchange connection-auth=off
cache_peer proxy-inter4 parent 8083 0 sourcehash no-query no-digest no-netdb-exchange connection-auth=off

recently two of those 4 parents were gone. in cache log we saw messages like:

2013/05/06 16:27:33 TCP connection to proxy-inter4/8083 failed

and then after 10s or so (which should be the dead_peer_timeout)

2013/05/06 16:27:34 Detected DEAD Parent: proxy-inter4

that seems to be normal.

BUT
1) those messages reappear in cache.log again and again. normally we would expect them not to recur until the parent has been detected as live again. there are many "TCP connection failed" entries and sometimes "DEAD Parent" entries
2) browsing the web was extremely SLOW

we use squid 3.2.4 as user-proxy and the 4 parent proxies.

configure options:  '--enable-auth-basic=MSNT,SMB' '--enable-external-acl-helpers=ldap_group' '--enable-auth-basic' '--enable-auth-ntlm' '--enable-auth-negotiate=kerberos' '--enable-delay-pools' '--enable-follow-x-forwarded-for' '--enable-removal-policies=lru,heap' '--with-filedescriptors=4096' '--with-winbind' '--with-async-io' '--enable-storeio=ufs,aufs,diskd,rock' '--disable-ident-lookups' '--prefix=/www/squid' '--enable-underscores' '--with-large-files' 'PKG_CONFIG_PATH=/opt/gnome/lib64/pkgconfig:/opt/gnome/share/pkgconfig' --enable-ltdl-convenience

top on the two living parent proxies was ok.


we also have two development systems. one running squid 2.7.3 and one 3.2.9. the one with 3.2.9 showed some problems. many log entries in cache log and SLOW browsing. on the old squid browsing was no problem at all. all requests were fast enough. the old squid showed no messages in cache log after "DEAD parent". on both development systems only few (2-3) users were active.


any idea where to look?

Start with removing "no-query" from the cache_peer lines. One of the main purposes of proxy queries is to determine UP/DEAD status. You can also tune the connect-fail-limit= option on cache_peer to reduce the number of failed requests before the peer is declared DEAD.
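
As a rough sketch of what that could look like for one of the four parents (the other three lines would change the same way): the ICP port 3130, the limit of 3, the ACL name "userproxy" and the address 192.0.2.10 are illustrative assumptions, not values from the original setup. The original lines use ICP port 0, which the cache_peer documentation describes as the setting for peers that do not support ICP or HTCP, so dropping no-query presumably also needs a non-zero port and ICP enabled on the parents.

```
# On the user-proxy (sketch; 3130 and connect-fail-limit=3 are assumed values)
cache_peer proxy-inter1 parent 8083 3130 sourcehash no-digest no-netdb-exchange connection-auth=off connect-fail-limit=3

# On each parent (sketch; ACL name and address are hypothetical)
icp_port 3130
acl userproxy src 192.0.2.10
icp_access allow userproxy
icp_access deny all
```

A lower connect-fail-limit means fewer client requests are spent on a peer before it is marked DEAD. The trade-off is that a brief network blip is more likely to mark a healthy parent as down.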

FYI: the forwarding path algorithm in 3.2 has been altered a fair bit, in a way which might account for the behaviour change. Namely, DNS is only looked up once per available path, and retries are done sequentially down the resulting set of IPs. 3.1 and older would do DNS lookups on every retry, so you would easily get the 10 failed connects in a few ms while retrying a single request which never gets through. In 3.2 you will get 10 *different* requests trying the peer over a slightly longer time (better chance of detecting recovery from a short outage), each then getting serviced by a later path (hopefully more successful, and definitely with less lag on errors than before).

You are using sourcehash, which is an algorithm that only produces *1* cache_peer as available for servicing a request. The behaviour change above will result in only that peer's IPs being tested on a request before other paths like DIRECT/DNS are tried. The hash will *not* be re-calculated for the request which failed to reach the peer.

Amos



