On Thu, Oct 27, 2011 at 04:37:20PM +1030, Brett Lymn wrote:
>
> OK, but, the 2.7 stable 6 machines that work well share the same parents
> as the 3.1.15 machines - they even talk to the same DNS servers.
>

I had a dig at this over the weekend and can confirm that the problem is a DNS issue: a combination of broken DNS and the way squid does lookups. It looks like the new directive in 3.1.16 would help in this case.

What seems to be happening is that squid never tries to look up the A record. The remote server just times out on the AAAA lookup, and this takes so long that the timeout clobbers the DNS request in the queue. I see this in tcpdump:

    192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:25.132968 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
    192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:30.154854 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
    192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:31.177449 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
    192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:36.197481 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
    192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:37.217890 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
    192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)

And this in the cache.log with debug_options 78,3:

2011/10/30 19:22:57.089| idnsRead: starting with FD 11
2011/10/30 19:22:57.089| idnsRead: FD 11: received 40 bytes from 192.231.203.3:53
2011/10/30 19:22:57.089| idnsGrokReply: ID 0xcbe0, -2 answers
2011/10/30 19:22:57.089| idnsGrokReply: error Server Failure: The name server was unable to process this query. (2)
2011/10/30 19:22:57.089| idnsGrokReply: Query result: SERV_FAIL
2011/10/30 19:23:58.160| idnsCheckQueue: ID 0x54bftimeout
2011/10/30 19:24:58.996| idnsCheckQueue: ID 0x54bftimeout
2011/10/30 19:24:58.996| idnsCheckQueue: ID 54bf: giving up after 4 tries and 121.91 seconds

In the code I can see that the A record is supposed to be tried after a SERV_FAIL has happened a few times, but in this case the retries take so long that the DNS request gets killed out of the queue before that part of the code is executed.

What I eventually did at home was rebuild squid with --disable-ipv6 (it would be nice if this were a config directive rather than a compile-time option). Once I had done this, the CommBank site was reasonably usable, since the AAAA lookups were no longer being tried at all.

-- 
Brett Lymn

"Warning: The information contained in this email and any attached files is confidential to BAE Systems Australia. If you are not the intended recipient, any use, disclosure or copying of this email or any attachments is expressly prohibited. If you have received this email in error, please notify us immediately. VIRUS: Every care has been taken to ensure this email and its attachments are virus free, however, any loss or damage incurred in using this email is not the sender's responsibility. It is your responsibility to ensure virus checks are completed before installing any data sent in this email to your computer."
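The race described in the message (the A fallback only runs after a few SERV_FAILs, but the queue timeout can kill the request first) can be illustrated with a toy model. This is a hypothetical Python sketch, not squid's actual idns code: the function simulate_aaaa_lookup, its parameters and the numbers used below are assumptions chosen for illustration, not squid's defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    tried_a: bool    # True if the model fell back to an A query
    attempts: int    # AAAA attempts made before the outcome was decided
    elapsed: float   # seconds on the simulated clock at that point


def simulate_aaaa_lookup(reply_delay: Optional[float],
                         retry_interval: float,
                         queue_timeout: float,
                         servfails_needed: int) -> Outcome:
    """Hypothetical toy model of an AAAA lookup with SERV_FAIL fallback.

    Each attempt either gets a SERV_FAIL after reply_delay seconds (if that
    is within retry_interval) or is retried after retry_interval seconds.
    reply_delay=None means the server never answers. The A lookup is only
    tried after servfails_needed SERV_FAILs; if the queue timeout expires
    first, the whole request is dropped.
    """
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    elapsed = 0.0
    attempts = 0
    servfails = 0
    while True:
        attempts += 1
        replied = reply_delay is not None and reply_delay <= retry_interval
        wait = reply_delay if replied else retry_interval
        if elapsed + wait > queue_timeout:
            # The queue timeout fires before this attempt resolves:
            # the request is killed and the A fallback never runs.
            return Outcome(tried_a=False, attempts=attempts, elapsed=queue_timeout)
        elapsed += wait
        if replied:
            servfails += 1
            if servfails >= servfails_needed:
                # Enough SERV_FAILs seen in time: fall back to an A query.
                return Outcome(tried_a=True, attempts=attempts, elapsed=elapsed)


if __name__ == "__main__":
    # Fast SERV_FAILs: the fallback threshold is reached well inside the timeout.
    print(simulate_aaaa_lookup(0.1, 5.0, 120.0, 3))
    # No reply at all: every attempt waits out the retry interval until the timeout.
    print(simulate_aaaa_lookup(None, 5.0, 120.0, 3))
    # Slow SERV_FAILs: failures arrive, but not enough of them before the timeout.
    print(simulate_aaaa_lookup(4.0, 5.0, 10.0, 3))
```

The second and third cases correspond to the failure shapes discussed above: a server that never answers the AAAA query, and SERV_FAILs spaced out enough that the queue timeout wins before the fallback threshold is reached. Building without IPv6, as in the message, sidesteps the model entirely because no AAAA query is sent.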