Re: Upgrade to 3.4.3 and TCP Connections to parent failing more often

Paul Carew <beavatronix@xxxxxxxxx> · Wed, 19 Feb 2014 12:37:05 +0000

I've been looking into this a bit further and have found the following
debug information:

2014/02/19 10:58:13.021 kid1| AsyncCall.cc(85) ScheduleCall: ConnOpener.cc(132) will call fwdConnectDoneWrapper(local=0.0.0.0 remote=192.168.1.10:8080 flags=1, errno=110, flag=-4, data=0x90db7c88) [call34778883]
2014/02/19 10:58:13.021 kid1| AsyncCallQueue.cc(51) fireNext: entering fwdConnectDoneWrapper(local=0.0.0.0 remote=192.168.1.10:8080 flags=1, errno=110, flag=-4, data=0x90db7c88)
2014/02/19 10:58:13.021 kid1| AsyncCall.cc(30) make: make call fwdConnectDoneWrapper [call34778883]
2014/02/19 10:58:13.021 kid1| FwdState.cc(402) fail: ERR_CONNECT_FAIL "Service Unavailable"
2014/02/19 10:58:13.021 kid1| TCP connection to wwwproxy01.domain.local/8080 failed
2014/02/19 10:58:13.021 kid1| FwdState.cc(609) retryOrBail: re-forwarding (0 tries, 30 secs)
2014/02/19 10:58:13.021 kid1| FwdState.cc(373) startConnectionOrFail: http://webmail.tiscali.co.uk/cp/ps/Mail/commands/SyncFolder?d=tiscali.co.uk&u=firstname.surname01&t=29000
2014/02/19 10:58:13.021 kid1| FwdState.cc(1080) connectStart: fwdConnectStart: http://webmail.tiscali.co.uk/cp/ps/Mail/commands/SyncFolder?d=tiscali.co.uk&u=firstname.surname01&t=29000
2014/02/19 10:58:13.021 kid1| FwdState.cc(1203) connectStart: fwdConnectStart: got outgoing addr 0.0.0.0, tos 0
2014/02/19 10:58:13.021 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall fwdConnectDoneWrapper constructed, this=0x8e0de0b0 [call34803025]
2014/02/19 10:58:13.021 kid1| AsyncCallQueue.cc(53) fireNext: leaving fwdConnectDoneWrapper(local=0.0.0.0 remote=192.168.1.10:8080 flags=1, errno=110, flag=-4, data=0x90db7c88)
2014/02/19 10:58:13.022 kid1| AsyncCall.cc(85) ScheduleCall: ConnOpener.cc(132) will call fwdConnectDoneWrapper(local=192.168.0.10:56359 remote=192.168.1.10:8080 FD 97 flags=1, data=0x90db7c88) [call34803025]
2014/02/19 10:58:13.022 kid1| AsyncCallQueue.cc(51) fireNext: entering fwdConnectDoneWrapper(local=192.168.0.10:56359 remote=192.168.1.10:8080 FD 97 flags=1, data=0x90db7c88)
2014/02/19 10:58:13.022 kid1| AsyncCall.cc(30) make: make call fwdConnectDoneWrapper [call34803025]
2014/02/19 10:58:13.022 kid1| FwdState.cc(1027) connectDone: local=192.168.0.10:56359 remote=192.168.1.10:8080 FD 97 flags=1: 'http://webmail.tiscali.co.uk/cp/ps/Mail/commands/SyncFolder?d=tiscali.co.uk&u=firstname.surname01&t=29411'
2014/02/19 10:58:13.022 kid1| FwdState.cc(1216) dispatch: local=192.168.0.10:8080 remote=10.133.49.121:4775 FD 89 flags=1: Fetching 'POST http://webmail.tiscali.co.uk/cp/ps/Mail/commands/SyncFolder?d=tiscali.co.uk&u=firstname.surname01&t=29000'
2014/02/19 10:58:13.022 kid1| AsyncCallQueue.cc(53) fireNext: leaving fwdConnectDoneWrapper(local=192.168.0.10:56359 remote=192.168.1.10:8080 FD 97 flags=1, data=0x90db7c88)
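To put a number on "every few minutes", and to see whether one parent is hit more often than the other, the "TCP connection to ... failed" lines in cache.log can be counted per hour and per peer. This is a minimal Python sketch, assuming each cache.log entry occupies a single line (the email wraps them); the function name and the log path in the usage comment are placeholders, so point it at whatever file cache_log is set to.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# 2014/02/17 13:43:02 kid1| TCP connection to wwwproxy02.domain.local/8080 failed
# Timestamps may or may not carry milliseconds (e.g. 10:58:13.021).
FAIL_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}):\d{2}:\d{2}(?:\.\d+)? .*?"
    r"TCP connection to (\S+) failed"
)


def count_failures(lines):
    """Count peer connection failures, keyed by (date and hour, peer)."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_failures.py /var/log/squid/cache.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (hour, peer), n in sorted(count_failures(log).items()):
            print(f"{hour}:00  {peer}  {n}")
```

Running it over logs from both the 3.3.11 and 3.4.3 periods would show whether the increase is spread evenly across both parents or concentrated on one.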

Judging by errno=110 (ETIMEDOUT, "Connection timed out" on Linux), it
looks like some connection attempts to the parent are timing out. That
fits with the fact that I can't find a corresponding error in the logs
on the parent: a connection that never completes would never be logged
by Squid there.
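As a quick cross-check of that reading, Python's standard errno and os modules map the number to its symbolic name and message. A minimal sketch; errno numbers are platform-specific, and the mapping of 110 to ETIMEDOUT assumes Linux (other systems use different numbers). The helper name is illustrative only.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno as printed in cache.log."""
    # errno.errorcode maps numbers to symbolic names for the *current* platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # errno=110 from the fwdConnectDoneWrapper entries; on a glibc-based
    # Linux system this prints "ETIMEDOUT: Connection timed out".
    print(describe_errno(110))
```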

However, the network interfaces look healthy on both sides, with no
errors, drops, overruns or collisions on any of them:

Initiating server:

bond0     Link encap:Ethernet  HWaddr 00:26:55:7D:90:14
          inet addr:192.168.0.10  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:48264326 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50472917 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:31676495378 (29.5 GiB)  TX bytes:37324524265 (34.7 GiB)

eth0      Link encap:Ethernet  HWaddr 00:26:55:7D:90:14
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:48143308 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50472917 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31668436400 (29.4 GiB)  TX bytes:37324524265 (34.7 GiB)

eth1      Link encap:Ethernet  HWaddr 00:26:55:7D:90:14
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:121018 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8058978 (7.6 MiB)  TX bytes:0 (0.0 b)

Parent server:

bond0     Link encap:Ethernet  HWaddr AC:16:2D:76:4C:24
          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:100957917 errors:0 dropped:0 overruns:0 frame:0
          TX packets:110178407 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:74405340552 (69.2 GiB)  TX bytes:74282360105 (69.1 GiB)

eth0      Link encap:Ethernet  HWaddr AC:16:2D:76:4C:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:100289600 errors:0 dropped:0 overruns:0 frame:0
          TX packets:110178407 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:74354605104 (69.2 GiB)  TX bytes:74282360105 (69.1 GiB)
          Interrupt:32

eth3      Link encap:Ethernet  HWaddr AC:16:2D:76:4C:24
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:668317 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:50735448 (48.3 MiB)  TX bytes:0 (0.0 b)
          Interrupt:36

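The only other thing I can think of is that the issue is being caused
by a layer 3 device between the servers, since they sit on different
subnets (192.168.0.0/24 and 192.168.1.0/24) and traffic between them is
routed. Could that be it?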
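One way to test the layer 3 theory independently of Squid is to open many plain TCP connections from the child to the parent and count how many time out. This is a minimal Python sketch; the function name, attempt count, timeout and interval are arbitrary choices for illustration, and the address is the parent's from the debug log. Each attempt only completes the TCP handshake and then closes, so no HTTP request is sent.

```python
import socket
import time


def probe(host: str, port: int, attempts: int = 100, timeout: float = 5.0,
          interval: float = 0.1) -> dict:
    """Repeatedly open and close TCP connections and tally the outcomes."""
    results = {"ok": 0, "timeout": 0, "error": 0}
    for _ in range(attempts):
        try:
            # Completes the three-way handshake, then closes the socket.
            with socket.create_connection((host, port), timeout=timeout):
                results["ok"] += 1
        except socket.timeout:
            # Handshake did not complete within `timeout` seconds.
            results["timeout"] += 1
        except OSError:
            # Refused, unreachable, reset, and similar failures.
            results["error"] += 1
        time.sleep(interval)
    return results


if __name__ == "__main__":
    # Run on the child (192.168.0.10) against the parent's Squid port.
    print(probe("192.168.1.10", 8080, attempts=1000))
```

If some of these bare connections also time out while Squid is not involved, that would point at the network path between the subnets rather than at the 3.4.3 upgrade.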

Thanks

Paul

On 17 February 2014 14:56, Paul Carew <beavatronix@xxxxxxxxx> wrote:
> Hi
>
> I have recently upgraded our Squid servers from 3.3.11 to 3.4.3 and am
> seeing the following error every few minutes in the cache log.
>
> 2014/02/17 13:43:02 kid1| TCP connection to wwwproxy02.domain.local/8080 failed
>
> I have 2 servers configured on the LAN which handle connections over a
> private WAN, and 2 other servers on another WAN that are connected to
> the internet. The first pair use the second pair as parents, with the
> following lines in squid.conf:
>
> cache_peer wwwproxy01.domain.local parent 8080 0 no-query no-digest carp
> cache_peer wwwproxy02.domain.local parent 8080 0 no-query no-digest carp
>
> With 3.3.11 I occasionally got the error, maybe two or three times daily.
>
> Does anyone have any ideas why this might be occurring so much more
> often on 3.4.3 than on 3.3.11? I've had a look at debug_options but
> can't see a section that screams "debug me" for this particular error.
> Maybe section 11 or 15?
>
> Many Thanks
>
> Paul