Hi.
I'm using squid 3.1.x on FreeBSD. Squid is built from ports.
Recently I was hit by a weird issue: my users cannot open HTTPS pages.
This is not something constant - if they hit the F5 button in browser,
the pages are loading, sometimes showing the message like 'Unable to
connect. Firefox can't establish a connection to the server at
access.ripe.net.' (for example. most of them are using FF). In the same
time plain HTTP pages are working fine.
I did some investigation and it appears like squid really thinks it
cannot connect to HTTPS-enabled web server:
===Cut===
2012/03/13 14:08:39.661| ACL::ChecklistMatches: result for 'all' is 1
2012/03/13 14:08:39.661| ACLList::matches: result is true
2012/03/13 14:08:39.661| aclmatchAclList: 0x285e4810 returning true (AND
list satisfied)
2012/03/13 14:08:39.661| ACLChecklist::markFinished: 0x285e4810
checklist processing finished
2012/03/13 14:08:39.661| ACLChecklist::check: 0x285e4810 match found,
calling back with 1
2012/03/13 14:08:39.661| ACLChecklist::checkCallback: 0x285e4810 answer=1
2012/03/13 14:08:39.661| peerCheckAlwaysDirectDone: 1
2012/03/13 14:08:39.661| peerSelectFoo: 'CONNECT access.ripe.net'
2012/03/13 14:08:39.661| peerSelectFoo: direct = DIRECT_YES
2012/03/13 14:08:39.661| The AsyncCall SomeCommConnectHandler
constructed, this=0x286e6740 [call1916]
2012/03/13 14:08:39.661| commConnectStart: FD 14, cb 0x286e6740*1,
access.ripe.net:443
2012/03/13 14:08:39.661| The AsyncCall SomeCloseHandler constructed,
this=0x2956c2c0 [call1917]
2012/03/13 14:08:39.661| ipcache_nbgethostbyname: Name 'access.ripe.net'.
2012/03/13 14:08:39.661| ipcache_nbgethostbyname: HIT for 'access.ripe.net'
2012/03/13 14:08:39.662| ipcacheMarkBadAddr: access.ripe.net
[2001:67c:2e8:22::c100:685]:443
2012/03/13 14:08:39.662| ipcacheCycleAddr: access.ripe.net now at
193.0.6.133 (2 of 2)
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14->16 : family=28
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14->16 : family=28
2012/03/13 14:08:39.662| FilledChecklist.cc(168) ~ACLFilledChecklist:
ACLFilledChecklist destroyed 0x285e4810
2012/03/13 14:08:39.662| ACLChecklist::~ACLChecklist: destroyed 0x285e4810
2012/03/13 14:08:39.662| FilledChecklist.cc(168) ~ACLFilledChecklist:
ACLFilledChecklist destroyed 0x285e4910
2012/03/13 14:08:39.662| ACLChecklist::~ACLChecklist: destroyed 0x285e4910
2012/03/13 14:08:39.662| The AsyncCall SomeCommReadHandler constructed,
this=0x28ce9100 [call1918]
2012/03/13 14:08:39.662| leaving SomeCommReadHandler(FD 150,
data=0x286b6710, size=4, buf=0x28d1e000)
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: Name 'access.ripe.net'.
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: HIT for 'access.ripe.net'
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14->16 : family=28
2012/03/13 14:08:39.662| commResetFD: Reset socket FD 14->16 : family=28
2012/03/13 14:08:39.662| ipcacheCycleAddr: access.ripe.net now at
193.0.6.133 (2 of 2)
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: Name 'access.ripe.net'.
2012/03/13 14:08:39.662| ipcache_nbgethostbyname: HIT for 'access.ripe.net'
2012/03/13 14:08:39.662| ipcacheMarkAllGood: Changing ALL
access.ripe.net addrs to OK (1/2 bad)
2012/03/13 14:08:39.662| commConnectCallback: FD 14
2012/03/13 14:08:39.662| comm.cc(1195) commSetTimeout: FD 14 timeout -1
2012/03/13 14:08:39.662| comm.cc(1206) commSetTimeout: FD 14 timeout -1
2012/03/13 14:08:39.662| comm.cc(934) will call
SomeCommConnectHandler(FD 14, errno=22, flag=-8, data=0x28f6bdd0, )
[call1916]
2012/03/13 14:08:39.662| commConnectFree: FD 14
2012/03/13 14:08:39.662| entering SomeCommConnectHandler(FD 14,
errno=22, flag=-8, data=0x28f6bdd0, )
2012/03/13 14:08:39.662| AsyncCall.cc(32) make: make call
SomeCommConnectHandler [call1916]
2012/03/13 14:08:39.662| errorSend: FD 12, err=0x28f995d0
2012/03/13 14:08:39.662| errorpage.cc(1051) BuildContent: No existing
error page language negotiated for ERR_CONNECT_FAIL. Using default error
file.
===Cut==
But why ? I did some telnetting from this server to the
access.ripe.net:443, and it succeeded like 10 from 10 times (squid error
rate is far more frequent). The only thing that bothers me is that
telnet also first tries ipv6 too, but then switches to the ipv4, and
connects.
Now some suggestions (probably a shot in the dark). This only happens on
an ipv6-enabled machines, but without actual ipv6 connectivity (no ipv6
default route or no public ipv6 address, for example I have unique-local
addresses for the testing purposes). In the same time this issue can be
easily solved by restoring the ipv6 connectivity to the outer world. So,
can it be some dual-stack behavior bug ? Or is it 'by design' ? Do I
need to report it ?
Thanks.
Eugene.