On 23/08/2013 8:18 p.m., Bill Houle wrote:
For the next in my continuing Exchange saga, let's talk 502 errors.
I've got a couple different instances.
1) ActiveSync sends periodic 'Ping' requests to implement its "server
push" feature. If I understand the process correctly, the client sends
an empty (Content-Length: 0) keep-alive HTTP request and tries to see
how long the server+network honor the session.
potential problem #1: what type of keep-alive request? The old HTTP/1.0
"Keep-Alive:" header is deprecated, not supported by Squid, and does not
actually work in most places anyway. Simply opening a TCP connection and
waiting after the first Ping request until it closes is a terrible way
to test it.
It uses a back-off algorithm to eventually settle on a timing value
that it knows the network can support: if the keep-alive expires
cleanly, they up the ante and repeat; if the HTTP session aborts, they
drop it down to the previous success and lock in the refresh rate.
From that point forward, they've got a sync window and continue to
issue Pings at that duration. That way, if the Ping aborts, it is a
signal that a 'Sync' is needed because "server push" has new data.
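The probe loop described above is roughly the following (a minimal
Python sketch only: send_ping(), the starting value, the step and the
cap are made-up placeholders, not anything taken from the real
ActiveSync client):

    def find_heartbeat(send_ping, start=60, step=60, cap=1800):
        """Settle on a heartbeat interval (seconds) the path sustains."""
        good = None
        interval = start
        while True:
            if send_ping(interval) == "expired":
                # keep-alive expired cleanly: this interval works,
                # so up the ante and repeat
                good = interval
                if interval >= cap:
                    return good
                interval += step
            else:
                # HTTP session aborted: drop back to the previous
                # success (if any) and lock in that refresh rate
                return good if good is not None else start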
potential problem #2: are they using HTTP/1.1 1xx status codes from the
server as this sync ping or HTTP/1.0 simple request/reply pairs?
Squid older than 3.2 does not support the 1xx status response. So is there
any HTTP/1.0 software along the network path? (including Squid up to
version 3.1).
What I'm actually seeing is that the system is never able to settle on
a consistent keep-alive sync window as MS might like. The Ping, or
string of Pings, might last for minutes or only for seconds. When the
Ping ultimately fails, the system does a Sync even though there may be
nothing new. The end result is that it is less like "server push" and
more like polling at a variable rate.
This is where we come back to the whole design of this being a terrible
way to operate.
They are trying to measure the unbalanced cycles of TCP socket timeout
on every box along the pathway, NAT record timeout on every NAT relay
along the pathway, idle connection timeout on every proxy along the
pathway. Simultaneously.
The users don't really notice or care since they still get their
updates promptly. It's hardly catastrophic for me, but I could
envision that the variable-polling behavior might be slightly more
taxing as the number of users scale upward. But I'm curious if there's
any Squid debug I can add that might reveal why the session durations
seem to vary so much? At debug level 11,2, the only thing I see is:
2013/08/19 00:46:51 kid1| WARNING: HTTP: Invalid Response: No object
data received
for https://mail.domain.com/Microsoft-Server-ActiveSync?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping
To which Squid replies back to the client with a 502 Bad Gateway.
X-Squid-Error is ERR_ZERO_SIZE_OBJECT.
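For reference, the 11,2 above comes from Squid's debug_options
directive; a minimal squid.conf line that produces that level of
logging, assuming the usual ALL,1 baseline, would look something like:

    # keep default verbosity, but raise section 11 (HTTP) to level 2
    debug_options ALL,1 11,2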
It will be more taxing as the number of users increases. These
connections are long-term, blocked from use by the client end, and
reserve two TCP sockets and one disk FD on the proxy for every connection.
No, there is no easy way to debug why the connection lengths vary so
much. You need Wireshark or similar with a packet trace to identify
where the close is coming from. That Squid message indicates that
something between Squid and the server is cutting the connection.
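For example, something along these lines run on the proxy box will
capture both legs so you can see which end sends the FIN/RST first
(the interface name, the Exchange.IP placeholder and the 3128
listening port all need adjusting to your setup):

    # server side: traffic between Squid and Exchange
    tcpdump -i eth0 -s 0 -w exchange-side.pcap host Exchange.IP and port 443
    # client side: traffic arriving on Squid's listening port
    tcpdump -i eth0 -s 0 -w client-side.pcap port 3128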
2) Next problem is OWA (WebMail). OWA is designed to mimic Outlook, so
if Outlook can support 10Meg attachments, so can OWA. A user tries to
send a large attachment. Unlike the ActiveSync problem I previously
posted about, UploadReadAhead does not seem to enter into the equation
- possibly because the POST is redirected to an /EWS/ proxy. It
happily chunks well past the ActiveSync threshold, but at some point
the connection may still fail:
2013/08/21 07:41:07.616 kid1| http.cc(1172) readReply:
local=proxy.IP:42891 remote=Exchange.IP:443 FD 39 flags=1: read
failure: (32) Broken pipe.
To which Squid replies back to the client with a 502 Bad Gateway.
X-Squid-Error is ERR_READ_ERROR 104.
I know Squid doesn't touch the data, and thus doesn't care about
transaction size. But is there anything more I can do to minimize all
possible drops & connection timeouts, particularly with large POSTs?
I'm not saying the drops are Squid's fault, I just want to idiot-proof
the setup on this end as much as possible.
This sounds like a bug in Exchange itself. The HTTP protocol offers
chunked encoding to get around this type of error, and Squid will use
it whenever necessary and possible. But that relies on the other end
working correctly. There is nothing that can be done about a POST if
the server is broken.
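That said, if you want to rule out Squid's own timers cutting the
transaction, these are the usual squid.conf suspects (example values
shown are roughly the stock defaults; none of them help if Exchange
aborts its own end):

    connect_timeout 1 minute       # establishing the server connection
    read_timeout 15 minutes        # server sends nothing for this long
    request_timeout 5 minutes      # client slow delivering its request
    client_lifetime 1 day          # absolute cap on a client connection
    pconn_timeout 1 minute         # idle persistent server connections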
3) Final example is RPC-over-HTTPS. I routinely see 502s on
"connection reset by peer" (RSTs seem to be par for the course on
Windows systems). But I've also seen ERR_READ_ERROR 104 on a "No
error" error.
2013/08/19 21:09:37.239 kid1| http.cc(1172) readReply:
local=proxy.IP:58798 remote=Exchange.IP:443 FD 44 flags=1: read
failure: (0) No error..
What could this possibly indicate?
Strange, but not unheard of. Something in the asynchronous event
handling overwrote the global error detail before Squid could pick it up.
Amos