Re: Exchange 2010 and 502 Bad Gateway

On 23/08/2013 8:18 p.m., Bill Houle wrote:
For the next in my continuing Exchange saga, let's talk 502 errors. I've got a couple different instances.

1) ActiveSync sends periodic 'Ping' requests to implement its "server push" feature. If I understand the process correctly, the client sends an empty (Content-Length: 0) keep-alive HTTP request and tries to see how long the server+network honor the session.

Potential problem #1: what type of keep-alive request? The old HTTP/1.0 "Keep-Alive:" header is deprecated, not supported by Squid, and does not actually work in most places anyway. Simply opening a TCP connection and waiting after the first Ping request until it closes is a terrible way to test this.


It uses a back-off algorithm to eventually settle on a timing value that it knows the network can support: if the keep-alive expires cleanly, they up the ante and repeat; if the HTTP session aborts, they drop it down to the previous success and lock in the refresh rate. From that point forward, they've got a sync window and continue to issue Pings at that duration. That way, if the Ping aborts, it is a signal that a 'Sync' is needed because "server push" has new data.
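In rough pseudocode, that settling behaviour amounts to something like the sketch below (a sketch only: send_ping() is a hypothetical helper that issues one Cmd=Ping request and reports whether the connection survived the requested interval, and the interval values are illustrative rather than Microsoft's actual numbers):

    def settle_heartbeat(send_ping, start=60, step=60, ceiling=1800):
        # Find the longest Ping interval (seconds) the path will hold.
        interval = start
        last_good = None
        while interval <= ceiling:
            if send_ping(interval):
                last_good = interval      # clean expiry: up the ante and repeat
                interval += step
            else:
                break                     # session aborted: fall back to the
                                          # previous success and lock it in
        return last_good                  # the locked-in sync window (None if
                                          # even the first interval failed)

If the aborts come at unpredictable points, last_good never stabilises between runs, which is exactly the variable polling effect described further on.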

Potential problem #2: are they using HTTP/1.1 1xx status codes from the server as this sync ping, or HTTP/1.0 simple request/reply pairs? Squid older than 3.2 does not support the 1xx status response. So is there any HTTP/1.0 software along the network path (including Squid up to version 3.1)?

What I'm actually seeing is that the system is never able to settle on a consistent keep-alive sync window the way MS might like. The Ping, or string of Pings, might last for minutes or only seconds. When the Ping ultimately fails, the system does a Sync even though there may be nothing new. The end result is less like "server push" and more like polling at a variable rate.

This is where we come back to the whole design being a terrible way to operate. They are trying to measure the mismatched TCP socket timeouts on every box along the pathway, the NAT record timeouts on every NAT relay along the pathway, and the idle connection timeouts on every proxy along the pathway. Simultaneously.


The users don't really notice or care since they still get their updates promptly. It's hardly catastrophic for me, but I could envision the variable-polling behavior becoming slightly more taxing as the number of users scales upward. But I'm curious whether there's any Squid debug I can add that might reveal why the session durations vary so much. At debug level 11,2, the only thing I see is:

2013/08/19 00:46:51 kid1| WARNING: HTTP: Invalid Response: No object data received for https://mail.domain.com/Microsoft-Server-ActiveSync?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping AKA mail.domain.com/Microsoft-Server-ActiveSync?User=user&DeviceId=ApplF4KKR4GLF199&DeviceType=iPad&Cmd=Ping

To which Squid replies back to the client with 502 Bad Gateway. X-Squid-Error is ERR_ZERO_SIZE_OBJECT.

It will be more taxing as the number of users increases. These connections are long-term, blocked from any other use by the client end, and reserve 2 TCP sockets and 1 disk FD on the proxy for every connection.

No, there is no easy way to debug why the connection lengths vary. You need Wireshark or a similar tool with a packet trace to identify where the close is coming from. That Squid message indicates that something between Squid and the server is cutting the connection.
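If a full capture is not immediately practical, a crude idle-timer probe can at least hint at whose timeout fires first. The sketch below only measures how long a bare, idle TCP connection stays open (no TLS, no HTTP), and the host/port are placeholders, so treat the result as a rough hint rather than proof:

    import socket
    import time

    def idle_lifetime(host, port, max_wait=3600):
        # Open a TCP connection, send nothing, and time how long it takes
        # for the far end (or anything in between) to close it.
        s = socket.create_connection((host, port), timeout=max_wait)
        started = time.time()
        try:
            s.recv(1)                     # blocks; returns b'' on an orderly close
        except socket.timeout:
            return None                   # still open after max_wait seconds
        except ConnectionResetError:
            pass                          # an RST counts as a close too
        finally:
            s.close()
        return time.time() - started

    if __name__ == "__main__":
        print(idle_lifetime("mail.domain.com", 443))   # placeholder host/port

Run it from the Squid box against the Exchange server; if the idle connection dies much sooner than the Ping sessions do, something on that leg (firewall, NAT, or the server itself) is the likely culprit, and that is where to point the packet trace.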


2) The next problem is OWA (WebMail). OWA is designed to mimic Outlook, so if Outlook can support 10 MB attachments, so can OWA. A user tries to send a large attachment. Unlike the ActiveSync problem I previously posted about, UploadReadAhead does not seem to enter into the equation, possibly because the POST is redirected to an /EWS/ proxy. It happily chunks well past the ActiveSync threshold, but at some point the connection may still fail:

2013/08/21 07:41:07.616 kid1| http.cc(1172) readReply: local=proxy.IP:42891 remote=Exchange.IP:443 FD 39 flags=1: read failure: (32) Broken pipe.

To which Squid replies back to the client with 502 Bad Gateway. X-Squid-Error is ERR_READ_ERROR 104.

I know Squid doesn't touch the data, and thus doesn't care about transaction size. But is there anything more I can do to minimize all possible drops & connection timeouts, particularly with large POSTs? I'm not saying the drops are Squid's fault; I just want to idiot-proof the setup on this end as much as possible.

This sounds like a bug in Exchange itself. The HTTP protocol offers chunked encoding to get around this type of error, and Squid will send it whenever necessary and possible. But that relies on the other end working correctly; there is nothing that can be done about a POST if the server is broken.
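For reference, chunked encoding (RFC 2616 section 3.6.1) simply prefixes each piece of the body with its size in hex and terminates with a zero-length chunk, so neither end needs to know the total size up front. A minimal illustration of the wire format follows (not Squid's code; the chunk size is arbitrary):

    def chunked(body, chunk_size=8192):
        # Encode a byte string as an HTTP/1.1 chunked message body.
        out = b""
        for i in range(0, len(body), chunk_size):
            piece = body[i:i + chunk_size]
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
        return out + b"0\r\n\r\n"         # zero-size chunk terminates the body

    print(chunked(b"large attachment data", chunk_size=8).decode())

If the receiving end mishandles that framing partway through an upload, the sending side just sees the connection drop.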

3) Final example is RPC-over-HTTPS. I routinely see 502s on "connection reset by peer" (RSTs seem to be par for the course on Windows systems). But I've also seen ERR_READ_ERROR 104 on a "No error" error.

2013/08/19 21:09:37.239 kid1| http.cc(1172) readReply: local=proxy.IP:58798 remote=Exchange.IP:443 FD 44 flags=1: read failure: (0) No error..

What could this possibly indicate?

Strange, but not unheard of. Something in the asynchronous event handling overwrote the global error detail before Squid could pick it up.

Amos




