Joe Freshman wrote:
I have been a long-time squid user (going on 10 years now), and have been experiencing an issue that I believe is squid's fault. It may cause me to drop squid entirely, because we have lost some customers due to this behavior.

I run squid 2.6.STABLE6-5.el5_1 as a reverse proxy to two different web servers on a decently-spec'd server (OS is RHEL 5) that only runs squid (and iptables) and has a constant load of 1. The vast majority of the time, everything works fine. Sometimes, however, the following happens:

* A user tries to connect to one of the web sites via a browser and downloads only some of the page elements, or none of them. This is reproducible from that user's computer within a certain time window. Usually the first sign of the problem is that some of the page elements download, and on retries none of them download.
* If a user experiencing this issue drops to a shell (or the Windows command line), opens a telnet session on port 80 to the same server the browser was trying to reach, and enters a properly-formed HTTP request for the same page, squid drops the connection without sending any response.
That is not normal behavior for Squid.
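The telnet test described above can be scripted so that it runs identically from an affected client and from a network where the site works. This is a minimal sketch, assuming Python 3 and plain HTTP on port 80; `raw_http_get` is a hypothetical helper name, not part of Squid or any tool mentioned in this thread. It sends one HTTP/1.0 GET with a Host header (a reverse proxy usually needs it to pick the right backend) and returns whatever bytes arrive before the server closes the connection, so an empty result corresponds to the blank drop described above.

```python
import socket


def raw_http_get(host, path="/", port=80, timeout=10.0, vhost=None):
    """Send one HTTP/1.0 GET over a bare TCP socket, as one would by hand in telnet.

    Returns every byte received before the server closed the connection.
    An empty result means the server closed the connection without replying.
    """
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {vhost or host}\r\n"  # reverse proxies usually route on Host
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # orderly close by the peer
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Site and page taken from the logs in this thread; adjust as needed.
    reply = raw_http_get(
        "wiki.myserver.com",
        "/index.php?title=Home_Page/Work_Center/Page_Title&action=edit",
    )
    if reply:
        print(reply.split(b"\r\n", 1)[0].decode("latin-1"))  # HTTP status line
    else:
        print("connection closed with no response bytes")
```

A connection reset or a timeout surfaces as an exception rather than an empty result, which is itself a useful distinction from a clean close when comparing the failing client with a working one.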
* I suspect it is squid that is failing to return the response, for the following reasons:

(1) A squid log entry where the page was returned correctly looks like this:

1219668745.767 165 70.43.203.242 TCP_MISS/200 8056 GET http://wiki.myserver.com/index.php?title=Home_Page/Work_Center/Page_Title&action=edit - FIRST_UP_PARENT/74.213.131.84 text/html

(2) A squid log entry where the page never made it to the browser looks like this:

1219665920.576 194 66.169.93.6 TCP_MISS/200 8056 GET http://wiki.myserver.com/index.php?title=Home_Page/Work_Center/Page_Title&action=edit - FIRST_UP_PARENT/74.213.131.84 text/html

...so the two entries look nearly identical.

(3) Apache log entry for #1:

74.213.131.82 - - [25/Aug/2008:07:48:42 -0500] "GET /index.php?title=Home_Page/Work_Center/Page_Title&action=edit HTTP/1.0" 200 7558

(4) Apache log entry for #2:

74.213.131.82 - - [25/Aug/2008:07:02:49 -0500] "GET /index.php?title=Home_Page/Work_Center/Page_Title&action=edit HTTP/1.0" 200 7558

(5) If I use a different ISP (VNC to my home, log in on the server running squid, or log in on another server we administer halfway across the country), the same page loads fine. And yes, I can have a VNC session up alongside the other web browser, hit "reload" on the page on both machines, and the client exhibiting the issue fails every time while the other machine works every time.
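To compare log entries (1) and (2) field by field, a small parser for Squid's default ("native") access.log format helps. This is a sketch assuming that default 10-field layout with no spaces inside any field; `SquidLogEntry`, `parse_native_line` and `differing_fields` are hypothetical names. Two points worth keeping in mind: Squid's size field counts bytes delivered to the client including reply headers, while Apache's common log `%b` counts only the body, so the 8056 vs 7558 gap (498 bytes) is plausibly header overhead rather than a truncated body; and the size Squid logs reflects what it wrote to the connection, which cannot show whether the client actually received those bytes.

```python
from dataclasses import asdict, dataclass


@dataclass
class SquidLogEntry:
    timestamp: float    # Unix time in seconds, millisecond resolution
    elapsed_ms: int     # how long Squid spent on the transaction
    client: str         # client IP address
    result: str         # Squid result code, e.g. TCP_MISS
    status: int         # HTTP status code sent to the client
    size: int           # bytes delivered to the client, headers included
    method: str
    url: str
    ident: str          # RFC 931 ident lookup, "-" when unused
    hierarchy: str      # how/where it was fetched, e.g. FIRST_UP_PARENT/<peer>
    content_type: str


def parse_native_line(line):
    """Parse one line of Squid's default ("native") access.log format."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    result, status = f[3].split("/", 1)
    return SquidLogEntry(float(f[0]), int(f[1]), f[2], result, int(status),
                         int(f[4]), f[5], f[6], f[7], f[8], f[9])


def differing_fields(a, b):
    """Names of the fields whose values differ between two entries."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


if __name__ == "__main__":
    url = ("http://wiki.myserver.com/index.php?"
           "title=Home_Page/Work_Center/Page_Title&action=edit")
    good = parse_native_line(f"1219668745.767 165 70.43.203.242 TCP_MISS/200 8056 "
                             f"GET {url} - FIRST_UP_PARENT/74.213.131.84 text/html")
    bad = parse_native_line(f"1219665920.576 194 66.169.93.6 TCP_MISS/200 8056 "
                            f"GET {url} - FIRST_UP_PARENT/74.213.131.84 text/html")
    print(differing_fields(good, bad))
    print("Squid size minus Apache body size:", good.size - 7558)
```

For the two entries quoted above, `differing_fields` returns `['timestamp', 'elapsed_ms', 'client']`: from Squid's point of view the successful and failed transactions are the same.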
Point (5) indicates it's not Squid's fault. If Squid were at fault, it would fail either at random or for every client, not consistently for one client while working for another.
Have you checked out the usual culprits: TCP extensions, ECN, and window-scaling issues in the client -> Squid link?
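On a Linux Squid host, the TCP extensions mentioned here (ECN, window scaling, and related options such as timestamps and SACK) are controlled by sysctls under /proc/sys/net/ipv4. A minimal sketch for recording their current values, assuming a Linux kernel that exposes these files; `read_tcp_knobs` is a hypothetical helper. A common way to test the theory is to disable one extension at a time temporarily (for example `sysctl -w net.ipv4.tcp_ecn=0`) and retry from the affected client; whether any of these is the actual cause here is not established in this thread.

```python
from pathlib import Path

# Standard Linux TCP sysctls often implicated in "works from one network,
# fails from another" problems; each file holds a small integer.
TCP_KNOBS = ("tcp_ecn", "tcp_window_scaling", "tcp_timestamps", "tcp_sack")


def read_tcp_knobs(base="/proc/sys/net/ipv4"):
    """Return {sysctl name: integer value} for each knob present under base."""
    values = {}
    for name in TCP_KNOBS:
        path = Path(base) / name
        if path.is_file():
            values[f"net.ipv4.{name}"] = int(path.read_text().split()[0])
    return values


if __name__ == "__main__":
    for key, value in read_tcp_knobs().items():
        print(f"{key} = {value}")
```

Recording the values before changing anything makes it easy to restore the original settings once the experiment is over.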
(6) Here are traceroutes from two locations; the first is from a client that was working properly, the second from a client that was exhibiting the issue:

Tracing route to wiki.myserver.com [74.213.131.85]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  10.0.0.1
  2     5 ms     6 ms    13 ms  10.116.240.1
  3     8 ms     8 ms     7 ms  172.22.81.13
  4     6 ms     6 ms    14 ms  172.22.33.161
  5     9 ms    19 ms     9 ms  172.22.32.110
  6    14 ms    12 ms    14 ms  atl-edge-18.inet.qwest.net [216.206.221.149]
  7    13 ms    11 ms    21 ms  atl-core-01.inet.qwest.net [205.171.21.161]
  8    27 ms    39 ms    28 ms  cer-core-01.inet.qwest.net [67.14.8.202]
  9    27 ms    28 ms    26 ms  chp-brdr-02.inet.qwest.net [205.171.139.114]
 10    27 ms    28 ms    27 ms  ber1-ge-7-4.chicagoequinix.savvis.net [208.173.180.25]
 11    27 ms    26 ms    31 ms  ber1-vlan-241.chicago.savvis.net [204.70.196.21]
 12     *       29 ms    26 ms  cr1-tengig-0-0-5-0.chicago.savvis.net [204.70.195.113]
 13    40 ms    39 ms    39 ms  204.70.200.86
 14    43 ms    39 ms    42 ms  acr2-so-4-0-0.washington.savvis.net [204.70.196.182]
 15    37 ms    40 ms    39 ms  iar2-loopback.Washington.savvis.net [206.24.226.13]
 16    51 ms    47 ms    48 ms  208.174.125.110
 17    52 ms    47 ms    49 ms  cr2-g2-1.clt.hostedsolutions.com [216.27.69.226]
 18    48 ms    49 ms    48 ms  dr3-g6-1.clt.hostedsolutions.com [216.27.69.250]
 19    50 ms    48 ms    50 ms  shared-fw0.clt.hostedsolutions.com [216.27.72.227]
 20    47 ms    47 ms    50 ms  74.213.131.85

Trace complete.
Tracing route to wiki.myserver.com [74.213.131.85]
over a maximum of 30 hops:

  1     5 ms     3 ms     3 ms  192.168.2.1
  2    10 ms    13 ms    15 ms  10.117.64.1
  3    14 ms    13 ms    11 ms  172.22.81.9
  4    18 ms    12 ms    20 ms  172.22.33.161
  5    16 ms    38 ms    17 ms  172.22.33.34
  6    18 ms    18 ms    19 ms  atl-edge-18.inet.qwest.net [216.206.221.149]
  7    24 ms    23 ms    22 ms  atl-core-02.inet.qwest.net [205.171.21.165]
  8    42 ms    37 ms    37 ms  cer-core-02.inet.qwest.net [67.14.8.206]
  9    40 ms    37 ms    40 ms  chp-brdr-02.inet.qwest.net [205.171.139.118]
 10    42 ms    37 ms    41 ms  ber1-ge-7-4.chicagoequinix.savvis.net [208.173.180.25]
 11    38 ms    38 ms    37 ms  ber1-vlan-241.chicago.savvis.net [204.70.196.21]
 12    40 ms    37 ms    36 ms  cr1-tengig-0-0-5-0.chicago.savvis.net [204.70.195.113]
 13    50 ms    49 ms    50 ms  204.70.200.90
 14    50 ms    49 ms    50 ms  acr1-so-5-0-0.washington.savvis.net [204.70.196.170]
 15    51 ms    49 ms    49 ms  iar2-loopback.Washington.savvis.net [206.24.226.13]
 16    58 ms    61 ms    57 ms  208.174.125.110
 17    56 ms    59 ms    59 ms  cr1-g2-1.clt.hostedsolutions.com [216.27.69.222]
 18    58 ms    57 ms    57 ms  dr3-g5-1.clt.hostedsolutions.com [216.27.69.242]
 19    62 ms    59 ms    57 ms  shared-fw0.clt.hostedsolutions.com [216.27.72.227]
 20    62 ms    70 ms    59 ms  74.213.131.85

Trace complete.

Both traceroutes take essentially the same path (one client is on Charter Cable residential, the other on Charter Cable business).

Can anyone help me with this issue? Why would squid "diss" users coming from one IP repeatedly?
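Rather than eyeballing the two traceroutes, the hops can be compared mechanically. This sketch assumes Windows `tracert` output like the traces in this message and keys each hop by its address (the bracketed IP when a hostname is shown); `parse_tracert` and `diverging_hops` are hypothetical names. Applied to the two traces posted here, the paths agree at hops 4, 6, 10 to 12, 15, 16, 19 and 20, but differ at hops 1 to 3, 5, 7 to 9, 13, 14, 17 and 18, including several parallel core routers (atl-core-01 versus atl-core-02, acr2 versus acr1 in Washington, cr2 versus cr1 at hostedsolutions). A divergence does not by itself implicate a device, and traceroute only shows the forward path of its probe packets; the return path and the path taken by port-80 TCP traffic may differ.

```python
import re
import sys
from pathlib import Path

# A hop line of Windows "tracert" output: hop number, three RTT columns
# ("<1 ms", "27 ms" or "*"), then the responding host.
HOP_RE = re.compile(r"^\s*(\d+)\s+(?:(?:<?\d+ ms|\*)\s+){3}(.+?)\s*$")
IP_IN_BRACKETS = re.compile(r"\[([\d.]+)\]")


def parse_tracert(text):
    """Map hop number -> address (the bracketed IP if a hostname is shown)."""
    hops = {}
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            target = m.group(2)
            ip = IP_IN_BRACKETS.search(target)
            hops[int(m.group(1))] = ip.group(1) if ip else target
    return hops


def diverging_hops(a, b):
    """Hop numbers at which the two traces report different addresses."""
    return [n for n in sorted(set(a) | set(b)) if a.get(n) != b.get(n)]


if __name__ == "__main__":
    # Usage (hypothetical file names): python tracediff.py good.txt bad.txt
    good = parse_tracert(Path(sys.argv[1]).read_text())
    bad = parse_tracert(Path(sys.argv[2]).read_text())
    for n in diverging_hops(good, bad):
        print(f"hop {n:2}: {good.get(n)}  vs  {bad.get(n)}")
```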
Are you using TCP_RESET as an error page (via deny_info) anywhere in squid.conf?
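Squid's `deny_info` directive accepts TCP_RESET in place of an error page name, in which case Squid resets the client connection instead of sending an error page; from the client side that would look much like the blank drop described in this thread. A quick check is to search the active configuration for such lines. This sketch assumes the RHEL default path /etc/squid/squid.conf and treats everything after `#` as a comment; `find_tcp_reset` is a hypothetical helper.

```python
import sys
from pathlib import Path


def find_tcp_reset(conf_text):
    """Return (line number, line) for each active 'deny_info TCP_RESET ...' directive."""
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "deny_info" and tokens[1] == "TCP_RESET":
            hits.append((lineno, line))
    return hits


if __name__ == "__main__":
    # Assumed RHEL package location; pass another path as the first argument.
    conf = Path(sys.argv[1] if len(sys.argv) > 1 else "/etc/squid/squid.conf")
    for lineno, line in find_tcp_reset(conf.read_text()):
        print(f"{conf}:{lineno}: {line}")
```

Any hit is worth cross-checking against the ACL it names, to see whether that ACL could match the affected client's address or requests.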
And if it isn't squid, why does squid log the requests but not send any response?
It logs what it has successfully written to the connection. Some middleware intercepting traffic on the link may be fooling squid into thinking the response was sent.
If you think it's apache, how would apache distinguish between the two users, since it sees every request coming from a single IP?
By the HTTP headers. But it's probably not apache; if it were, the client would be getting error pages from Squid about a bad gateway.
Please also suggest any other tests I could run to figure this issue out. I really want to keep running squid, but I'm pretty close to pulling the plug here. (By the way, this issue first appeared when we moved to these new servers; we have had it from day one on this set of servers. We also installed the RHEL5 version of squid on them, which was a significant upgrade from the version we had been running.)
Same servers and same software before and after the move? Then my first suspect would be something different in the backbone network.
Same servers, new software? Check the TCP connections and see if anything whiffs. Someone else is likely to have a better idea about this than me, though.

Amos
--
Please use Squid 2.7.STABLE4 or 3.0.STABLE8