On Mon, Oct 27, 2008 at 2:56 PM, Henrik Nordstrom <henrik@xxxxxxxxxxxxxxxxxxx> wrote: > On mån, 2008-10-27 at 12:23 -0700, Neil Harkins wrote: >> The timeout is because the Content-Length header is bigger than the >> payload it sent. >> Every http client/server will hang in that situation. This isn't >> simply a misreported >> HIT<->MISS in the log, this is absolutely a significant bug where >> collapsed forwarding is >> mixing up the metadata from the two branches of our Vary: >> Accept-Encoding (gzip and not), >> i.e. giving the headers and content as non-gzip, but the amount of >> payload it reads from >> the cache and sends is based on the gzip size. Disabling >> collapsed_forwarding fixed it. > > Please file a bug report on this. Preferably including "squid -k debug" > cache.log output and "tcpdump -s 0 -w traffic.pcap" traces. I'd like to help and see this get fixed, but as I said earlier, it happens on about 16% of our test requests, only when there's 750~1050 reqs/second going through the box, and pretty much disappears under 500 reqs/s (off-peak). I managed to catch only 1 instance of the timeout in a 17 Gig cache.log with debugging enabled. Is this except significant?: 2008/10/24 17:09:23| storeLocateVaryRead: accept-encoding="gzip" 0x3b28978 seen_offset=204 buf_offset=0 size=85 2008/10/24 17:09:23| storeLocateVaryRead: Key: 968D51EAA0C2BCF5688EAB92E8F56EE4 2008/10/24 17:09:23| storeLocateVaryRead: 0x3b28978 seen_offset=289 buf_offset=0 2008/10/24 17:09:23| storeClientCopy: 4F3F9F8F4461796C3C469E610B6E2D5C, seen 289, want 289, size 4096, cb 0x46169d, cbdata 0x3b28978 [snip] 2008/10/24 17:09:23| storeLocateVaryRead: accept-encoding="" 0x3b28978 seen_offset=204 buf_offset=0 size=178 2008/10/24 17:09:23| storeLocateVaryRead: Key: 968D51EAA0C2BCF5688EAB92E8F56EE4 2008/10/24 17:09:23| storeLocateVaryRead: MATCH! 968D51EAA0C2BCF5688EAB92E8F56EE4 (null) 2008/10/24 17:09:23| storeClientCopy: 4F3F9F8F4461796C3C469E610B6E2D5C, seen 382, want 382, size 4096, cb 0x46169d, cbdata 0x3b28978 Note that I've since changed our load balancer to rewrite "Accept-Encoding: " to "Accept-Encoding: identity" in case squid didn't like a null header, (although the example in the RFC implies that "Accept-Encoding: " is valid: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3) but the timeouts still happened, although I didn't grab debugging then. -neil