Re: Re: icp_query_timeout directive is not working in 3.3.8 for some reason

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Sun, 04 Aug 2013 14:25:16 +1200

On 3/08/2013 10:45 p.m., x-man wrote:
Hi Amos,

I think this request time is the time needed to serve the entire request.

It is.

How is icp_query_timeout related to that, it should be only about the query
through ICP protocol?

When determining which sibling peer can be used for the HTTP fetch, ICP 
is one of the lookup methods used to check whether the peer has the content.
Squid has to wait for the ICP reply (or the timeout) before making the 
decision on that peer. Two such peer lookups with your 9000 ms timeout 
would account for up to 18000 ms of that request service time.

It is unclear from both the log and the code whether the TIMEOUT_ part 
is coming from the peer you got connected to or some earlier attempted 
peer. I suspect that it is coming from some earlier attempt to identify 
a peer.
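
To get a rough picture of how often the TIMEOUT_ prefix shows up, and on which hierarchy codes, you could tally them from access.log. This is only a sketch: it assumes the default native "squid" logformat, where the ninth whitespace-separated field is hierarchy-code/peer, and the function name and sample lines are made up for illustration.

```python
from collections import Counter


def count_timeout_codes(lines):
    """Tally hierarchy codes that start with TIMEOUT_.

    Assumes the native "squid" logformat, where field 9 is
    "<hierarchy code>/<peer host>". A custom logformat needs a
    different field index.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip short or malformed lines
        code = fields[8].split("/", 1)[0]
        if code.startswith("TIMEOUT_"):
            counts[code] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "1375582000.123 9120 192.0.2.10 TCP_MISS/200 5120 GET "
    "http://example.com/a - TIMEOUT_FIRSTUP_PARENT/198.51.100.5 video/x-flv",
    "1375582001.456 230 192.0.2.11 TCP_MISS/200 2048 GET "
    "http://example.com/b - FIRSTUP_PARENT/198.51.100.5 text/html",
]
print(count_timeout_codes(sample))
```

Running the same function over the 3.1 and 3.3 logs for a comparable traffic sample would show whether the timeouts are genuinely new or only newly logged.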

Otherwise we are using our own cache peer which is dealing with the youtube
content, which supports ICP protocol, it's connected to squid as cache peer
and the squid (based on ACL) is sending youtube requests to the cache peer.

I'm comparing squid 3.1.9 and 3.3.8 and what I notice is that without
changing any other element of the system

with icp_query_timeout 9000 set for both test cases,

with squid 3.1 I don't get any TIMEOUT_FIRSTUP_PARENT in the access.log, and
with squid 3.3.8 I'm getting lots of them and this is reducing our
performance.

Please suggest what can be the difference and what I can check further.

The big difference in this area between those two versions is that we 
reorganised the sequence of operations that request forwarding performs 
so that it includes the DNS lookups for the possible outgoing routes. The 
result is that each peer is now guaranteed to be tried only once, and 
each of its IPs only once as well, with tcp_outgoing_address working 
properly regardless of the IP addressing method.

It is possible that both proxies' ICP queries are timing out, but that 
3.1 locates a "usable" peer fast enough to have already moved past that 
step to the DNS lookup, which used to be done separately. With the DNS 
queries now inside the peer selection stage, 3.3 could be delayed long 
enough for the ICP results to get marked on the transaction. That would 
mean success/failure is unchanged between the two versions and the log is 
just slightly more accurate.

It is also possible that the improved HTTP/1.1 support is placing a 
larger request load on the parent proxy or the network (via traffic 
moving faster). Not much can be done about that.

To solve this I think there are two easy ways you can go forward:
1) Figure out why the peer needs the timeout at all and fix that 
problem (it could be CPU bound at the peer, network traffic congestion, 
or excessive buffering in the network).

2) If this is a setup simply for YouTube caching by the parent peer 
proxy, then I suggest you take a look at moving to the 3.4 series and 
taking advantage of the Store-ID feature there, which is an improved 
version of the Store-URL feature that 2.7 provided.
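
For option 2, a Store-ID helper is a small program that reads one request per line from Squid and answers with either "OK store-id=..." or "ERR" (leave the URL alone). A minimal sketch follows. The URL pattern, the cdnN.example.com host names and the function names are all hypothetical; the real layout of your video URLs has to be worked out from your own traffic.

```python
import io
import re

# Hypothetical pattern: collapse numbered mirror hosts that serve the same
# object into one cache key. Not a real YouTube URL layout.
MIRROR = re.compile(r"^http://cdn\d+\.example\.com/(?P<path>.+)$")


def store_id_reply(line):
    """Build the helper reply for one request line from Squid."""
    tokens = line.split()
    if not tokens:
        return "ERR"
    channel = ""
    # With helper concurrency enabled, Squid prefixes a numeric channel ID
    # that has to be echoed back at the start of the reply.
    if tokens[0].isdigit() and len(tokens) > 1:
        channel = tokens[0] + " "
        tokens = tokens[1:]
    match = MIRROR.match(tokens[0])
    if match:
        key = "http://cdn.example.com.squid.internal/" + match.group("path")
        return f"{channel}OK store-id={key}"
    return f"{channel}ERR"


def serve(stream_in, stream_out):
    """Answer every request line; flush each reply because Squid waits for it."""
    for line in stream_in:
        stream_out.write(store_id_reply(line) + "\n")
        stream_out.flush()


# Demo with in-memory streams instead of a live Squid.
demo_in = io.StringIO(
    "http://cdn1.example.com/v/abc.flv\nhttp://www.example.org/\n"
)
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

As a real helper the script would call serve(sys.stdin, sys.stdout) and be hooked into squid.conf with the store_id_program directive, with store_id_access limiting which requests get sent to it.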

Amos