Hi Andre, hi all:

Let's have another try ;-)

Suppose we have three Squid boxes in a cluster (let's call them A, B and C, respectively), all configured to talk to each other through ICP. Here is the problem we ran into:

A client sent an HTTP request to A. A did not have the corresponding object in its local cache, so it queried B and C through ICP. Sibling B replied with a UDP_MISS, which is normal behavior. What confused us was what machine C did:

  1186606560.930 0 IP_OF_MACHINE_A UDP_HIT/000 115 ICP_QUERY http://www.example.com/dynamic.js - NONE/- -
  1186606560.932 0 IP_OF_MACHINE_A TCP_MISS/504 1949 GET http://www.example.com/dynamic.js - NONE/- text/html

The UDP_HIT was followed by a TCP_MISS/504, which means that machine C had the object in its local cache, but A failed to fetch it due to some weird timeout error. I'm not sure where this 504 came from, and I don't think it's a configuration problem: it was logged just 2 ms after the corresponding UDP_HIT, and I have never set any timeout-related value anywhere near that low.

Then machine C released the object (the 504 error message instead of the expected content?) from memory:

  1186606560.932 RELEASE -1 FFFFFFFF 381F892DF3928A903A3DF921D2FF27A9 504 1186606560 0 1186606560 text/html 1650/1896 GET http://www.example.com/dynamic.js

Below are the corresponding logs from machine A.

access.log:

  1186606561.024 93 IP_OF_CLIENT_MACHINE TCP_MISS/200 10939 GET http://www.example.com/dynamic.js - DIRECT/IP_OF_BACKEND_SERVER application/x-javascript

store.log:

  1186606561.024 RELEASE -1 FFFFFFFF 9771AFBBB9036CA86486A7DE01F33538 200 1186606560 -1 1186649760 application/x-javascript -1/10675 GET http://www.example.com/dynamic.js

This means machine A fetched the object from the backend server, served it to the requesting client, and then released it from memory *immediately*.

We are running Squid-2.5.STABLE14[1] on Linux 2.6.18-4-amd64. A, B and C are all connected to the same switch, so a network problem is unlikely.
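To make the two RELEASE lines easier to compare, here is a minimal Python sketch that splits them into fields and computes each entry's freshness lifetime (Expires minus Date). The field order is our assumption about the Squid 2.5 store.log format (time, action, dir number, file number, key, status, Date, Last-Modified, Expires, content type, expected/real length, method, URL), and `StoreEntry`, `parse_store_line` and `freshness_lifetime` are hypothetical helpers, not part of Squid.

```python
from dataclasses import dataclass
from typing import Optional

# The two RELEASE lines from store.log quoted above (machine C, then machine A).
C_RELEASE = (
    "1186606560.932 RELEASE -1 FFFFFFFF 381F892DF3928A903A3DF921D2FF27A9 504 "
    "1186606560 0 1186606560 text/html 1650/1896 GET http://www.example.com/dynamic.js"
)
A_RELEASE = (
    "1186606561.024 RELEASE -1 FFFFFFFF 9771AFBBB9036CA86486A7DE01F33538 200 "
    "1186606560 -1 1186649760 application/x-javascript -1/10675 GET "
    "http://www.example.com/dynamic.js"
)


@dataclass
class StoreEntry:
    timestamp: float
    action: str
    status: int
    date: int
    lastmod: int
    expires: int
    content_type: str
    expected_len: int
    real_len: int
    method: str
    url: str


def parse_store_line(line: str) -> StoreEntry:
    """Split one store.log line using the (assumed) Squid 2.5 field order."""
    f = line.split()
    if len(f) != 13:
        raise ValueError(f"expected 13 fields, got {len(f)}")
    expected, real = f[10].split("/")  # "expected/real" body lengths
    return StoreEntry(
        timestamp=float(f[0]),
        action=f[1],
        status=int(f[5]),
        date=int(f[6]),
        lastmod=int(f[7]),
        expires=int(f[8]),
        content_type=f[9],
        expected_len=int(expected),
        real_len=int(real),
        method=f[11],
        url=f[12],
    )


def freshness_lifetime(entry: StoreEntry) -> Optional[int]:
    """Seconds between Date and Expires, or None if either is unknown (-1)."""
    if entry.date < 0 or entry.expires < 0:
        return None
    return entry.expires - entry.date


for label, raw in (("C", C_RELEASE), ("A", A_RELEASE)):
    e = parse_store_line(raw)
    print(label, e.status, freshness_lifetime(e), e.real_len)
# C 504 0 1896
# A 200 43200 10675
```

If the assumed field order is right, the 504 reply that C released had an Expires equal to its Date (lifetime 0), while the 10675-byte object on A carried a 12-hour lifetime (43200 s) and was still released about one second after its Date.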
Timeout-related settings:

  icp_query_timeout 50
  maximum_icp_query_timeout 50
  forward_timeout 4 minutes
  connect_timeout 1 minute
  peer_connect_timeout 30 seconds
  read_timeout 15 minutes
  request_timeout 5 minutes
  persistent_request_timeout 1 minute
  pconn_timeout 120 seconds

Does anyone have a clue? Thanks very much!

- Ding Deng

[1] Yes, we know that we should try v2.6 first and see if the problem still occurs, but it's difficult to do that in a production environment (you know that, right? ;-), and our boss is far harder to persuade than you may imagine ;-(

"andre wang" <andre.ease@xxxxxxxxx> writes:

> HI ALL:
>
> We are running Squid 2.5STABLE14 on Linux machines, trying to run a
> cluster of caches in a sibling peering arrangement using multicast
> for ICP queries. The caches seem to be talking to each other fine.
>
> When the client sends an HTTP request that isn't cached on the
> configured cache, the cache sends out an ICP multicast query; all
> other caches receive it fine and respond, either with UDP_MISS or
> UDP_HIT. The problem is, if the other caches respond with a UDP_HIT,
> the original cache still fetches the object directly rather than
> fetching it from the sibling. Why?
>
> I have checked the access.log and got these:
>
> On the first cache (172.19.0.229):
> 1187773057.113 3 222.220.132.48 TCP_MISS/200 315 GET http://XXXXX - DIRECT/XXXX
>
> On the sibling cache (172.19.0.228):
> 1187773057.002 0 172.19.0.229 UDP_HIT/000 108 ICP_QUERY http://XXXXXX - NONE/- -
>
> Any idea?
> Thanks
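As a sanity check on the "2 ms" argument, here is a minimal Python sketch that takes the two access.log lines from machine C and the timeout settings listed in the report, converts everything to milliseconds and compares them. It assumes that the bare values for icp_query_timeout and maximum_icp_query_timeout are in milliseconds (our reading of the squid.conf documentation) and that native access.log timestamps always carry exactly three decimal places; `timestamp_ms` and `parse_timeouts` are hypothetical helpers.

```python
# Two access.log lines logged by machine C (copied from the report).
ACCESS_LINES = [
    "1186606560.930 0 IP_OF_MACHINE_A UDP_HIT/000 115 ICP_QUERY "
    "http://www.example.com/dynamic.js - NONE/- -",
    "1186606560.932 0 IP_OF_MACHINE_A TCP_MISS/504 1949 GET "
    "http://www.example.com/dynamic.js - NONE/- text/html",
]

# Timeout directives exactly as listed in the report.
TIMEOUT_CONFIG = """\
icp_query_timeout 50
maximum_icp_query_timeout 50
forward_timeout 4 minutes
connect_timeout 1 minute
peer_connect_timeout 30 seconds
read_timeout 15 minutes
request_timeout 5 minutes
persistent_request_timeout 1 minute
pconn_timeout 120 seconds
"""

# Assumption: these two directives take a bare number of milliseconds.
MSEC_DIRECTIVES = {"icp_query_timeout", "maximum_icp_query_timeout"}
UNIT_MS = {"second": 1_000, "seconds": 1_000, "minute": 60_000, "minutes": 60_000}


def timestamp_ms(field):
    """'1186606560.930' -> 1186606560930 (assumes three decimal places)."""
    seconds, millis = field.split(".")
    return int(seconds) * 1000 + int(millis)


def parse_timeouts(text):
    """Return {directive: timeout in milliseconds} for each non-empty line."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        name, value = parts[0], int(parts[1])
        if len(parts) == 2:
            if name not in MSEC_DIRECTIVES:
                raise ValueError(f"no unit given for {name}")
            result[name] = value
        else:
            result[name] = value * UNIT_MS[parts[2]]
    return result


# Field 3 of a native access.log line is the result code/status.
fields = [line.split() for line in ACCESS_LINES]
hit = next(f for f in fields if f[3].startswith("UDP_HIT"))
miss = next(f for f in fields if f[3] == "TCP_MISS/504")
gap_ms = timestamp_ms(miss[0]) - timestamp_ms(hit[0])

timeouts = parse_timeouts(TIMEOUT_CONFIG)
shortest = min(timeouts, key=timeouts.get)
print(gap_ms, shortest, timeouts[shortest])  # 2 icp_query_timeout 50
print([n for n, ms in timeouts.items() if ms <= gap_ms])  # []
```

This only confirms that none of the listed timeouts is as short as the 2 ms gap; it does not tell us where the 504 itself came from.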