Re: Request processing question

David Lawson <david@xxxxxxxx> · Sun, 6 Apr 2008 15:19:15 -0400

On Apr 6, 2008, at 4:59 AM, Henrik Nordstrom wrote:
Sat 2008-04-05 at 23:26 -0400, David Lawson wrote:
I've got a couple questions about how Squid chooses to fulfill a
request.  Basically, I've got a cache with a number of sibling peers
defined.  Some of the time it makes an ICP query to those peers and
then does everything it should do, takes the first hit, makes the HTTP
request for the object via that peer, etc.  Some, perhaps most, of the
time, it doesn't even make an ICP query for the object, it just goes
direct to the origin server.

The primary distinction is between hierarchical and non-hierarchical
requests.  Siblings are only queried on hierarchical requests.

non-hierarchical:
 - reload requests
 - cache validations if you have non-Squid ICP peers
 - non-GET/HEAD/TRACE requests
 - authenticated requests
 - matching hierarchy_stoplist
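
The list above can be read as a simple predicate. What follows is a
minimal sketch in Python of that rule set only, not Squid's actual
code: the Request class, its field names, and is_hierarchical() are
hypothetical names made up for illustration, and treating
hierarchy_stoplist as a plain substring match on the URL is an
assumption of the sketch.

```python
from dataclasses import dataclass

# Methods that may still be treated as hierarchical, per the list above.
HIERARCHICAL_METHODS = {"GET", "HEAD", "TRACE"}


@dataclass
class Request:
    """Hypothetical, simplified view of a client request."""
    method: str
    url: str
    is_reload: bool = False         # client forced a reload
    is_validation: bool = False     # cache validation of a stale object
    is_authenticated: bool = False  # request carries credentials


def is_hierarchical(req, stoplist=(), non_squid_icp_peers=False):
    """Return True if siblings would be queried under the listed rules."""
    if req.is_reload:
        return False
    if req.is_validation and non_squid_icp_peers:
        return False
    if req.method.upper() not in HIERARCHICAL_METHODS:
        return False
    if req.is_authenticated:
        return False
    # Assumption: a stoplist entry matches when it occurs anywhere in the URL.
    if any(word in req.url for word in stoplist):
        return False
    return True


# Example stoplist values, chosen for illustration.
stoplist = ("cgi-bin", "?")
plain_get = Request("GET", "http://www.foo.com:8881/towns/baz/x1151547945")
query_get = Request("GET", "http://www.foo.com:8881/search?q=baz")
post = Request("POST", "http://www.foo.com:8881/towns/baz/")

print(is_hierarchical(plain_get, stoplist))  # siblings would be queried
print(is_hierarchical(query_get, stoplist))  # matches the stoplist
print(is_hierarchical(post, stoplist))       # not GET/HEAD/TRACE
```

A request that fails any one of the checks is treated as
non-hierarchical, so no ICP query to siblings would be expected for it.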

Hmmm, okay, that was more or less the assumption I was working under,  
but the behavior I'm seeing doesn't seem to match that.  One of my  
coworkers did a packet capture of two requests, one of which resulted  
in an ICP query, the other of which bypassed the ICP query process  
entirely and went direct to the origin.

ICP:

   GET http://www.foo.com:8881/towns/baz/x1151547945 HTTP/1.0\r\n
       Request Method: GET
       Request URI: http://www.foo.com:8881/towns/baz/x1151547945
       Request Version: HTTP/1.0
   Host: www.foo.com:8881\r\n
   Accept: text/html,text/plain,application/*\r\n
   From: user@xxxxxxxxxx\r\n
   User-Agent: gsa-crawler (Enterprise; GIX-01642; user@xxxxxxxxxx)\r\n
   Accept-Encoding: gzip\r\n
   If-Modified-Since: Sun, 16 Mar 2008 22:22:39 GMT\r\n
   Via: 1.0 cache2.ghm.zope.net:80 (squid/2.5.STABLE12)\r\n
   X-Forwarded-For: 64.233.190.112\r\n
   Cache-Control: max-age=86400\r\n
   \r\n

Non-ICP:

Hypertext Transfer Protocol
   GET http://www.bar.com:8881/baz/news/rss HTTP/1.0\r\n
       Request Method: GET
       Request URI: http://www.bar.com:8881/baz/news/rss
       Request Version: HTTP/1.0
   Host: www.wickedlocal.com:8881\r\n
   User-Agent: Yahoo-Newscrawler/3.9 (news-search-crawler at yahoo-inc dot com)\r\n
   Via: 1.0 cache4.ghm.zope.net:80 (squid/2.5.STABLE12)\r\n
   X-Forwarded-For: 69.147.86.154\r\n
   Cache-Control: max-age=86400\r\n
   \r\n

Any ideas about why those requests were processed differently?

I've also got a broader, more general question about how a request flows
through the Squid process: when ACLs are processed, whether that is before
or after any rewriting is done to the URLs, etc.  But that's really a
secondary thing; right now I'm just concerned with the ICP question.

Depends on which access directive you look at. Generally speaking,
http_access is evaluated before URL rewrites, the rest after.

Ah, okay.  Thanks Henrik, I appreciate the info.

--Dave
Systems Administrator
Zope Corp.
540-361-1722
david@xxxxxxxx