Re: Forcing TCP_REFRESH_HIT to be answered from cache

dererk@xxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> Hi everyone!
>
> I'm running a reverse proxy (1) to help my httpd to serve content fast
> and avoid going to the origin as much as possible.
> Doing that, I found I made a _lot_ of TCP_REFRESH_HIT requests to

First off, let's get the terminology clear:

REFRESH_HIT means the cached copy was sent to the client, but the origin was tested first to make sure it was still correct.

REFRESH_MISS means the same was attempted, but during the IMS check the web server pushed out a new copy to be sent to the client.


Things to check which can cause this to happen a lot:

* The web server sending Cache-Control: must-revalidate. Check for and remove it where possible. It forces every single request to be a REFRESH_* instead of a nice cache HIT.

* Invalid date formats in the HTTP reply headers. They break the staleness checks and cause those headers to be ignored. Some (i.e. Expires:) are required to be interpreted as instant staleness.

* Client Cache-Control headers. There is little that can be done to avoid these. The refresh_pattern reload-into-ims option is about all I think.
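For the first point: if your origin happens to be Apache with mod_headers loaded, a rule along these lines can strip the directive before it ever reaches Squid. This is only a sketch; test the regex against your own Cache-Control values before relying on it:

```apache
# Strip must-revalidate from outgoing Cache-Control headers
# (mod_headers required; adjust the pattern to your header layout)
Header edit Cache-Control "must-revalidate,?\s*" ""
```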


Having skipped ahead and read your problem, this is what I think you need to start with:

* Send an Expires: header just under a year in advance (i.e. 364 days 23 hours). Then make sure your proxy caches obey the Expires: header (i.e. remove all copies of override-expire).
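As an illustration of generating such a header at the origin, here is a small Python sketch (the helper name and offset are my own; the point is that the value stays inside the one-year limit):

```python
import email.utils
import time

# 364 days 23 hours, just under the one-year limit HTTP allows
OFFSET = (364 * 24 + 23) * 3600

def expires_header(now=None):
    """Build an Expires: header value OFFSET seconds in the future,
    formatted as an RFC 1123 date in GMT as HTTP requires."""
    now = time.time() if now is None else now
    return "Expires: " + email.utils.formatdate(now + OFFSET, usegmt=True)

print(expires_header())
```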

The rest depends on which version of Squid you have:

With Squid-2.7 you have the stale-while-revalidate and stale-if-error options. These can be sent by the web server to permit your proxy to give a fast but potentially stale response to the clients. An IMS refresh will still happen, but will be done in the background without affecting any of the clients' response times.

Squid-3 brings the Surrogate-Control feature for the web server to send a completely different set of Cache-Control options to your reverse-proxy, but the stale-* features are not yet ported. I think they will be of more use in meeting your stated requirements.

You can fine-tune further with the ignore-* settings to ignore the headers sent by clients which reduce the HIT ratio. I'm not quite savvy enough to point at the full set; reload-into-ims is the first one to start with. I'd recommend looking at the reply headers sent by the web server and removing any expiry-related settings which will cause problems, then looking at the client request headers coming in to Squid to see what can be done on that front.
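As a sketch in squid.conf syntax (min/max are in minutes; option availability differs between Squid 2.6/2.7/3.x, and the pattern and values here are invented for illustration):

```conf
# Long-lived static objects: trust Expires: from the origin (note: no
# override-expire), and turn client forced reloads into cheap
# If-Modified-Since revalidations rather than full misses.
refresh_pattern -i \.(jpg|png|gif|css|js)$ 10080 90% 525600 reload-into-ims
# Everything else: conservative defaults.
refresh_pattern . 0 20% 4320
```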

> origin, although I've an insane 10-year-long expiration date set on my
> http response headers back to squid.

That might be part of the problem. RFC 2616 defines a limit of 1 year offset for valid expiry dates. Your date may be discarded from consideration due to its insanity.


> Although I did verify that, using wget -S and some fancy tcpdump
> invocations, I want to get rid of any TCP_REFRESH_HIT requests. The
> main reason is that there's no way some objects ever change, so
> checking their freshness makes no sense and moreover increases server
> load (1/7 of requests are REFRESH_HITs).

These types of objects are exactly what Expires: exists for; see above.

If you continue to use the freshness algorithms then make darn sure the objects never get their on-disk timestamps touched. The IMS algorithm percentage extends the period between freshness checks on an exponential scale from the time of the Last-Modified header, with powers of the pct% setting.
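A toy model of that growth (illustrative numbers and function name of my own; not Squid's actual code):

```python
def check_ages(first_age, pct, n):
    """Ages (seconds since Last-Modified) at which successive freshness
    checks fall due, assuming every check revalidates successfully and
    the on-disk timestamp is never touched.  Each fresh period is pct%
    of the object's current age, so the gaps grow geometrically."""
    ages, age = [], float(first_age)
    for _ in range(n):
        age *= 1 + pct / 100.0
        ages.append(round(age, 1))
    return ages

# With pct=20%, an object first fetched an hour after modification:
print(check_ages(3600, 20, 3))   # -> [4320.0, 5184.0, 6220.8]
```

The older the object gets without changing, the longer Squid waits before the next IMS check, which is why untouched timestamps matter.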

The values only matter when freshness needs to be estimated.


> I used refresh_pattern with override-expire and extremely high values
> for min and max values, with absolutely no effect.

Note that the config values are in minutes and are converted to seconds for use. There is a point where insanely high values are accepted as valid signed minutes but wrap and become negative when multiplied into seconds. This happens from about 7* *** *** and is not checked by Squid, beyond negatives being rounded up to zero.
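The wrap can be demonstrated with a hypothetical model of a 32-bit signed multiply (this mimics the effect described above, not Squid's actual source):

```python
def to_seconds_int32(minutes):
    """Multiply a refresh_pattern value (in minutes) by 60 with
    32-bit signed wraparound, as C 'int' arithmetic would."""
    n = (minutes * 60) & 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

big = 50_000_000                   # an "insanely high" max, in minutes
secs = to_seconds_int32(big)
print(secs)                        # negative: the value wrapped
print(max(0, secs))                # negatives rounded up to zero -> 0
```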

The values only matter when freshness needs to be estimated.


> For the record, if I use offline_mode I obtain partially what I wanted;
> unfortunately I lose the flexibility of the regex capability that
> refresh_pattern has, which I need for excluding special objects.

offline_mode is badly named. It means aggressive caching.

There has been a lot of work done in making that type of caching normal. The very latest 2.7 and 3.1 releases go a long way towards it, but for even more caching capability you want the development code in 2.HEAD or 3.HEAD.


> I've enabled debug for a blink of an eye, and got a request that goes as
> TCP_REFRESH_HIT, and as far as I understand, appears to be treated as
> stale and requested back from the origin.
>
> 2010/07/14 13:35:58| parseHttpRequest: Complete request received
> 2010/07/14 13:35:58| removing 1462 bytes; conn->in.offset = 0
> 2010/07/14 13:35:58| clientSetKeepaliveFlag: http_ver = 1.0
> 2010/07/14 13:35:58| clientSetKeepaliveFlag: method = GET
> 2010/07/14 13:35:58| clientRedirectStart: 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientRedirectDone: 'http://foobar.com/object' result=NULL
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_NOCACHE = NOT SET
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_CACHABLE = SET
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_HIERARCHICAL = SET
> 2010/07/14 13:35:58| clientProcessRequest: GET 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessRequest2: storeGet() MISS
> 2010/07/14 13:35:58| clientProcessRequest: TCP_MISS for 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessMiss: 'GET http://foobar.com/object'
> 2010/07/14 13:35:58| clientCacheHit: http://foobar.com/object = 200
> 2010/07/14 13:35:58| clientCacheHit: refreshCheckHTTPStale returned 1
> 2010/07/14 13:35:58| clientCacheHit: in refreshCheck() block
> 2010/07/14 13:35:58| clientProcessExpired: 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessExpired: lastmod -1
> 2010/07/14 13:35:58| clientReadRequest: FD 84: reading request...
> 2010/07/14 13:35:58| parseHttpRequest: Method is 'GET'
> 2010/07/14 13:35:58| parseHttpRequest: URI is '/object'

> In the course of trying anything to get some effect, I also gave
> ignore-stale-while-revalidate, override-lastmod, override-expire,
> ignore-reload and ignore-no-cache a try, pushed refresh_stale_hit high
> in the sky, and again, no effect :-(

stale-while-revalidate is an HTTP control which the web server sends to permit your proxy to do the refresh part in the background, without that "block" slowing down the user's response, even if they get a slightly stale object for a short while.

Setting ignore-stale-while-revalidate does the opposite of what you say you want.

FYI: stale-if-error is its twin and keeps the proxy serving data from cache if the web server dies completely or starts sending back fatal 5xx replies. Both are good things to have on a reverse proxy.
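For illustration, a response header the origin could send to enable both (the numbers here are made up for the example):

```http
Cache-Control: max-age=600, stale-while-revalidate=30, stale-if-error=86400
```

That is: fresh for ten minutes, serve-stale-while-refreshing for thirty seconds after that, and serve-stale for up to a day if the origin is returning errors.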


> What am I doing wrong? Is there any other way to avoid REFRESH_HITs
> from being performed?

Only the extreme: don't use a proxy. That way all requests are direct client requests and there is no cache to be updated with new info.

Correct use and handling of Expires: is designed for your type of very-long-aged objects.

stale-while-revalidate is designed for shorter more dynamic objects which also need to be served without a blocking lag.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.5

