Re: Forcing TCP_REFRESH_HIT to be answered from cache

dererk@xxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> Hi everyone!
>
> I'm running a reverse proxy (1) to help my httpd to serve content fast
> and avoid going to the origin as much as possible.
> Doing that, I found I made a _lot_ of TCP_REFRESH_HIT requests to

First off, let's get the terminology clear:

REFRESH_HIT means the cached copy was sent to the client, but the origin was tested first to make sure it was still correct.

REFRESH_MISS means the same was attempted, but during the IMS check the web server pushed out a new copy to be sent to the client.


Things to check which can cause this to happen a lot:

* The web server sending Cache-Control: must-revalidate. Check for and remove it where possible. It forces every single request to be a REFRESH_* instead of a nice cache HIT.

* Invalid date formats in the HTTP reply headers. They break the staleness checks and cause those headers to be ignored. Some (i.e. Expires:) are required to be interpreted as instant staleness.

* Client Cache-Control headers. There is little that can be done to avoid these. The refresh_pattern reload-into-ims option is about all I think.
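For the first point: if your origin happens to be Apache with mod_headers loaded, a rule along these lines can strip the directive before it ever reaches Squid. This is only a sketch; test the regex against your own Cache-Control values before relying on it:

```apache
# Strip must-revalidate from outgoing Cache-Control headers
# (mod_headers required; adjust the pattern to your header layout)
Header edit Cache-Control "must-revalidate,?\s*" ""
```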


Having skipped ahead and read your problem, this is what I think you need to start with:

* Send an Expires: header just under a year in advance (i.e. 364 days 23 hours). Then make sure your proxy caches obey the Expires: header (i.e. remove all copies of override-expire).
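As an illustration of generating such a header at the origin, here is a small Python sketch (the helper name and offset are my own; the point is that the value stays inside the one-year limit):

```python
import email.utils
import time

# 364 days 23 hours, just under the one-year limit HTTP allows
OFFSET = (364 * 24 + 23) * 3600

def expires_header(now=None):
    """Build an Expires: header value OFFSET seconds in the future,
    formatted as an RFC 1123 date in GMT as HTTP requires."""
    now = time.time() if now is None else now
    return "Expires: " + email.utils.formatdate(now + OFFSET, usegmt=True)

print(expires_header())
```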

The rest depends on which version of Squid you have:

With Squid-2.7 you have the stale-while-revalidate and stale-if-error options. These can be sent by the web server to permit your proxy to give a fast but potentially stale response to the clients. An IMS refresh will still happen, but will be done in the background without affecting any of the clients' response times.

Squid-3 brings the Surrogate-Control feature for the web server to send a completely different set of Cache-Control options to your reverse-proxy, but the stale-* features are not yet ported. I think they will be of more use in meeting your stated requirements.

You can fine-tune further with the ignore-* settings to ignore the headers sent by clients which reduce the HIT ratio. I'm not quite savvy enough to point at the full set; reload-into-ims is the first one to start with. I'd recommend looking at the reply headers sent by the web server and removing any expiry-related settings which will cause problems, then looking at the client request headers coming in to Squid to see what can be done on that front.
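As a sketch in squid.conf syntax (min/max are in minutes; option availability differs between Squid 2.6/2.7/3.x, and the pattern and values here are invented for illustration):

```conf
# Long-lived static objects: trust Expires: from the origin (note: no
# override-expire), and turn client forced reloads into cheap
# If-Modified-Since revalidations rather than full misses.
refresh_pattern -i \.(jpg|png|gif|css|js)$ 10080 90% 525600 reload-into-ims
# Everything else: conservative defaults.
refresh_pattern . 0 20% 4320
```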

> origin, although I've an insane 10-year-long expiration date set on my
> http response headers back to squid.

That might be part of the problem. RFC 2616 defines a limit of 1 year offset for valid expiry dates. Your date may be discarded from consideration due to its insanity.


> Although I did verify that, using wget -S and some fancy tcpdump
> invocations, I want to get rid of any TCP_REFRESH_HIT requests. The
> main reason is that there's no way some objects ever change, so
> checking their freshness makes no sense and moreover increases server
> load (1/7 of requests are REFRESH_HITs).

These types of objects are exactly what Expires: exists for; see above.

If you continue to use the freshness algorithms then make darn sure the objects never get their on-disk timestamps touched. The IMS algorithm percentage extends the period between freshness checks on an exponential scale from the time of the Last-Modified header, with powers of the pct% setting.
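A toy model of that growth (illustrative numbers and function name of my own; not Squid's actual code):

```python
def check_ages(first_age, pct, n):
    """Ages (seconds since Last-Modified) at which successive freshness
    checks fall due, assuming every check revalidates successfully and
    the on-disk timestamp is never touched.  Each fresh period is pct%
    of the object's current age, so the gaps grow geometrically."""
    ages, age = [], float(first_age)
    for _ in range(n):
        age *= 1 + pct / 100.0
        ages.append(round(age, 1))
    return ages

# With pct=20%, an object first fetched an hour after modification:
print(check_ages(3600, 20, 3))   # -> [4320.0, 5184.0, 6220.8]
```

The older the object gets without changing, the longer Squid waits before the next IMS check, which is why untouched timestamps matter.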

The values only matter when freshness needs to be estimated.


> I used refresh_pattern with override-expire and extremely high values
> for min and max values, with absolutely no effect.

Note that the config values are in minutes and are converted to seconds for use. There is a point where insanely high values are accepted as valid signed minutes but wrap and become negative when multiplied into seconds. This happens from about 7* *** *** and is not checked by Squid, beyond negatives being rounded up to zero.
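The wrap can be demonstrated with a hypothetical model of a 32-bit signed multiply (this mimics the effect described above, not Squid's actual source):

```python
def to_seconds_int32(minutes):
    """Multiply a refresh_pattern value (in minutes) by 60 with
    32-bit signed wraparound, as C 'int' arithmetic would."""
    n = (minutes * 60) & 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

big = 50_000_000                   # an "insanely high" max, in minutes
secs = to_seconds_int32(big)
print(secs)                        # negative: the value wrapped
print(max(0, secs))                # negatives rounded up to zero -> 0
```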

The values only matter when freshness needs to be estimated.


> For the record, if I use offline_mode I obtain partially what I wanted;
> unfortunately I lose the flexibility of the regex capability that
> refresh_pattern has, which I need for excluding special objects.

offline_mode is badly named. It means aggressive caching.

There has been a lot of work done in making that type of caching normal. The very latest 2.7 and 3.1 releases go a long way towards it, but for even more caching capability you want the development code in 2.HEAD or 3.HEAD.


> I've enabled debug for a blink of an eye, and got a request that goes as
> TCP_REFRESH_HIT, and as far as I understand, appears to be treated as
> stale and requested back from the origin.
>
> 2010/07/14 13:35:58| parseHttpRequest: Complete request received
> 2010/07/14 13:35:58| removing 1462 bytes; conn->in.offset = 0
> 2010/07/14 13:35:58| clientSetKeepaliveFlag: http_ver = 1.0
> 2010/07/14 13:35:58| clientSetKeepaliveFlag: method = GET
> 2010/07/14 13:35:58| clientRedirectStart: 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientRedirectDone: 'http://foobar.com/object' result=NULL
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_NOCACHE = NOT SET
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_CACHABLE = SET
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_HIERARCHICAL = SET
> 2010/07/14 13:35:58| clientProcessRequest: GET 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessRequest2: storeGet() MISS
> 2010/07/14 13:35:58| clientProcessRequest: TCP_MISS for 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessMiss: 'GET http://foobar.com/object'
> 2010/07/14 13:35:58| clientCacheHit: http://foobar.com/object = 200
> 2010/07/14 13:35:58| clientCacheHit: refreshCheckHTTPStale returned 1
> 2010/07/14 13:35:58| clientCacheHit: in refreshCheck() block
> 2010/07/14 13:35:58| clientProcessExpired: 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessExpired: lastmod -1
> 2010/07/14 13:35:58| clientReadRequest: FD 84: reading request...
> 2010/07/14 13:35:58| parseHttpRequest: Method is 'GET'
> 2010/07/14 13:35:58| parseHttpRequest: URI is '/object'

> In the course of trying anything to get some effect, I also gave
> ignore-stale-while-revalidate, override-lastmod, override-expire,
> ignore-reload and ignore-no-cache a try, pushed refresh_stale_hit high
> in the sky, and again, no effect :-(

stale-while-revalidate is an HTTP control which the web server sends to permit your proxy to do the refresh part in the background, without that "block" slowing down the user's response, even if they get a slightly stale object for a short while.

Setting ignore-stale-while-revalidate does the opposite of what you say you want.

FYI: stale-if-error is its twin and keeps the proxy serving data from cache if the web server dies completely or starts sending back fatal 5xx replies. Both are good things to have on a reverse proxy.
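For illustration, a response header the origin could send to enable both (the numbers here are made up for the example):

```http
Cache-Control: max-age=600, stale-while-revalidate=30, stale-if-error=86400
```

That is: fresh for ten minutes, serve-stale-while-refreshing for thirty seconds after that, and serve-stale for up to a day if the origin is returning errors.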


> What am I doing wrong? Is there any other way to avoid REFRESH_HITs
> from being performed?

Only the extreme: don't use a proxy. That way all requests are direct client requests and there is no cache to be updated with new info.

Correct use and handling of Expires: is designed for your type of very-long-aged objects.

stale-while-revalidate is designed for shorter more dynamic objects which also need to be served without a blocking lag.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.5

