On 9/07/2013 7:37 a.m., Eliezer Croitoru wrote:
Sorry it took me so much time to reply but it seems like either a log
bug or something else.
I am unsure about it yet.
the line in the access.log is:
1373311134.462 38 192.168.10.124 TCP_MISS/304 332 GET
http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
- HIER_DIRECT/88.221.156.163 image/jpeg
Which is kind of not real since it's a TCP_MISS but still a 304.
The log code would be TCP_CLIENT_IMS_MISS or something like that once we
have cleaned the log code types up, for now there is no such code and
Squid still uses TCP_MISS.
The client sent in IMS. There was no cached entry (MISS) and the server
was contacted using the clients IMS headers. The serve replied 304 which
Squid relayed to the client. There is no change to squid cache since; a)
there was no cached entry to begin with, and b) 304 does not provide a
payload object to add to storage.
If I do a calculation it's like I said before for somebody who asked here.
There is a TCP_MISS instead of TCP_IMS/304.
There are the ALL,2 cache.log output:
##start
HTTP Client REQUEST:
---------
GET
http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
HTTP/1.1
Host: image.slidesharecdn.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101
Firefox/22.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If-Modified-Since: Wed, 08 Aug 2012 08:30:54 GMT
If-None-Match: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
Cache-Control: max-age=0
----------
HTTP Server REQUEST:
---------
GET
/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
HTTP/1.1
Host: image.slidesharecdn.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101
Firefox/22.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
If-Modified-Since: Wed, 08 Aug 2012 08:30:54 GMT
If-None-Match: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
Cache-Control: max-age=0
Connection: keep-alive
----------
HTTP Server REPLY:
---------
HTTP/1.1 304 Not Modified
Content-Type: image/jpeg
Last-Modified: Wed, 08 Aug 2012 08:30:54 GMT
ETag: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
Cache-Control: max-age=31536000
Date: Mon, 08 Jul 2013 19:19:18 GMT
Connection: keep-alive
Vary: Accept-Encoding
----------
HTTP Client REPLY:
---------
HTTP/1.1 304 Not Modified
Content-Type: image/jpeg
Last-Modified: Wed, 08 Aug 2012 08:30:54 GMT
ETag: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
Cache-Control: max-age=31536000
Date: Mon, 08 Jul 2013 19:19:18 GMT
Vary: Accept-Encoding
X-Cache: MISS from www1.home
X-Cache-Lookup: MISS from www1.home:3128
Connection: keep-alive
----------
##end
Which shows that the request is IMS from the client with a ETAG on it.
The request is served from local cache rather then from squid cache
since the client didn't asked for a response.
With basic wget and curl I am getting a full TCP_HIT or TCP_IMS_HIT when
doing the same request over and over again.
But it seems like the browser is smarter then plain wget and obeys the
"rules" of cache and tried to verify the ETAG and Last-Modified.
Yes. The browser has a cache, wget and curl do not. That is all.
Now the refresh_pattern is:
refresh_pattern ^http://slidesharecdn\.squid\.internal/.* 10080 80%
79900 override-expire override-lastmod ignore-no-cache ignore-private
ignore-reload ignore-must-revalidate ignore-no-store store-stale
ignore-must-revalidate
There is nothing in the Squid cache and no payload body arriving on a
304. Whether or not it is tested refresh_pattern has nothing to work with.
Which I am unsure if squid suppose to actually ignore-no-cache or to
convert it to IMS.
NP: ignore-no-cache is obsolete and not supported by the latest 3.2+
versions of Squid. You should be getting a WARNING out of -k parse about
that.
It is irrelevant to your query though.
The next log is when I use firefox and do a full refresh which is using
SHIFT+F5 which is not the same as F5 that I have used and have been seen
on the logs before:
##start
HTTP Client REQUEST:
---------
GET
http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
HTTP/1.1
Host: image.slidesharecdn.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101
Firefox/22.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Notice the absence of If-* headers this time. Shift+F5 is a "reload"
request. Meaning the browser cache entry is ignored and Cache-Control
sent upstream (to Squid) requires only the most up to date content to be
delivered.
Firefox uses "no-cache", some other agents use "max-age=0" which mean
the same thing. Any receiving cache MUST revalidate (IMS request) its
cached content before responding with a HIT.
However in your scenario there is no cached content stored by Squid. So
the client request is again simply relayed out to the server. This time
however the server responds with a 200 request. Which has a payload
object for Squid to cache.
----------
HTTP Client REPLY:
---------
HTTP/1.1 200 OK
x-amz-id-2: rzYdkUpXw11w51eJTKynLasRv3HUjy4d45Kj0a2jj7swoWlpps+p3orjUpgiywdd
x-amz-request-id: 959D56498B76D223
Last-Modified: Wed, 08 Aug 2012 08:30:54 GMT
x-amz-version-id: gducYyCJ.73GoIA.WcScH1tx7qj9rkby
ETag: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
Accept-Ranges: bytes
Content-Type: image/jpeg
Server: AmazonS3
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 59728
Cache-Control: max-age=31536000
Date: Mon, 08 Jul 2013 19:26:47 GMT
X-Cache: MISS from www1.home
X-Cache-Lookup: MISS from www1.home:3128
Connection: keep-alive
----------
2013/07/08 22:26:23.258 kid1| store_io.cc(34) storeCreate: storeCreate:
Selected dir 0 for -1@-1=0/2/1/1
##end
I am not sure if to think of it as a bug or just as it suppose to be...
So we have two cases which are not the same.
Yes the second case presents a payload body Squid might store. Given the
ignores and overrides on your refresh_pattern it is very likely to be
stored regardless of any "dont store" signals from the server, which
results in a HIT on following traffic.
The first is suppose to be a 304 and is recognized as a 304 but still
considered to be a TCP_MISS which is basically a "HIT" and the second
No, both cases you showed above are "basically a MISS". The difference
is whether Squid received a cacheable object from the server or not. In
the second case it does - which allows Squid to get out of the "MISS"
state and start responding with future HITs.
case when the request is a no-cache with pragma which suppose to be
served from cache or at least to be treated with 304 towards the server
and still it stays as a full regular request and then as a TCP_MISS.
There is no cached object. If there was one ignore-must-revalidate would
have made Squid send plain old GET requests to the server instead of IMS
requests - *if* the IMS was generated by *Squid*.
If there is a need to file a bug just let me know and I will file it
with all the details.
If not please help me make sure I understood right the logic.
Hope the above helps explain.
On the whole the aim of Squid is to optimize HTTP bandwidth consumption.
As such we have controls converting reload fetches into IMS (reduces
bandwidth to the server), but not converting client IMS into a reload -
which would present a security vulnerability ("bandwidth amplification"
attack type) if clients could send 300 bytes to Squid and trigger an
unlimited amount of internal bandwidth consumption.
There are two things here which make your cases behavie like they do:
1) there is no cached content already. Not really a problem, but it does
mean you are dealing with MISS on both times instead of HIT.
2) the logging codes are not really as permiable as the log
documentation suggests. Specifically TCP_CLIENT_IMS_* codes do not exist
to clarify the TCP_*/304 case you are seeing.
Amos