OK, found the problem. All the "problematic" objects come from multi-IP domains, and sometimes the browser resolves the name and sends the request to an IP that is not in the list squid sees (this is for intercept mode). For example, in the browser with http_watch I see that the request for http://www.googleadservices.com/pagead/conversion_async.js is sent to 173.194.118.122, but nslookup with the "set debug" option shows:

Name:      pagead.l.doubleclick.net
Addresses: 173.194.118.45
           173.194.118.58
           173.194.118.57
Aliases:   www.googleadservices.com

The IP resolved by the browser is not in the list! So squid interprets this as destination IP forgery and doesn't cache the response.

This behavior is documented under the host_verify_strict option. By default it is set to off, which is why the reason is difficult to discover. If you set it to on and try to download a problematic object, squid returns "URI Host Conflict" (409 Conflict) and in the access.log you'll see TAG_NONE/409 (additionally, with increased debug levels, you'll also see security alerts). This should partly explain the numerous complaints about more-than-expected misses.

This is a real problem, because the IP mismatches are not caused by artificially crafted requests but by the normal functioning of DNS and its different levels of caching. The reason for the mismatches appears to be the frequency of DNS updates for these multi-IP domains: nslookup in debug mode shows that www.googleadservices.com has a default TTL of just 5 min, cdn.clicktale.net of 2 min, google.com of 1 min 25 sec and global.ssl.fastly.net of 25 sec. When I restart the DNS Client service, I get a HIT from squid for almost all of the originally published problematic objects, without any security alerts, until the IP discrepancies start to appear again.

So it looks like the destination IP forgery check should be relaxed somehow (for example, with a /24 mask, as the majority of the IP mismatches are in the last octet), or squid should cache all the IPs of all the domains for a long time, just for this forgery check. Another option (at least as a temporary workaround) would be to disable this check completely, as it poses very little risk for a correctly configured squid with trusted clients. On the other hand, an untrusted client could, via a host under his control, request an infected copy of some well-known file and in this way make squid cache and distribute the infected file to the rest of the clients.

The best option, I think, would be for requests considered forgery to overwrite the destination IP provided by the client with one of the resolved IPs for the domain in the Host field (like with client_dst_passthru off). And here is a patch for this. Please note I haven't done extensive verification of the security implications, so use it with caution.

[Note for the developers: please replace "if (true)" with a new option (could be something like host_verify_strict_use_host_on_failure, on by default) and check that the local and local.port assignments follow the general style.]
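To illustrate the idea (just a sketch, not tested code: the directive name and the Config.onoff member are hypothetical and would still need to be declared in cf.data.pre and cache_cf.cc), the "if (true)" in the patch below would turn into something like:

    // hypothetical on/off directive host_verify_strict_use_host_on_failure,
    // assumed here to be stored in Config.onoff.host_verify_use_host_on_failure;
    // the body is the same logic as in the patch below
    if (Config.onoff.host_verify_use_host_on_failure) {
        const unsigned short port = clientConn->local.port();
        debugs(85, 3, HERE << "[anti-forgery] replacing non-matching destination IP " <<
               clientConn->local << " with the first Host-resolved IP " << ia->in_addrs[0]);
        clientConn->local = ia->in_addrs[0];
        clientConn->local.port(port);       // keep the original destination port
        http->request->flags.hostVerified = true;
        http->doCallouts();
        return;
    }

The patch itself: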
--- src/client_side_request.cc	2014-03-09 06:40:56.000000000 -0300
+++ src/client_side_request.cc	2014-04-21 02:53:11.277155130 -0300
@@ -545,6 +545,16 @@
             }
             debugs(85, 3, HERE << "validate IP " << clientConn->local << " non-match from Host: IP " << ia->in_addrs[i]);
         }
+
+        if (true) {
+            unsigned short port = clientConn->local.port();
+            debugs(85, 3, HERE << "[anti-forgery] Host-non-matched remote IP (" << clientConn->local << ") was replaced with the first Host resolved IP (" << ia->in_addrs[0] << ":" << clientConn->local.port() << ")");
+            clientConn->local = ia->in_addrs[0];
+            clientConn->local.port(port);
+            http->request->flags.hostVerified = true;
+            http->doCallouts();
+            return;
+        }
     }
     debugs(85, 3, HERE << "FAIL: validate IP " << clientConn->local << " possible from Host:");
     hostHeaderVerifyFailed("local IP", "any domain IP");

(I attach the patch to this email to avoid mailing-list formatting changes. To apply it, place it in the squid-3.4.4 source directory, remove the .txt extension and run: patch -p0 --verbose < squid_forgery.patch)

After applying this patch the hit rate increased significantly for all types of objects, not only for those that match the refresh_pattern options. No more random misses, then hits, then misses again.

Still, the adobe .exe file was not caching, so I decided to continue the investigation and finally found what the problem was. With adequate debug_options enabled, squid was saying that the object size was too big (I've added the CL (Content-Length), SMOS (store_maxobjsize) and EO (endOffset) variables to the log line):

2014/04/21 00:35:35.429| store.cc(1020) checkCachable: StoreEntry::checkCachable: NO: too big (CL = 33560984; SMOS = 4194304; EO = 268)

Clearly, something was wrong with maxobjsize: it was set in the config to 1 GB, yet the log reported it as 4 MB (which I later discovered to be the default value).

After some additional research, I found that in the src/cf_parser.cci file (generated by make) there are 2 calls to the configuration initialization functions for almost all of the configuration options: the first one for the predefined (default) values and the second one for the config file values. There is a function parse_cachedir (defined in src/cache_cf.cc) that initializes the store data structure with the store-related options (like maxobjsize). It is called when the config parser finds a cache_dir option in the config, and it is not called again when the parser finds the other cache-related options. So, if you put something like this in your config (as I had):

cache_dir aufs /var/cache 140000 16 256
maximum_object_size 1 GB

then the maximum_object_size option is processed and you see it on the cachemgr config page, but it has no effect: the store's maxobjsize parameter was already initialized (with the default value) by parse_cachedir before the "maximum_object_size 1 GB" line was parsed, so the effective maximum_object_size is the 4 MB default. With a config like

maximum_object_size 1 GB
cache_dir aufs /var/cache 140000 16 256

the effective maximum_object_size for the store is 1 GB, as expected.

There are warnings in the documentation that the order of config options is important, but they are only explained in the context of ACLs and other unrelated settings. In my opinion, this is a huge problem, as it is not at all obvious what should precede what.
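(By the way, for anyone who wants to see this on their own setup: the checkCachable line above comes from the store code, which logs in debug section 20, so something along these lines in squid.conf should be enough to get it; if the line doesn't show up, raise the level:

debug_options ALL,1 20,3

That is how I was comparing the value reported on the cachemgr config page with the value the store actually enforces.)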
There should be at least a note in the documentation for each option affected by the order of config processing, there should be a final "all effective values" output at squid initialization (maybe with -d 2 and higher), and of course the cachemgr config page should show the correct (effective) values. Right now it is:

maximum_object_size @ cachemgr config page: 2147483648 bytes
Effective maximum_object_size: 4194304 bytes

A better solution still would be to call parse_cachedir (and similar functions) at the end of the config file processing (an extremely simple fix in the src/cf_parser.cci generation).

Now, with the patch and the "correct" order of maximum_object_size and cache_dir (put cache_dir after all the cache-related options, including the memory cache ones), all the "problematic" objects are cached as expected and there is a huge increase in the hit rate (around 10-fold on average and more than 100-fold for WU and similar). Rock-solid caching!

Regards,
Anatoli

-----Original Message-----
From: Anatoli [mailto:me@xxxxxxxxxx]
Sent: Sunday, April 20, 2014 14:22
To: squid-users@xxxxxxxxxxxxxxx
Subject: RE: Strange misses of cacheable objects

Amos, Antony,

Thanks for your replies.

Actually, the refresh_pattern rules that apply to all these cases say to override the Expires header field and force the objects into the cache (they all have the override-expire ignore-private ignore-no-cache reload-into-ims options), so Date == Expires should not be a problem, or the refresh_pattern options are implemented incorrectly.

With respect to the Vary and ETag fields, there are lots of other objects with similar values and they cache perfectly. Also, please note that the requests are made for the same object from the same browser on the same machine with seconds to minutes between them, so the headers are absolutely identical in each request and the objects, if still fresh, should be cached. Some hours later I started to get hits for some of these objects, and some hours later misses again (and I mean hits and misses on several requests in a row: say, 5 requests for the same object at 10-second intervals all return hits, and some hours later the same requests all return misses, so it's not that the objects expired). Try requesting these objects multiple times at different moments of the day and you will probably see the same behavior.

It looks like there is a problem with the current implementation; I don't see any solid reason for these objects not to cache.

Regards,
Anatoli

P.S. My email was blocked (blacklisted) by my hosting provider for 4 working days after I sent a single mail to this list. They said it was marked as spam more than once. Is this something to expect? Is it possible to post to this list without sending an email? Any web interface?

-----Original Message-----
From: Amos Jeffries [mailto:squid3@xxxxxxxxxxxxx]
Sent: Monday, April 14, 2014 12:19
To: squid-users@xxxxxxxxxxxxxxx
Subject: Re: Strange misses of cacheable objects

On 15/04/2014 1:07 a.m., Anatoli wrote:
> Dear all,
>
> I have some strange misses of cacheable objects. Here is an example (there
> are a lot of these objects):
>
> http://www.googletagmanager.com/gtm.js?id=GTM-J5TM

As Michael said, this one is expired immediately on arrival.

Note: in HTTP/1.1 expired responses ARE cacheable without any special actions. It is also private data for the individual client. Squid is now enforcing revalidation of private responses which make their way into the cache. That ensures the new requestor is a valid recipient of any cached response.
> Some other links with very similar request/response headers and squid log
> entries are
> http://www.googleadservices.com/pagead/conversion_async.js

Same for this one in regards to immediate expiry. However, the ETag header contains invalid values, so HTTP/1.1 revalidation does not work.

> http://cdn.clicktale.net/www07/ptc/3a4e2875-764c-4dd7-8d30-db51f1744e3e.js
> http://cdn.clicktale.net/www/tc/WRe10.js

These URLs are emitting a mix of different Vary: header contents, which makes caching highly volatile and likely to produce incorrect responses to clients. The ETag header used for unique object identification on the second one also contains invalid values (this time duplicates) in some responses. Both of these server errors will cause erratic behaviour in the caching of the objects, as they undermine the reliability of the response.

>
> http://armdl.adobe.com/pub/adobe/reader/win/9.x/9.5.0/en_US/AdbeRdr950_en_US
> .exe has the same behavior and was already reported by Marko Cupać @ May 15,
> 2013; 7:17am in "adobe updates constantly re-downloading" mail. Nobody
> offered any solution.

Since Marko's original post I have seen similar behaviour on a different client's network, where the user account logged into the machine did not have permission to install software yet the Adobe updater was running anyway and failing at the beginning of the install stage.

Amos