OK, found the problem. All the "problematic" objects come from multi-IP domains, and sometimes the browser resolves the name and sends the request to an IP that is not in the list squid sees (this is for intercept mode). For example, in the browser with http_watch I see that the request for http://www.googleadservices.com/pagead/conversion_async.js is sent to 173.194.118.122, but nslookup with the "set debug" option shows:

Name:      pagead.l.doubleclick.net
Addresses: 173.194.118.45
           173.194.118.58
           173.194.118.57
Aliases:   www.googleadservices.com

The IP resolved by the browser is not in the list! So squid interprets this as destination IP forgery and doesn't cache the response.

This behavior is documented under the host_verify_strict option. By default it is set to off, which is why the reason is difficult to discover. If you set it to on and try to download a problematic object, squid returns "URI Host Conflict" (409 Conflict) and in the access.log you'll see TAG_NONE/409 (additionally, with increased debug levels, you'll also see security alerts). This should partly explain the numerous complaints about more-than-expected misses.

This is a real problem, because the IP mismatches are not caused by artificially crafted requests but by the normal functioning of DNS and its different levels of caching. The reason for the mismatches appears to be the frequency of DNS updates for these multi-IP domains: nslookup in debug mode shows that www.googleadservices.com has a default TTL of just 5 min, cdn.clicktale.net of 2 min, google.com of 1 min 25 sec and global.ssl.fastly.net of 25 sec. When I restart the DNS Client service, I get a HIT from squid for almost all of the originally published problematic objects, without any security alerts, until the IP discrepancies start to appear again.

So it looks like the destination IP forgery check should be relaxed somehow (for example, with a /24 mask, as the majority of the IP mismatches are in the last octet), or squid should cache all the IPs of all the domains for a long time, just for this forgery check. Another option (at least as a temporary workaround) would be to disable this check completely, as it poses very little risk for a correctly configured squid with trusted clients. On the other hand, an untrusted client could, via a host under his control, request an infected copy of some well-known file and in this way make squid cache and distribute the infected file to the rest of the clients.

The best option, I think, would be for requests considered forgery to overwrite the destination IP provided by the client with one of the resolved IPs for the domain in the Host field (like with client_dst_passthru off). And here is a patch for this. Please note I haven't done extensive verification of the security implications, so use it with caution.

[Note for the developers: please replace "if (true)" with a new option (could be something like host_verify_strict_use_host_on_failure, on by default) and check that the local and local.port assignments follow the general style.]
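To illustrate the idea (just a sketch, not tested code: the directive name and the Config.onoff member are hypothetical and would still need to be declared in cf.data.pre and cache_cf.cc), the "if (true)" in the patch below would turn into something like:

    // hypothetical on/off directive host_verify_strict_use_host_on_failure,
    // assumed here to be stored in Config.onoff.host_verify_use_host_on_failure;
    // the body is the same logic as in the patch below
    if (Config.onoff.host_verify_use_host_on_failure) {
        const unsigned short port = clientConn->local.port();
        debugs(85, 3, HERE << "[anti-forgery] replacing non-matching destination IP " <<
               clientConn->local << " with the first Host-resolved IP " << ia->in_addrs[0]);
        clientConn->local = ia->in_addrs[0];
        clientConn->local.port(port);       // keep the original destination port
        http->request->flags.hostVerified = true;
        http->doCallouts();
        return;
    }

The patch itself: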
--- src/client_side_request.cc	2014-03-09 06:40:56.000000000 -0300
+++ src/client_side_request.cc	2014-04-21 02:53:11.277155130 -0300
@@ -545,6 +545,16 @@
             }
             debugs(85, 3, HERE << "validate IP " << clientConn->local << " non-match from Host: IP " << ia->in_addrs[i]);
         }
+
+        if (true) {
+            unsigned short port = clientConn->local.port();
+            debugs(85, 3, HERE << "[anti-forgery] Host-non-matched remote IP (" << clientConn->local << ") was replaced with the first Host resolved IP (" << ia->in_addrs[0] << ":" << clientConn->local.port() << ")");
+            clientConn->local = ia->in_addrs[0];
+            clientConn->local.port(port);
+            http->request->flags.hostVerified = true;
+            http->doCallouts();
+            return;
+        }
     }
     debugs(85, 3, HERE << "FAIL: validate IP " << clientConn->local << " possible from Host:");
     hostHeaderVerifyFailed("local IP", "any domain IP");

(I attach the patch to this email to avoid mailing-list formatting changes. To apply it, place it in the squid-3.4.4 source directory, remove the .txt extension and run: patch -p0 --verbose < squid_forgery.patch)

After applying this patch the hit rate increased significantly for all types of objects, not only for those that match the refresh_pattern options. No more random misses, then hits, then misses again.

Still, the adobe .exe file was not caching, so I decided to continue the investigation and finally found what the problem was. With adequate debug_options enabled, squid was saying that the object size was too big (I've added the CL (Content-Length), SMOS (store_maxobjsize) and EO (endOffset) variables to the log line):

2014/04/21 00:35:35.429| store.cc(1020) checkCachable: StoreEntry::checkCachable: NO: too big (CL = 33560984; SMOS = 4194304; EO = 268)

Clearly, something was wrong with maxobjsize: it was set in the config to 1 GB, yet the log reported it as 4 MB (which I later discovered to be the default value).

After some additional research, I found that in the src/cf_parser.cci file (generated by make) there are 2 calls to the configuration initialization functions for almost all of the configuration options: the first one for the predefined (default) values and the second one for the config file values. There is a function parse_cachedir (defined in src/cache_cf.cc) that initializes the store data structure with the store-related options (like maxobjsize). It is called when the config parser finds a cache_dir option in the config, and it is not called again when the parser finds the other cache-related options. So, if you put something like this in your config (as I had):

cache_dir aufs /var/cache 140000 16 256
maximum_object_size 1 GB

then the maximum_object_size option is processed and you see it on the cachemgr config page, but it has no effect: the store's maxobjsize parameter was already initialized (with the default value) by parse_cachedir before the "maximum_object_size 1 GB" line was parsed, so the effective maximum_object_size is the 4 MB default. With a config like

maximum_object_size 1 GB
cache_dir aufs /var/cache 140000 16 256

the effective maximum_object_size for the store is 1 GB, as expected.

There are warnings in the documentation that the order of config options is important, but they are only explained in the context of ACLs and other unrelated settings. In my opinion, this is a huge problem, as it is not at all obvious what should precede what.
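(By the way, for anyone who wants to see this on their own setup: the checkCachable line above comes from the store code, which logs in debug section 20, so something along these lines in squid.conf should be enough to get it; if the line doesn't show up, raise the level:

debug_options ALL,1 20,3

That is how I was comparing the value reported on the cachemgr config page with the value the store actually enforces.)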
There should be at least a note in the documentation for each option affected by the order of config processing, there should be a final "all effective values" output at squid initialization (maybe with -d 2 and higher), and of course the cachemgr config page should show the correct (effective) values. Right now it is:

maximum_object_size @ cachemgr config page: 2147483648 bytes
Effective maximum_object_size: 4194304 bytes

A better solution still would be to call parse_cachedir (and similar functions) at the end of the config file processing (an extremely simple fix in the src/cf_parser.cci generation).

Now, with the patch and the "correct" order of maximum_object_size and cache_dir (put cache_dir after all the cache-related options, including the memory cache ones), all the "problematic" objects are cached as expected and there is a huge increase in the hit rate (around 10-fold on average and more than 100-fold for WU and similar). Rock-solid caching!

Regards,
Anatoli

-----Original Message-----
From: Anatoli [mailto:me@xxxxxxxxxx]
Sent: Sunday, April 20, 2014 14:22
To: squid-users@xxxxxxxxxxxxxxx
Subject: RE: Strange misses of cacheable objects

Amos, Antony,

Thanks for your replies.

Actually, the refresh_pattern rules that apply to all these cases say to override the Expires header field and force the objects into the cache (they all have the override-expire ignore-private ignore-no-cache reload-into-ims options), so Date == Expires should not be a problem, or the refresh_pattern options are implemented incorrectly.

With respect to the Vary and ETag fields, there are lots of other objects with similar values and they cache perfectly. Also, please note that the requests are made for the same object from the same browser on the same machine with seconds to minutes between them, so the headers are absolutely identical in each request and the objects, if still fresh, should be cached. Some hours later I started to get hits for some of these objects, and some hours later misses again (and I mean hits and misses on several requests in a row: say, 5 requests for the same object at 10-second intervals all return hits, and some hours later the same requests all return misses, so it's not that the objects expired). Try requesting these objects multiple times at different moments of the day and you will probably see the same behavior.

It looks like there is a problem with the current implementation; I don't see any solid reason for these objects not to cache.

Regards,
Anatoli

P.S. My email was blocked (blacklisted) by my hosting provider for 4 working days after I sent a single mail to this list. They said it was marked as spam more than once. Is this something to expect? Is it possible to post to this list without sending an email? Any web interface?

-----Original Message-----
From: Amos Jeffries [mailto:squid3@xxxxxxxxxxxxx]
Sent: Monday, April 14, 2014 12:19
To: squid-users@xxxxxxxxxxxxxxx
Subject: Re: Strange misses of cacheable objects

On 15/04/2014 1:07 a.m., Anatoli wrote:
> Dear all,
>
> I have some strange misses of cacheable objects. Here is an example (there
> are a lot of these objects):
>
> http://www.googletagmanager.com/gtm.js?id=GTM-J5TM

As Michael said, this one is expired immediately on arrival.

Note: in HTTP/1.1 expired responses ARE cacheable without any special actions. It is also private data for the individual client. Squid is now enforcing revalidation of private responses which make their way into the cache. That ensures the new requestor is a valid recipient of any cached response.
> Some other links with very similar request/response headers and squid log
> entries are
> http://www.googleadservices.com/pagead/conversion_async.js

Same for this one in regards to immediate expiry. However, the ETag header contains invalid values, so HTTP/1.1 revalidation does not work.

> http://cdn.clicktale.net/www07/ptc/3a4e2875-764c-4dd7-8d30-db51f1744e3e.js
> http://cdn.clicktale.net/www/tc/WRe10.js

These URLs are emitting a mix of different Vary: header contents, which makes caching highly volatile and likely to produce incorrect responses to clients. The ETag header used for unique object identification on the second one also contains invalid values (this time duplicates) in some responses. Both of these server errors will cause erratic behaviour in the caching of the objects, as they undermine the reliability of the response.

>
> http://armdl.adobe.com/pub/adobe/reader/win/9.x/9.5.0/en_US/AdbeRdr950_en_US
> .exe has the same behavior and was already reported by Marko Cupać @ May 15,
> 2013; 7:17am in "adobe updates constantly re-downloading" mail. Nobody
> offered any solution.

Since Marko's original post I have seen similar behaviour on a different client's network, where the user account logged into the machine did not have permission to install software yet the Adobe updater was running anyway and failing at the beginning of the install stage.

Amos