Re: HTTPS cache for Java application - only getting TCP_MISS

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On June 14, 2018 1:25 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:

> On 14/06/18 07:28, baretomas wrote:
> > Hello,
> >
> > I'm setting up a Squid proxy as a cache for a number (as many as possible)
> > of identical Java applications to run their web calls through. The calls
> > are of course identical, and the responses they get can safely be cached
> > for 5-10 seconds.
> >
> > I do this because most of the calls are directed at a single server on the
> > internet that I don't want to hammer, since I would of course be locked out
> > of it then.
> >
> > Currently I'm simply testing this on a single computer: the application and
> > squid.
> >
> > The calls from the application are made using SSL / HTTPS by telling Java
> > to use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set
> > up squid and Java with self-signed certificates, and the application sends
> > its calls through squid and gets the response. No problem there (wasn't
> > easy either, I must say :P ).
> 
> I was going to ask what was so hard about it. Then I looked at your
> config and saw that you are in fact using NAT interception instead of
> the easy way.
>
> So what exactly do those -D options cause the Java applications to do
> with the proxy?
>
> I have some suspicions, but I am not familiar enough with the Java API, and
> the specific details are critical to what you need the proxy to be doing.
> 
> > The problem is that none of the calls get cached: all rows in the access.log
> > have a TCP_MISS/200 tag in them.
> >
> > I've searched all through the web for a solution to this, and have tried
> > everything people have suggested. So I was hoping someone could help me?
> > Anyone have any tips on what to try?
> 
> There are three ways to do this:
>
> 1. If you own the domain the apps are connecting to: set up the proxy as
>    a normal TLS / HTTPS reverse-proxy (a small sketch of this follows the
>    list).
>
> 2. If you have enough control of the apps to get them connecting with
>    TLS to the proxy and sending their requests there: do that.
>
> 3. The (relatively) complicated SSL-Bump way you found. The proxy is
>    fully at the mercy of the messages sent by apps and servers. Caching
>    is a luxury here, easily broken / prevented.
>
> Well, there is a fourth way with intercept. But that is a VERY last
> resort, and you already have (3) going, which is already better than
> intercept. Getting to (1) or (2) would be simplest if you meet the "if
> ..." requirements for those.
> 
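> For completeness, a minimal sketch of option (1), using a hypothetical
> origin name api.example.com and placeholder certificate paths, in
> Squid-3.5-era syntax (newer releases spell the cache_peer option "tls"):
>
>   # listen as a reverse-proxy for the origin site
>   https_port 443 accel defaultsite=api.example.com cert=/cygdrive/c/squid/etc/squid/site-cert.pem key=/cygdrive/c/squid/etc/squid/site-key.pem
>   # forward misses to the real origin server over TLS
>   cache_peer api.example.com parent 443 0 no-query originserver ssl name=origin
>   acl origin_site dstdomain api.example.com
>   cache_peer_access origin allow origin_site
>
> The apps would then talk TLS to the proxy's address as if it were the
> origin server, and no SSL-Bump is involved.
>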
> > My config (note I've set the refresh_pattern like that just to see if I
> > could catch anything. The plan is to modify it so it actually does refresh
> > the responses from the web calls at 5-10 second intervals. There are
> > commented-out parts I've tried with no luck there too):
> 
> ...
> 
> Ah. The way you write that implies a misunderstanding about refresh_pattern.
>
> HTTP has some fixed algorithms written into the protocol that caches are
> required to perform to determine whether any stored object can be used or
> requires replacement.
>
> The parameters used by these algorithms come in the form of headers in
> the originally stored reply message and in the current client's request.
> Sometimes they require revalidation, which is a quick check with the
> server for updated instructions and/or content.
>
> What refresh_pattern actually does is provide default values for those
> algorithm parameters IF any one (or more) of them is missing from those
> HTTP messages.
>
> The proper way to make caching happen with your desired behaviour is for
> the server to present an HTTP Cache-Control header saying the object is
> cacheable (ie does not forbid caching), but not for more than 10 seconds:
>
>   Cache-Control: max-age=10
>
> OR for it to say that objects need revalidation, but answer revalidation
> checks with a 304 status (ie Cache-Control: no-cache). (Yeah, that's right,
> "no-cache" means do cache.)
> 
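> For illustration (not taken from your traffic), a reply along these lines
> would be stored and reused for up to 10 seconds, after which Squid would
> revalidate or refetch it:
>
>   HTTP/1.1 200 OK
>   Content-Type: application/json
>   Cache-Control: max-age=10
>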
> That said, I doubt you really want to force that, and you would probably be
> happy if the server instructed the proxy that it is safe to cache an object
> for several minutes, or any value larger than 10 seconds.
>
> So what we circle back to is that you are probably trying to force
> things to cache and be used long past their actual safe-to-use lifetimes
> as specified by the devs most authoritative on that subject (under
> 10 sec?). As you should be aware, this is a highly unsafe thing to be doing
> unless you are one of those devs - be very careful what you choose to do.
> 
> > # Squid normally listens to port 3128
> > #http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem key=/cygdrive/c/squid/etc/squid/ssl/myca.key
> >
> > http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
> >
> > #https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
> 
> Hmm. This is a Windows machine running Cygwin?
>
> FYI: performance is going to be terrible. It may not be super relevant
> yet. Just be aware that Windows imposes limits on usable sockets per
> application - far fewer than a typical proxy requires. The Cygwin people
> do a lot, but they cannot solve some OS limitation problems.
>
> To meet your very first sentence's "as many as possible" requirement you
> will need a non-Windows machine to run the proxy on. That simple change
> will get you something around 3 orders of magnitude higher peak client
> capacity on the proxy.
> 
> > # Uncomment the line below to enable disk caching - path format is
> > # /cygdrive/<full path to cache folder>, i.e.
> > #cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256
> >
> > # certificate generation program
> > sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
> >
> > # Leave coredumps in the first cache dir
> > coredump_dir /var/cache/squid
> >
> > # Add any of your own refresh_pattern entries above these.
> > #refresh_pattern ^ftp: 1440 20% 10080
> > #refresh_pattern ^gopher: 1440 0% 1440
> > #refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> > #refresh_pattern -i (/cgi-bin/|\?) 1440 100% 4320 ignore-no-store override-lastmod override-expire ignore-must-revalidate ignore-reload ignore-private ignore-auth
> > refresh_pattern . 1440 100% 4320 ignore-no-store override-lastmod override-expire ignore-must-revalidate ignore-reload ignore-private ignore-auth override-lastmod
> 
> - ignore-must-revalidate actively reduces caching, because it disables
>   several of the widely used HTTP mechanisms that rely on revalidation to
>   allow things to be stored in a cache.
>   It is only beneficial if the server is broken: requiring revalidation
>   while not supporting revalidation.
>
> - ignore-auth has the same un-intuitive effect as ignoring revalidation,
>   again reducing caching ability.
>   This is only useful if you want to prevent caching of content which
>   requires any form of login to view. High-security networks dealing with
>   classified or confidential materials find this useful - regular Internet
>   admins not so much.
>
> - ignore-no-store is highly dangerous and rarely necessary: the "nuclear
>   option" for caching. It has the potential to eradicate user privacy and
>   scramble any server-personalized content (not in a good way).
>   This is a last resort intended only to cope with severely braindead
>   applications. YMMV whether you have to deal with any of those - just
>   treat it as an absolute last resort rather than something to play with.
>
> Overall - in order to use these refresh_pattern controls you need to
> know what the HTTP(S) messages going through your proxy contain in terms
> of caching headers AND what those messages are doing semantically /
> content-wise for the client application. Using any of them as a generic
> "makes caching better" thing only leads to problems in today's HTTP protocol.
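>
> For reference, the stock refresh_pattern defaults that ship with Squid use
> none of those flags; something close to them is the safer baseline while
> you investigate the headers:
>
>   refresh_pattern ^ftp:             1440  20%  10080
>   refresh_pattern ^gopher:          1440   0%   1440
>   refresh_pattern -i (/cgi-bin/|\?)    0   0%      0
>   refresh_pattern .                    0  20%   4320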
> 
> > # Bumped requests have relative URLs so Squid has to use reverse proxy
> > # or accelerator code. By default, that code denies direct forwarding.
> > # The need for this option may disappear in the future.
> > #always_direct allow all
> >
> > dns_nameservers 8.8.8.8 208.67.222.222
> 
> Use of 8.8.8.8 is known to be explicitly detrimental to caching
> intercepted traffic.
>
> Those servers present different result sets based on the timing and IP
> sending the query. The #1 requirement of caching intercepted (or
> SSL-Bump'ed) content is that the client and proxy have the exact same
> view of DNS system contents. Having the DNS reply contents change
> between two consecutive and identical queries breaks that requirement.
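>
> For example (the address is only illustrative), point both the apps'
> operating system and Squid at one local caching resolver on your network:
>
>   dns_nameservers 192.168.1.10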
> 
> > max_filedescriptors 3200
> >
> > # Max Object Size Cache
> > maximum_object_size 10240 KB
> >
> > acl step1 at_step SslBump1
> > ssl_bump peek step1
> > ssl_bump bump all
> 
> This causes the proxy to attempt decryption of the traffic using crypto
> algorithms based solely on the ClientHello details and its own
> capabilities. No server crypto capabilities are known for the proxy to
> use to ensure traffic can actually make it to the server.
>
> You are rather lucky that it actually worked at all. Almost any
> deviation (ie emergency security updates in future) at either the client,
> server or proxy endpoint risks breaking the communication through this
> proxy.
>
> Ideally there would be a stare action for step2 and then bump only at
> step 3.
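>
> Roughly along these lines (a sketch; adjust to your needs):
>
>   acl step1 at_step SslBump1
>   acl step2 at_step SslBump2
>   acl step3 at_step SslBump3
>
>   ssl_bump peek step1
>   ssl_bump stare step2
>   ssl_bump bump step3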
> 
> So in summary, the things to try to get better caching:
>
> - ditch 8.8.8.8. Use a local DNS resolver within your own network,
>   shared by clients and proxy. That resolver can use 8.8.8.8 itself; the
>   important part is that it should be responsible for caching DNS results
>   and ensuring the app clients and Squid see as much the same records as
>   possible.
>
> - try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for
>   messages being decrypted in the proxy (a full example line follows this
>   list). Look at those headers to see why they are not caching normally.
>   Use that info to inform your next actions. It cannot tell you how the
>   message is used by the application; hopefully you can figure that out
>   somehow before forcing anything unnatural.
>
> - if you can, try pasting some of the transaction URLs into the tool at
>   redbot.org to see if there are any HTTP-level mistakes in the apps that
>   could be fixed for better cacheability.
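>
> For instance, keeping the default log level for everything else (the ALL,1
> part is my addition; remove the line again once you have the traces):
>
>   debug_options ALL,1 11,2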
>
> Amos


Many thanks for this very informative reply to my question! I will spend some time understanding it, and try out the things you suggest!
Thanks again!
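
To answer the question about the -D options: each application is launched
roughly like this (host, port and jar name are only illustrative); with these
system properties the JVM's default HTTP client tunnels its https requests to
Squid via CONNECT:

  java -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -jar myapp.jar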

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users



