On 14/06/18 07:28, baretomas wrote:
> Hello,
>
> I'm setting up a Squid proxy as a cache for a number (as many as possible)
> of identical Java applications to run their web calls through. The calls
> are ofc identical, and the response they get can safely be cached for 5-10
> seconds.
> I do this because most of the calls are directed at a single server on the
> internet that I don't want to hammer, since I will ofc be locked out of it
> then.
>
> Currently I'm simply testing this on a single computer: the application
> and squid.
>
> The calls from the application are done using ssl / https by telling Java
> to use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set
> up squid and Java with self-signed certificates, and the application sends
> its calls through squid and gets the response. No problem there (wasn't
> easy either, I must say :P ).

I was going to ask what was so hard about it. Then I looked at your config
and saw that you are in fact using NAT interception instead of the easy way.

So what _exactly_ do those -D options cause the Java applications to do with
the proxy? I have some suspicions, but I am not familiar enough with the
Java API, and the specific details are critical to what you need the proxy
to be doing.

>
> The problem is that none of the calls get cached: all rows in the
> access.log have a TCP_MISS/200 tag in them.
>
> I've searched all through the web for a solution to this, and have tried
> everything people have suggested. So I was hoping someone could help me?
>
> Anyone have any tips on what to try?
>

There are three ways to do this:

1) If you own the domain the apps are connecting to: set the proxy up as a
normal TLS / HTTPS reverse-proxy.

2) If you have enough control of the apps to get them connecting with TLS
*to the proxy* and sending their requests there: do that.

3) The (relatively) complicated SSL-Bump way you found. Here the proxy is
fully at the mercy of the messages sent by apps and servers. Caching is a
luxury, easily broken / prevented.

Well, there is a fourth way with intercept. But that is a VERY last resort,
and you already have (3) going, which is already better than intercept.

Getting to (1) or (2) would be simplest if you meet the "if ..."
requirements for those.

> MY config (note I've set the refresh_pattern like that just to see if I
> could catch anything. The plan is to modify it so it actually does refresh
> the responses from the web calls in 5-10 second intervals. There are
> commented out parts I've tried with no luck there too):
> ...

Ah. The way you write that implies a misunderstanding about refresh_pattern.

HTTP has some fixed algorithms written into the protocol that caches are
required to perform to determine whether any stored object can be used or
requires replacement. The parameters used by these algorithms come in the
form of headers in the originally stored reply message and the current
client's request. Sometimes they require revalidation, which is a quick
check with the server for updated instructions and/or content.

What refresh_pattern actually does is provide default values for those
algorithm parameters IF any one (or more) of them are missing from the HTTP
messages.

The proper way to get caching with your desired behaviour is for the server
to present an HTTP Cache-Control header saying the object is cacheable (ie
does not forbid caching), but not for more than 10 seconds:

  Cache-Control: max-age=10

OR to say that objects need revalidation (ie Cache-Control: no-cache) while
presenting a 304 status for the revalidation checks. (Yeah, that's right,
"no-cache" means *do* cache.)
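To make that concrete, here is a minimal sketch of the two kinds of server
reply described above (the ETag value and Content-Type are invented for
illustration). The first lets any cache reuse the reply for up to 10 seconds
without asking again; the second lets the reply be stored but forces a
revalidation check, which the server can answer cheaply:

  HTTP/1.1 200 OK
  Cache-Control: max-age=10
  Content-Type: application/json

  HTTP/1.1 200 OK
  Cache-Control: no-cache
  ETag: "v1"
  Content-Type: application/json

  (a later conditional request carrying If-None-Match: "v1" then gets)
  HTTP/1.1 304 Not Modified
  ETag: "v1"

Note that both of those are headers sent by the origin server; if they are
present, no refresh_pattern overrides are needed on the Squid side.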
That said, I doubt you really want to force that, and you would probably be
happy if the server were instructing the proxy that an object is safe to
cache for several minutes or any value larger than 10 seconds.

So what we circle back to is that you are probably trying to force things to
be cached and used long past their actual safe-to-use lifetimes as specified
by the devs most authoritative on that subject (under 10 seconds?). As you
should be aware, this is a highly unsafe thing to be doing unless you are
one of those devs - be very careful what you choose to do.

>
>
> # Squid normally listens to port 3128
> #http_port 3128 ssl-bump generate-host-certificates=on
> dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem
> key=/cygdrive/c/squid/etc/squid/ssl/myca.key
>
> http_port 3128 ssl-bump generate-host-certificates=on
> dynamic_cert_mem_cache_size=4MB
> cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
> key=/cygdrive/c/squid/etc/squid/proxyCA.pem
>
> #https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
> key=/cygdrive/c/squid/etc/squid/proxyCA.pem
>

Hmm. This is a Windows machine running Cygwin? FYI: performance is going to
be terrible. It may not be super relevant yet, but be aware that Windows
imposes limits on usable sockets per application which are much smaller than
a typical proxy requires. The Cygwin people do a lot, but they cannot solve
some OS limitation problems.

To meet the "as many as possible" requirement in your very first sentence
you will need a non-Windows machine to run the proxy on. That simple change
will get you something around 3 orders of magnitude higher peak client
capacity on the proxy.

>
> # Uncomment the line below to enable disk caching - path format is
> /cygdrive/<full path to cache folder>, i.e.
> #cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256
>
> # certificate generation program
> sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s
> /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
>
> # Leave coredumps in the first cache dir
> coredump_dir /var/cache/squid
>
> # Add any of your own refresh_pattern entries above these.
> #refresh_pattern ^ftp: 1440 20% 10080
> #refresh_pattern ^gopher: 1440 0% 1440
> #refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> #refresh_pattern -i (/cgi-bin/|\?) 1440 100% 4320 ignore-no-store
> override-lastmod override-expire ignore-must-revalidate ignore-reload
> ignore-private ignore-auth
> refresh_pattern . 1440 100% 4320 ignore-no-store override-lastmod
> override-expire ignore-must-revalidate ignore-reload ignore-private
> ignore-auth override-lastmod
>

* ignore-must-revalidate actively *reduces* caching, because it disables
several of the widely used HTTP mechanisms that rely on revalidation to
allow things to be stored in a cache. It is *only* beneficial if the server
is broken: requiring revalidation while not supporting revalidation.

* ignore-auth has the same un-intuitive effect as ignoring revalidation,
again reducing caching ability. It is only useful if you want to prevent
caching of content which requires any form of login to view. High-security
networks dealing with classified or confidential materials find this useful
- regular Internet admins, not so much.

* ignore-no-store is highly dangerous and rarely necessary. The "nuclear
option" for caching. It has the potential to eradicate user privacy and
scramble up any server-personalized content (not in a good way). It is a
last resort intended only to cope with severely braindead applications.

YMMV whether you have to deal with any of those - just treat them as an
absolute last resort rather than something to play with.

Overall - in order to use these refresh_pattern controls you *need* to know
what the HTTP(S) messages going through your proxy contain in terms of
caching headers AND what those messages are doing semantically /
content-wise for the client application. Using any of them as a generic
"makes caching better" knob only leads to problems in today's HTTP protocol.

> # Bumped requests have relative URLs so Squid has to use reverse proxy
> # or accelerator code. By default, that code denies direct forwarding.
> # The need for this option may disappear in the future.
> #always_direct allow all
>
> dns_nameservers 8.8.8.8 208.67.222.222

Use of 8.8.8.8 is known to be explicitly detrimental to caching intercepted
traffic. Those servers present different result sets based on the timing and
the IP sending the query. The #1 requirement for caching intercepted (or
SSL-Bump'ed) content is that the client and proxy have the exact same view
of the DNS system contents. Having the DNS reply contents change between two
consecutive and identical queries breaks that requirement.

>
> max_filedescriptors 3200
>
> # Max Object Size Cache
> maximum_object_size 10240 KB
>
>
> acl step1 at_step SslBump1
>
> ssl_bump peek step1
> ssl_bump bump all

This causes the proxy to attempt decryption of the traffic using crypto
algorithms based solely on the ClientHello details and its own capabilities.
There are zero server crypto capabilities known for the proxy to use to
ensure traffic can actually make it to the server. You are rather lucky that
it worked at all. Almost any deviation (ie emergency security updates in
future) at the client, server or proxy endpoint risks breaking communication
through this proxy.

Ideally there would be a stare action for step2 and then a bump only at
step3, as in the sketch below.
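A minimal sketch of that layout in squid.conf (the step2/step3 acl names are
just labels chosen here, untested against your setup):

  # identify which SSL-Bump processing step a connection is at
  acl step1 at_step SslBump1
  acl step2 at_step SslBump2
  acl step3 at_step SslBump3

  # peek at the TLS ClientHello first, then stare at the server certificate,
  # and only commit to decryption at the final step
  ssl_bump peek step1
  ssl_bump stare step2
  ssl_bump bump step3

The stare at step2 is what gives the proxy a view of the server certificate
and TLS capabilities before the bump decision is made, instead of guessing
from the ClientHello alone.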
So in summary, the things to try to get better caching:

* Ditch 8.8.8.8. Use a local DNS resolver within your own network, shared by
the clients and the proxy. That resolver can use 8.8.8.8 itself; the
important part is that it is responsible for caching DNS results and
ensuring the app clients and Squid see as much the same records as possible.

* Try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for the
messages being decrypted in the proxy. Look at those headers to see why they
are not caching normally, and use that info to inform your next actions. It
cannot tell you how a message is used by the application; hopefully you can
figure that out somehow before forcing anything unnatural.

* If you can, try pasting some of the transaction URLs into the tool at
redbot.org to see if there are any HTTP-level mistakes in the apps that
could be fixed for better cacheability.

Amos

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users