On 28/10/2015 2:05 p.m., Jester Purtteman wrote:
> So, here is the problem: I want to cache the images on craigslist. The
> headers all look thoroughly cacheable; some browsers (I'm glaring at
> you, Chrome) send this thing that requests that they not be cacheable,

"this thing" being what exactly?

I am aware of several nasty things Chrome sends that interfere with
optimal HTTP use, but nothing that directly prohibits caching like you
describe.

> but craigslist replies anyway and says "sure thing! Cache that
> sucker!", and Firefox doesn't even do that. An example URL:
> http://images.craigslist.org/00o0o_3fcu92TR5jB_600x450.jpg
>
> The request headers look like:
>
> Host: images.craigslist.org
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0)
>   Gecko/20100101 Firefox/41.0
> Accept: image/png,image/*;q=0.8,*/*;q=0.5
> Accept-Language: en-US,en;q=0.5
> Accept-Encoding: gzip, deflate
> Referer: http://seattle.craigslist.org/oly/hvo/5288435732.html
> Cookie: cl_tocmode=sss%3Agrid; cl_b=hlJExhZ55RGzNupTXAYJOAIcZ80;
>   cl_def_lang=en; cl_def_hp=seattle
> Connection: keep-alive
>
> The response headers are:
>
> Cache-Control: public, max-age=2592000   <-- doesn't that say "keep
>   that a very long time"?

Not exactly. It says only that you are *allowed* to store it for 30
days. It does not say you have to.

Your refresh_pattern rules will use that as the 'max' limit, along with
the Date and Last-Modified header values below, when determining
whether the response can be cached, and for how long.

> Content-Length: 49811
> Content-Type: image/jpeg
> Date: Tue, 27 Oct 2015 23:04:14 GMT
> Last-Modified: Tue, 27 Oct 2015 23:04:14 GMT
> Server: craigslist/0
>
> Access log says:
> 1445989120.714    265 192.168.2.56 TCP_MISS/200 50162 GET
>   http://images.craigslist.org/00Y0Y_kMkjOhL1Lim_600x450.jpg -
>   ORIGINAL_DST/208.82.236.227 image/jpeg

This is intercepted traffic.

I've run some tests on that domain, and it is another one presenting
only a single IP address in DNS results but rotating through a whole
set in the background depending on where it gets queried from. As a
result, different machines get different results.

What we found just the other day is that domains doing this have big
problems when queried through Google DNS servers. Due to the way the
Google DNS servers are spread around the world and load-balance their
traffic, these sites can return different IPs on each and every lookup.

The final outcome of all that is that when Squid tries to verify the
intercepted traffic was actually going where the client intended, it
cannot confirm the ORIGINAL_DST server IP is one belonging to the Host
header domain.

The solution is to set up a DNS resolver in your network and use that
instead of the Google DNS. You may have to divert clients' DNS queries
to it if they still try to go to Google DNS. The result will be much
more cacheable traffic, and probably faster DNS as well.
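For example, a rough sketch of that setup, assuming the resolver runs
on the Squid box itself and that box already routes (or bridges) the
client traffic; the 192.168.2.1 resolver address and the package
command are illustrative placeholders, not taken from your config:

  # a local caching resolver, e.g. unbound (Debian/Ubuntu)
  apt-get install unbound

  # squid.conf: replace "dns_nameservers 8.8.8.8 8.8.4.4" so that
  # Squid uses the local resolver
  dns_nameservers 127.0.0.1

  # optionally divert LAN clients that are hard-coded to Google DNS
  # (remember to allow your LAN ranges in the resolver's own access
  # controls as well)
  iptables -t nat -A PREROUTING -s 192.168.0.0/16 -p udp --dport 53 \
    -j DNAT --to-destination 192.168.2.1:53
  iptables -t nat -A PREROUTING -s 192.168.0.0/16 -p tcp --dport 53 \
    -j DNAT --to-destination 192.168.2.1:53

The point is simply that the clients, Squid, and the resolver all see
the same answer for images.craigslist.org, so the ORIGINAL_DST IP can
be matched against the Host header again.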
> And Store Log says:
> 1445989120.714 RELEASE -1 FFFFFFFF 27C2B2CEC9ACCA05A31E80479E5F0E9C
>   ?  ?  ?  ?  ?/?  ?/?  ?  ?
>
> I started out with a configuration from here:
> http://wiki.sebeka.k12.mn.us/web_services:squid_update_cache but have
> made a lot of tweaks to it. In fact, I've dropped all the updates, all
> the rewrite and store-id stuff, and a lot of other things. I've set
> "cache allow all" (which, I suspect, I can simply leave blank, but I
> don't know). I've cut it down quite a bit; the one I am testing right
> now, for example, looks like this.
>
> My squid.conf (which has been hacked mercilessly trying stuff,
> admittedly) looks like this:
>
> <BEGIN SQUID.CONF >
>
> acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> acl localnet src fc00::/7       # RFC 4193 local private network range
> acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
>
> acl SSL_ports port 443
> acl Safe_ports port 80          # http
> acl Safe_ports port 21          # ftp
> acl Safe_ports port 443         # https
> acl Safe_ports port 70          # gopher
> acl Safe_ports port 210         # wais
> acl Safe_ports port 1025-65535  # unregistered ports
> acl Safe_ports port 280         # http-mgmt
> acl Safe_ports port 488         # gss-http
> acl Safe_ports port 591         # filemaker
> acl Safe_ports port 777         # multiling http
> acl CONNECT method CONNECT

You are missing the default security http_access lines. They should be
reinstated even on intercepted traffic:

  acl SSL_ports port 443
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports

> http_access allow localnet
> http_access allow localhost
>
> # And finally deny all other access to this proxy
> http_access deny all
>
> http_port 3128
> http_port 3129 tproxy

Okay, assuming you have the proper iptables/ip6tables TPROXY rules set
up to accompany it.
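For reference, the needed rules are roughly the ones on the Squid
wiki's TPROXY feature page; a sketch, assuming port 80 is being
intercepted into the 3129 tproxy port above (repeat with ip6tables and
"ip -6" if you carry IPv6):

  # policy routing so marked packets are delivered to the local Squid
  ip rule add fwmark 1 lookup 100
  ip route add local 0.0.0.0/0 dev lo table 100

  # pass packets belonging to existing sockets straight in,
  # then TPROXY the rest of the port 80 traffic
  iptables -t mangle -N DIVERT
  iptables -t mangle -A DIVERT -j MARK --set-mark 1
  iptables -t mangle -A DIVERT -j ACCEPT
  iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
  iptables -t mangle -A PREROUTING -p tcp --dport 80 \
    -j TPROXY --tproxy-mark 0x1/0x1 --on-port 3129

The mark value and routing table number just have to be consistent
within the rules, and --on-port has to match the "http_port 3129
tproxy" line.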
> cache_dir aufs /var/spool/squid/ 40000 32 256
>
> cache_swap_low 90
> cache_swap_high 95
>
> dns_nameservers 8.8.8.8 8.8.4.4

See above.

> cache allow all

Not useful. That is the default action when the "cache" directive is
omitted entirely.

> maximum_object_size 8000 MB
> range_offset_limit 8000 MB
> quick_abort_min 512 KB
>
> cache_store_log /var/log/squid/store.log
> access_log daemon:/var/log/squid/access.log squid
> cache_log /var/log/squid/cache.log
> coredump_dir /var/spool/squid
>
> max_open_disk_fds 8000
>
> vary_ignore_expire on

The above should not be doing anything in current Squid versions, which
are HTTP/1.1 compliant. It is just a directive we have forgotten to
remove.

> request_entities on
>
> refresh_pattern -i .*\.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale
> refresh_pattern ^ftp:  1440 20% 10080
> refresh_pattern ^gopher:  1440 0% 1440
> refresh_pattern -i .*\.index.(html|htm)$ 2880 40% 10080
> refresh_pattern -i .*\.(html|htm|css|js)$ 120 40% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern .  0 40% 40320
>
> cache_mgr <my address>
> cache_effective_user proxy
> cache_effective_group proxy
>
> <END SQUID.CONF>
>
> There is a good deal of hacking that has gone into this configuration,
> and I accept that this will eventually be gutted and replaced with
> something less broken.

It is surprisingly good for all that :-)

> Where I am pulling my hair out is trying to figure out why things are
> cached and then not cached. That top refresh line (the one looking for
> jpg, gif, etc.) has taken many forms, and I am getting inconsistent
> results. The above image will cache just fine a couple of times, but
> if I go back, clear the cache on the browser, close out, restart, and
> reload, it releases the link and never again shall it cache. What is
> worse, it appears to be getting worse over time, until it isn't really
> picking up much of anything. What starts out as a few missed entries
> piles up into a huge list of cache misses over time.
>
> Right now, I am running somewhere around a 0.1% hit rate, and I can
> only assume I have buckled something in all the compiles, re-compiles,
> and reconfigurations. What started out as "gee, I wonder if I can
> cache updates" has turned into quite the rabbit hole!

What Squid version is this?

0.1% seems extremely low, even for a proxy having those Google DNS
problems.

> So, big question: what debug level do I use to see this thing making
> decisions on whether to cache? Any tips anyone has about this would be
> appreciated. Thank you!

debug_options 85,3 22,3
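A sketch of how that might look in squid.conf while you are testing
(the ALL,1 in front is just the normal default verbosity for everything
else):

  # temporary extra detail on the sections that log the caching and
  # refresh decisions
  debug_options ALL,1 85,3 22,3

Reload with "squid -k reconfigure", fetch the image a couple of times,
then read /var/log/squid/cache.log for the lines explaining why the
response was or was not stored. Drop it back to the default afterwards;
level 3 gets noisy quickly.

Amos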