So, here is the problem: I want to cache the images on craigslist. The headers all look thoroughly cacheable. Some browsers (I'm glaring at you, Chrome) send the request with a header asking that the response not be cached, but craigslist replies anyway and says "sure thing! Cache that sucker!", and Firefox doesn't even do that.

An example URL: http://images.craigslist.org/00o0o_3fcu92TR5jB_600x450.jpg

The request headers look like:

Host: images.craigslist.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://seattle.craigslist.org/oly/hvo/5288435732.html
Cookie: cl_tocmode=sss%3Agrid; cl_b=hlJExhZ55RGzNupTXAYJOAIcZ80; cl_def_lang=en; cl_def_hp=seattle
Connection: keep-alive

The response headers are:

Cache-Control: public, max-age=2592000   <-- doesn't that say "keep that a very long time"?
Content-Length: 49811
Content-Type: image/jpeg
Date: Tue, 27 Oct 2015 23:04:14 GMT
Last-Modified: Tue, 27 Oct 2015 23:04:14 GMT
Server: craigslist/0

Access log says:

And Store Log says:

I started out with a configuration from here: http://wiki.sebeka.k12.mn.us/web_services:squid_update_cache but have made a lot of tweaks to it. In fact, I've dropped all the update handling, all the rewriting, the store-ID stuff, and a lot more. I've set cache allow all (which I suspect I could simply leave out, but I don't know). I've cut it down quite a bit; the squid.conf I am testing right now (which has been hacked mercilessly trying stuff, admittedly) looks like this:

<BEGIN SQUID.CONF>
acl localnet src 10.0.0.0/8      # RFC1918 possible internal network
acl localnet src 172.16.0.0/12   # RFC1918 possible internal network
acl localnet src 192.168.0.0/16  # RFC1918 possible internal network
acl localnet src fc00::/7        # RFC 4193 local private network range
acl localnet src fe80::/10       # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80           # http
acl Safe_ports port 21           # ftp
acl Safe_ports port 443          # https
acl Safe_ports port 70           # gopher
acl Safe_ports port 210          # wais
acl Safe_ports port 1025-65535   # unregistered ports
acl Safe_ports port 280          # http-mgmt
acl Safe_ports port 488          # gss-http
acl Safe_ports port 591          # filemaker
acl Safe_ports port 777          # multiling http
acl CONNECT method CONNECT

http_access allow localnet
http_access allow localhost
# And finally deny all other access to this proxy
http_access deny all

http_port 3128
http_port 3129 tproxy

cache_dir aufs /var/spool/squid/ 40000 32 256
cache_swap_low 90
cache_swap_high 95
dns_nameservers 8.8.8.8 8.8.4.4
cache allow all
maximum_object_size 8000 MB
range_offset_limit 8000 MB
quick_abort_min 512 KB
cache_store_log /var/log/squid/store.log
access_log daemon:/var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
coredump_dir /var/spool/squid
max_open_disk_fds 8000
vary_ignore_expire on
request_entities on

refresh_pattern -i .*\.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale
refresh_pattern ^ftp:            1440  20% 10080
refresh_pattern ^gopher:         1440   0% 1440
refresh_pattern -i .*\.index.(html|htm)$ 2880 40% 10080
refresh_pattern -i .*\.(html|htm|css|js)$ 120 40% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern .                0 40% 40320

cache_mgr <my address>
cache_effective_user proxy
cache_effective_group proxy
<END SQUID.CONF>
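For what it's worth, here is my current reading of that first refresh_pattern line (the min and max values are in minutes, if I understand the docs correctly, so treat this as an educated guess rather than gospel):

# refresh_pattern <regex> <min> <percent> <max> [options]
# min = 10080 minutes = 7 days; max = 43200 minutes = 30 days.
# The response's Cache-Control: max-age=2592000 seconds is also exactly
# 43200 minutes (30 days), so the server's max-age and my max agree.
refresh_pattern -i .*\.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale

As I understand it, when the response carries an explicit max-age, Squid uses that for the freshness calculation and the min/percent/max here mostly apply to responses without explicit expiry information, so I would not expect this line to be the thing un-caching the object, but I could be wrong.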
There is a good deal of hacking that has gone into this configuration, and I accept that it will eventually be gutted and replaced with something less broken. Where I am pulling my hair out is trying to figure out why things are cached and then not cached.

That top refresh_pattern line (the one matching jpg, gif, etc.) has taken many forms, and I am getting inconsistent results. The image above will cache just fine a couple of times, but if I go back, clear the cache on the browser, close out, restart, and reload, it releases the object and never again shall it cache. What is worse, it appears to be getting worse over time, until it isn't really picking up much of anything. What starts out as a few missed entries piles up into a huge list of cache misses over time. Right now I am running somewhere around a 0.1% hit rate, and I can only assume I have buckled something in all the compiles, re-compiles, and reconfigurations. What started out as "gee, I wonder if I can cache updates" has turned into quite the rabbit hole!

So, big question: what debug level do I use to see this thing making decisions on whether to cache? Any tips anyone has about this would be appreciated. Thank you!
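P.S. My best guess so far for that debug level, from reading the documented debug section list (so, an educated guess; corrections welcome), is something like:

# Keep everything at level 1, then turn up the sections that I believe
# cover the caching decisions:
#   22 = Refresh Calculations, 20 = Storage Manager, 11 = HTTP
debug_options ALL,1 22,5 20,3 11,2

and then watch cache.log while re-fetching the image, but I have no idea whether those are the right sections or levels.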
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users