> -----Original Message-----
> From: squid-users [mailto:squid-users-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Amos Jeffries
> Sent: Tuesday, October 27, 2015 9:07 PM
> To: squid-users@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Inconsistent accessing of the cache, craigslist.org images, wacky stuff.
>
> On 28/10/2015 2:05 p.m., Jester Purtteman wrote:
> > So, here is the problem: I want to cache the images on craigslist.
> > The headers all look thoroughly cacheable, but some browsers (I'm
> > glaring at you, Chrome) send the request with this thing that asks
> > that it not be cacheable,
>
> "this thing" being what exactly?
>
> I am aware of several nasty things Chrome sends that interfere with
> optimal HTTP use. But nothing that directly prohibits caching like you
> describe.
>
> > but craigslist replies anyway and says "sure thing! Cache that
> > sucker!", and Firefox doesn't even do that. An example URL:
> > http://images.craigslist.org/00o0o_3fcu92TR5jB_600x450.jpg
> >
> > The request headers look like:
> >
> > Host: images.craigslist.org
> > User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
> > Accept: image/png,image/*;q=0.8,*/*;q=0.5
> > Accept-Language: en-US,en;q=0.5
> > Accept-Encoding: gzip, deflate
> > Referer: http://seattle.craigslist.org/oly/hvo/5288435732.html
> > Cookie: cl_tocmode=sss%3Agrid; cl_b=hlJExhZ55RGzNupTXAYJOAIcZ80; cl_def_lang=en; cl_def_hp=seattle
> > Connection: keep-alive
> >
> > The response headers are:
> >
> > Cache-Control: public, max-age=2592000   <-- doesn't that say "keep that a very long time"?
>
> Not exactly. It says only that you are *allowed* to store it for 30 days.
> Does not say you have to.
>
> Your refresh_pattern rules will use that as the 'max' limit along with
> the below Date+Last-Modified header values when determining whether the
> response can be cached, and for how long.
>
> > Content-Length: 49811
> > Content-Type: image/jpeg
> > Date: Tue, 27 Oct 2015 23:04:14 GMT
> > Last-Modified: Tue, 27 Oct 2015 23:04:14 GMT
> > Server: craigslist/0
> >
> > Access log says:
> > 1445989120.714 265 192.168.2.56 TCP_MISS/200 50162 GET http://images.craigslist.org/00Y0Y_kMkjOhL1Lim_600x450.jpg - ORIGINAL_DST/208.82.236.227 image/jpeg
>
> This is intercepted traffic.
>
> I've run some tests on that domain and it is another one presenting only
> a single IP address in DNS results, but rotating through a whole set in
> the background depending on where it gets queried from. As a result,
> different machines get different results.
>
> What we found just the other day was that domains doing this have big
> problems when queried through Google DNS servers. Due to the way Google
> DNS servers are spread around the world and load-balance their traffic,
> these sites can return different IPs on each and every lookup.
>
> The final outcome of all that is that when Squid tries to verify the
> intercepted traffic was actually going where the client intended, it
> cannot confirm the ORIGINAL_DST server IP is one belonging to the Host
> header domain.
>
> The solution is to set up a DNS resolver in your network and use that
> instead of the Google DNS. You may have to divert clients' DNS queries
> to it if they try to go to Google DNS still. The result will be much
> more cacheable traffic and probably faster DNS as well.
>
> > And Store Log says:
> > 1445989120.714 RELEASE -1 FFFFFFFF 27C2B2CEC9ACCA05A31E80479E5F0E9C ? ? ? ? ?/? ?/? ? ?
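
Here is the sort of thing I take Amos's resolver suggestion above to mean,
noted in case it helps anyone else reading this thread later. This is an
untested sketch; unbound is used purely as an example resolver, and
192.168.2.1 is just a stand-in for the proxy box's LAN address:

  # /etc/unbound/unbound.conf : minimal recursive resolver for the LAN (illustrative)
  server:
      interface: 127.0.0.1
      interface: 192.168.2.1
      access-control: 127.0.0.0/8 allow
      access-control: 192.168.0.0/16 allow

  # squid.conf : point Squid at the local resolver instead of Google DNS
  dns_nameservers 127.0.0.1

The clients would then be handed 192.168.2.1 as their DNS server (via DHCP,
say), so that they and Squid resolve craigslist through the same path and
get the same answers.
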
> > I started out with a configuration from here:
> > http://wiki.sebeka.k12.mn.us/web_services:squid_update_cache but have
> > made a lot of tweaks to it. In fact, I've dropped all the updates, all
> > the rewrite, store id, and a lot of other stuff. I've set "cache allow
> > all" (which, I suspect, I can simply leave blank, but I don't know).
> > I've cut it down quite a bit; the one I am testing right now, for
> > example, looks like this:
> >
> > My squid.conf (which has been hacked mercilessly trying stuff,
> > admittedly) looks like this:
> >
> > <BEGIN SQUID.CONF>
> >
> > acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
> > acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
> > acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> > acl localnet src fc00::/7       # RFC 4193 local private network range
> > acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
> >
> > acl SSL_ports port 443
> > acl Safe_ports port 80          # http
> > acl Safe_ports port 21          # ftp
> > acl Safe_ports port 443         # https
> > acl Safe_ports port 70          # gopher
> > acl Safe_ports port 210         # wais
> > acl Safe_ports port 1025-65535  # unregistered ports
> > acl Safe_ports port 280         # http-mgmt
> > acl Safe_ports port 488         # gss-http
> > acl Safe_ports port 591         # filemaker
> > acl Safe_ports port 777         # multiling http
> > acl CONNECT method CONNECT
>
> You are missing the default security http_access lines. They should be
> re-instated even on intercepted traffic.
>
>   acl SSL_Ports port 443
>
>   http_access deny !Safe_ports
>   http_access deny CONNECT !SSL_Ports
>
> > http_access allow localnet
> > http_access allow localhost
> >
> > # And finally deny all other access to this proxy
> > http_access deny all
> >
> > http_port 3128
> > http_port 3129 tproxy
>
> Okay, assuming you have the proper iptables/ip6tables TPROXY rules set
> up to accompany it.
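
For my own reference, and so I can check my setup against it: my
understanding of the "proper TPROXY rules" is roughly the standard recipe
from the Squid wiki's TPROXY documentation, something like the lines below.
Untested as written here; port 3129 matches the "http_port 3129 tproxy"
line above, and the ip6tables / "ip -6" equivalents would mirror it:

  # Mark packets that belong to sockets Squid already owns, and accept them.
  iptables -t mangle -N DIVERT
  iptables -t mangle -A DIVERT -j MARK --set-mark 1
  iptables -t mangle -A DIVERT -j ACCEPT
  iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT

  # Steer new port-80 flows into the tproxy listener without rewriting addresses.
  iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 3129

  # Policy routing so the marked packets are delivered locally to Squid.
  ip rule add fwmark 1 lookup 100
  ip route add local 0.0.0.0/0 dev lo table 100

If these (or their equivalents) are not in place, the tproxy listener never
sees the traffic at all, so it seemed worth writing them down next to the
config.
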
> > cache_dir aufs /var/spool/squid/ 40000 32 256
> >
> > cache_swap_low 90
> > cache_swap_high 95
> >
> > dns_nameservers 8.8.8.8 8.8.4.4
>
> See above.
>
> > cache allow all
>
> Not useful. That is the default action when the "cache" directive is
> omitted entirely.
>
> > maximum_object_size 8000 MB
> > range_offset_limit 8000 MB
> > quick_abort_min 512 KB
> >
> > cache_store_log /var/log/squid/store.log
> > access_log daemon:/var/log/squid/access.log squid
> > cache_log /var/log/squid/cache.log
> > coredump_dir /var/spool/squid
> >
> > max_open_disk_fds 8000
> >
> > vary_ignore_expire on
>
> The above should not be doing anything in current Squid versions, which
> are HTTP/1.1 compliant. It is just a directive we have forgotten to
> remove.
>
> > request_entities on
> >
> > refresh_pattern -i .*\.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale
> > refresh_pattern ^ftp:                     1440 20% 10080
> > refresh_pattern ^gopher:                  1440  0% 1440
> > refresh_pattern -i .*\.index.(html|htm)$  2880 40% 10080
> > refresh_pattern -i .*\.(html|htm|css|js)$  120 40% 1440
> > refresh_pattern -i (/cgi-bin/|\?)            0  0% 0
> > refresh_pattern .                            0 40% 40320
> >
> > cache_mgr <my address>
> > cache_effective_user proxy
> > cache_effective_group proxy
> >
> > <END SQUID.CONF>
> >
> > There is a good deal of hacking that has gone into this configuration,
> > and I accept that this will eventually be gutted and replaced with
> > something less broken.
>
> It is surprisingly good for all that :-)
>
> > Where I am pulling my hair out is trying to figure out why things are
> > cached and then not cached. That top refresh line (the one looking for
> > jpg, gif, etc.) has taken many forms, and I am getting inconsistent
> > results. The above image will cache just fine, a couple of times, but
> > if I go back, clear the cache on the browser, close out, restart and
> > reload, it releases the link and never again shall it cache. What is
> > worse, it appears to be getting worse over time, until it isn't really
> > picking up much of anything. What starts out as a few missed entries
> > piles up into a huge list of cache misses over time.
>
> What Squid version is this? 0.1% seems to be extremely low. Even for a
> proxy having those Google DNS problems.
>
> > Right now, I am running somewhere around a 0.1% hit rate, and I can
> > only assume I have buckled something in all the compiles and
> > re-compiles and reconfigurations. What started out as "gee, I wonder
> > if I can cache updates" has turned into quite the rabbit hole!
> >
> > So, big question: what debug level do I use to see this thing making
> > decisions on whether to cache? Any tips anyone has about this would be
> > appreciated. Thank you!
>
> debug_options 85,3 22,3
>
> Amos
>
> _______________________________________________
> squid-users mailing list
> squid-users@xxxxxxxxxxxxxxxxxxxxx
> http://lists.squid-cache.org/listinfo/squid-users

Well, that (debug_options 85,3 22,3) worked like a charm! I had the info I
needed in about two seconds flat. I am getting:

2015/10/28 09:16:54.075| 85,3| client_side_request.cc(532) hostHeaderIpVerify: FAIL: validate IP 208.82.238.226:80 possible from Host:
2015/10/28 09:16:54.075| 85,3| client_side_request.cc(543) hostHeaderVerifyFailed: SECURITY ALERT: Host header forgery detected on local=208.82.238.226:80 remote=192.168.2.56 FD 20 flags=17 (local IP does not match any domain IP) on URL: http://seattle.craigslist.org/favicon.ico

Based on http://wiki.squid-cache.org/KnowledgeBase/HostHeaderForgery I
believe this is saying that the IP the client connected to and the IP Squid
resolved for the Host header are not the same. Bottom line, I think it is
time for me to host a DNS server; that way, at least the IP the clients
resolve and the IP Squid resolves should be more consistent. It looks like
this won't completely fix the issue, since it is an inherent problem with
transparent proxies, so it appears it is also time to read up on proxy
autoconfiguration. Thank you again!
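
P.S. In case it is useful to anyone finding this thread later, here is the
rough, untested sketch I have in mind for forcing clients onto the new
resolver, since Amos noted some of them may keep trying to reach Google DNS
directly. The 192.168.2.0/24 network and the 192.168.2.1 resolver address
are just illustrations of my LAN, not anything Squid-specific:

  # On the gateway: rewrite any client DNS query (UDP or TCP port 53)
  # so it lands on the local resolver instead of an outside server.
  iptables -t nat -A PREROUTING -s 192.168.2.0/24 -p udp --dport 53 -j DNAT --to-destination 192.168.2.1:53
  iptables -t nat -A PREROUTING -s 192.168.2.0/24 -p tcp --dport 53 -j DNAT --to-destination 192.168.2.1:53

With the clients and Squid resolving through the same box, the ORIGINAL_DST
address and the Host header lookup should agree far more often, which is
the whole point of the exercise.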