On Sun, 26 Jul 2009 07:37:19 -0700 (PDT), RynoDJ <rdejager@xxxxxxxxxx> wrote:

> Hi,
>
> As a rule squid works pretty well, except that it seems to 'lose' objects
> from the cache when the machine is restarted and/or after a few hours/days.
> It then re-downloads files that have not changed. All I need is a simple
> setup that will cache as much as possible (use as much of the cache size
> as possible) and only download files when they've changed. I do lots of
> Linux re-installs inside VMs and I'd like to source updates from the cache
> instead of downloading the same RPMs over and over again.

This sounds like normal proxy behaviour. Objects do not have unlimited
lifetimes.

 * Web servers often send information indicating when objects are to be
   replaced; when they don't, Squid makes a guess and checks for new ones.

 * Some servers send back a whole new object, whether they need to or not,
   when Squid merely asks if it has changed.

 * The amount of space you have available is not unlimited. Garbage
   collection normally throws out old objects, which may or may not still
   be usable, when more space is needed.

 * Forcing a fast shutdown/restart and/or a lack of disk space can prevent
   Squid saving in-memory objects to disk. Affected objects are lost until
   re-fetched.

 * Some versions of Squid (older than the mid-2.6 era, and all Squid-3) do
   not handle variants of an object (e.g. compressed vs. uncompressed
   versions) well, and will discard the stored copy if a new variant is
   needed.

 * I've heard a few people mention that RPMs come from changing URLs; this
   causes a re-fetch for each unique URL where it happens. The
   storeurl_rewrite feature from 2.7 is needed to work around that problem
   (there is a sketch of it below).

If you want to know why Squid is not saving a particular URL, visit
www.redbot.org and enter the URL there (it needs to be publicly
reachable). If the report indicates the object should be cacheable when it
is actually being thrown away, look closer at your logs for a reason why
it is being discarded.

>
> Could someone perhaps tell me what I need to change in my conf file?
>
>
> Thanks
>
>
> http_port 3128 transparent

I advise not using port 3128 for interception. The regular proxy traffic
coming in on it will be trying to do NAT lookups and URL changes all the
time. See CVE-2009-0801 for the security issues. (A split-port sketch is
included below.)

> hierarchy_stoplist cgi-bin ?
> acl QUERY urlpath_regex cgi-bin \?
> no_cache deny QUERY

Assuming you have Squid-2.6 or higher: remove the two QUERY lines above.

> cache_replacement_policy heap LFUDA

The above means that objects (stale or not) which have been asked for
least often, with an aging factor so old popularity fades, are removed
first on garbage collection. This may be related to the object loss you
are noticing. It may be worth looking up the meanings of this and the
alternatives (heap GDSF, heap LRU, lru) to see which best matches what
you want.

> cache_dir diskd /var/spool/squid 10240 16 256
> cache_store_log none
> auth_param basic children 5
> auth_param basic realm Squid proxy-caching web server
> auth_param basic credentialsttl 2 hours

Authentication will not work for intercepted requests (ie anything
arriving on a "transparent" marked port). It also appears not to be used
by any of your access controls. You can save yourself some
startup/shutdown delays by removing the auth_param lines above.

> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440

Add a new pattern here to help with dynamic objects:

  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

> refresh_pattern . 0 20% 4320

For more aggressive caching you can also add "reload-into-ims" to the "."
pattern, if your squid supports it.
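Put together, the refresh_pattern section would then look roughly like
this (order matters, since the first matching pattern wins; the
reload-into-ims option is a suggestion, not a requirement):

  refresh_pattern ^ftp:             1440  20%  10080
  refresh_pattern ^gopher:          1440   0%   1440
  refresh_pattern -i (/cgi-bin/|\?)    0   0%      0
  refresh_pattern .                    0  20%   4320 reload-into-ims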
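On the storeurl_rewrite point: if you do move to Squid-2.7, the setup is a
small helper program plus a couple of squid.conf lines. What follows is
only a rough sketch under assumptions of my own (the helper path, the
.rpm pattern, and the canonical "store" hostname are all made up for
illustration); check the 2.7 release notes for the exact helper interface
before relying on this.

  # squid.conf (Squid-2.7 only); path and pattern below are examples
  acl rpm_urls urlpath_regex \.rpm$
  storeurl_access allow rpm_urls
  storeurl_access deny all
  storeurl_rewrite_program /usr/local/bin/storeurl.py
  storeurl_rewrite_children 5

  #!/usr/bin/env python
  # storeurl.py -- illustrative helper sketch: map every mirror's URL for
  # a given RPM file onto one canonical "store" URL so Squid keeps a
  # single cached copy. Only safe if all matching URLs really serve
  # identical content.
  import re
  import sys

  RPM = re.compile(r'/([^/]+\.rpm)$', re.IGNORECASE)

  for line in sys.stdin:
      fields = line.split()
      if not fields:
          continue
      url = fields[0]                  # first field is the request URL
      m = RPM.search(url)
      if m:
          # hypothetical internal key; never actually fetched from
          sys.stdout.write('http://rpms.store.invalid/%s\n' % m.group(1))
      else:
          sys.stdout.write(url + '\n') # unchanged: store under real URL
      sys.stdout.flush()               # helper replies must not be buffered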
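And on the interception port: the usual arrangement is two ports, for
example (3129 is an arbitrary choice):

  http_port 3128              # browsers configured to use the proxy
  http_port 3129 transparent  # NAT-redirected port-80 traffic only

with the firewall REDIRECT rule pointed at 3129, and nothing but the NAT
redirection able to reach that port.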
> half_closed_clients off
> acl all src 0.0.0.0/0.0.0.0
> acl manager proto cache_object
> acl localhost src 127.0.0.1/255.255.255.255
> acl to_localhost dst 127.0.0.0/8
> acl localnet src 10.0.0.0/8        # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12     # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16    # RFC1918 possible internal network
> acl SSL_ports port 443 563
> acl Safe_ports port 80             # http
> acl Safe_ports port 21             # ftp
> acl Safe_ports port 443 563        # https, snews
> acl Safe_ports port 70             # gopher
> acl Safe_ports port 210            # wais
> acl Safe_ports port 1025-65535     # unregistered ports
> acl Safe_ports port 280            # http-mgmt
> acl Safe_ports port 488            # gss-http
> acl Safe_ports port 591            # filemaker
> acl Safe_ports port 777            # multiling http
> acl CONNECT method CONNECT
> http_access allow manager localhost
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access deny to_localhost
> acl mynetwork src 192.168.100.0/255.255.255.0

The idea of adding "localnet" to the config was that you place your own
network ranges under that name. There is no need for both a "localnet"
and a "mynetwork" ACL; you can remove the localnet defaults and keep just
your own range (a trimmed sketch is in the P.S. below).

> http_access allow mynetwork
> http_access allow localnet
> http_access allow localhost
> http_reply_access allow all
> icp_access allow all
> visible_hostname myfirewall@xxxxxxxxxxxx

The above is not a fully qualified domain name. It should look something
like this:

  myfirewall.mydomain.com

and have public rDNS available, so that people can find your IP and the
related contacts when things go wrong.

> append_domain .homeland.net

Squid newer than 2.6 should be pulling this from /etc/resolv.conf
properly. I think the latest 2.6 releases do as well, but I am not
completely sure of that.

> err_html_text admin@xxxxxxxxxxxx
> deny_info ERR_CUSTOM_ACCESS_DENIED all
> memory_pools off
> coredump_dir /var/spool/squid
> ie_refresh on
> maximum_object_size 800 MB

HTH
Amos
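P.S. A sketch of the trimmed access list discussed above, assuming
192.168.100.0/24 really is your whole internal range:

  acl localnet src 192.168.100.0/24

  http_access allow manager localhost
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  http_access deny to_localhost
  http_access allow localnet
  http_access allow localhost
  http_access deny all

The final "deny all" is my addition; it is the usual safety net so
anything not explicitly allowed gets refused.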