On 11/07/18 22:39, pete dawgg wrote:
> Hello list,
>
> i run squid 3.5.27 with some special settings for windows updates as
> suggested here:
> https://wiki.squid-cache.org/ConfigExamples/Caching/WindowsUpdates
> It's been running almost trouble-free for some time, but for ~2 months
> the cache-partition has been filling up to 100% (space; inodes were OK)
> and squid then failed.

That implies one of three things: your cache_dir size accounting is VERY
badly broken, something else is filling the disk (eg. failing to rotate
the swap.state journals), or disk purging is not able to keep up with
the traffic flow.

> the cache-dir is on a 100GB ext2-partition and configured like this:

Hmm, a partition. What else is using the same physical disk? Squid puts
such a random I/O pattern on cache disks that it is best not to use the
physical drive for anything else in parallel - other uses can slow Squid
down, and conversely Squid can cause problems for them by flooding the
disk controller queues.

> cache_dir aufs /mnt/cache/squid 75000 16 256

These numbers matter more for ext2 than for other FS types. You need
them to be large enough not to allocate too many inodes per directory.
I would use "64 256" here, or even "128 256" for a bigger safety margin.
(I *think* modern ext2 implementations have resolved the core issue, but
that may be wrong and ext2 is old enough to be wary.)

> cache_swap_low 60
> cache_swap_high 75
> minimum_object_size 0 KB
> maximum_object_size 6000 MB

If you bumped this for the Win8 sizes mentioned in our wiki, note that
the Win10 major updates have pushed sizes up again, past 10GB. So you
may need to increase it further.

> some special settings for the windows updates:
> range_offset_limit 6000 MB

Add the ACLs necessary to restrict this to WU traffic. It is really hard
on cache space**, so it should not be allowed for just any traffic.

** What I mean by that is it may result in N parallel fetches of the
entire object unless the collapsed forwarding feature is used. In
regards to your situation: consider a 10GB WU object being fetched 10
times -> 10*10 GB of disk space required just to fetch it. That
over-fills the 45GB your cache tries to stay down at
(cache_swap_low/100 * cache_dir = 60% of 75000 MB), and 11 fetches will
overflow the whole disk.
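Roughly, the relevant squid.conf pieces could look like the sketch
below. The "wuload" ACL name and the domain list are only
illustrations - check access.log for the hosts your WU clients actually
contact - and 12 GB is just an example size:

  # set the global size limit *before* cache_dir; some Squid versions
  # take a snapshot of it when the cache_dir line is parsed
  # (12 GB is an example value for Win10-era update sizes)
  maximum_object_size 12 GB

  # more top-level (L1) directories for ext2 safety margin
  cache_dir aufs /mnt/cache/squid 75000 64 256

  # allow whole-object fetches on ranged requests only for WU traffic;
  # everything else keeps the default limit of 0 (fetch only what was
  # requested). Domain list is an example - adjust to your traffic.
  acl wuload dstdomain .windowsupdate.com .update.microsoft.com
  range_offset_limit 6000 MB wuload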
> maximum_object_size 6000 MB
> quick_abort_min -1
> quick_abort_max -1
> quick_abort_pct -1
>
> when i restart squid with its initscript it sometimes expunges some
> stuff from the cache but then fails again after a short while:
> before restart:
> /dev/sdb2   99G  93G  863M 100% /mnt/cache
> after restart:
> /dev/sdb2   99G  87G  7,4G  93% /mnt/cache

How much of that /mnt/cache size is in /mnt/cache/squid? And is it one
physical HDD spindle (versus a RAID drive)?

> there are two types of errors in cache.log:
> FATAL: Ipc::Mem::Segment::open failed to
> shm_open(/squid-cf__metadata.shm): (2) No such file or directory

The cf__metadata.shm error is quite bad - it means your collapsed
forwarding is NOT working well. Which implies it is not preventing the
disk overflow on parallel huge WU fetches.

Are you able to try the new Squid-4? There are some collapsed forwarding
and cache management changes that may fix these errors, or at least
allow better diagnosis of them, and maybe of your disk usage problem.

> FATAL: Failed to rename log file /mnt/cache/squid/swap.state.new to
> /mnt/cache/squid/swap.state

This is suspicious; how large are those swap files? Does your proxy have
the correct access permissions on them and on the directories in their
path? Both the Unix filesystem and SELinux / AppArmor / whatever your
system uses for advanced access control matter here. The same checks
apply to the /dev/shm device and the *.shm file access error above -
though /dev/shm should be a root thing rather than Squid user access.

> What should i do to make squid work with windows updates reliably
> again?

Some other things you can check:

You can try to make cache_swap_high/low closer together and much larger
(eg. the default 90 and 95 values). Current 3.5 releases have fixed the
bug which made smaller values necessary on some earlier installs.

If you can afford the delays it introduces to a restart, you could run a
full scan of the cached data: stop Squid, delete the swap.state* files,
then restart Squid and wait (see the sketch below my signature). You
could do that with a copy of Squid not handling user traffic if
necessary, but the running instance cannot use the cache while the scan
is happening.

Otherwise, have you tried purging the entire cache and starting Squid
with a clean slate? That would be a lot faster for recovery than the
above scan, but it does spend a bit more bandwidth short-term while the
cache re-fills.

Amos
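PS: a rough sketch of that scan procedure, using the cache paths from
your mail. The init script invocation, the "proxy" cache user and the
cache.log path are assumptions (Debian-style defaults) - adjust for your
distro:

  # stop Squid so nothing touches the cache while we work
  /etc/init.d/squid stop

  # remove the index journals; Squid rebuilds them from the on-disk
  # objects at next startup (the slow "DIRTY" rebuild scan)
  rm -f /mnt/cache/squid/swap.state /mnt/cache/squid/swap.state.new

  # while here, confirm the cache user owns the whole tree
  # ('proxy' is the Debian default - yours may differ)
  chown -R proxy:proxy /mnt/cache/squid

  /etc/init.d/squid start

  # watch the rebuild progress (path is an assumption)
  tail -f /var/log/squid/cache.log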