Hi,
I am currently facing a problem that I wasn't able to find a solution
for in the mailing list or on the internet.
My Squid dies for 30 seconds every hour, at the exact same time,
although the squid process is still running:
I lose my WCCP connectivity, the cache peers detect my Squid as a
dead sibling, and Squid cannot serve any requests.
The network connectivity of the server is not affected (a ping to the
Squid's IP doesn't time out).
The problem doesn't start immediately after Squid is installed on
the server (the server is dedicated to Squid); it starts when the
cache directories start to fill up.
I started my setup with 10 cache directories; Squid starts showing
the problem once the cache directories are more than 50% full.
When I change the number of cache directories (9, 8, ...), Squid works
for a while, then the same problem reappears:
cache_dir aufs /cache1/squid 90000 140 256
cache_dir aufs /cache2/squid 90000 140 256
cache_dir aufs /cache3/squid 90000 140 256
cache_dir aufs /cache4/squid 90000 140 256
cache_dir aufs /cache5/squid 90000 140 256
cache_dir aufs /cache6/squid 90000 140 256
cache_dir aufs /cache7/squid 90000 140 256
cache_dir aufs /cache8/squid 90000 140 256
cache_dir aufs /cache9/squid 90000 140 256
cache_dir aufs /cache10/squid 80000 140 256
I have 1 terabyte of storage
Finally, I created two cache directories (one on each HDD), but the
problem persisted.
You have 2 HDDs? But... you have 10 cache_dir lines.
We repeatedly say "one cache_dir per disk" or similar. In particular,
one cache_dir per physical drive spindle (for "disks" made up of
multiple physical spindles) wherever possible, with the physical
drives/spindles mounted separately to ensure the pairing. Squid
performs a very unusual pattern of disk I/O which stresses the disks
down to the hardware controller level and makes this kind of detail
critical for anything like good speed. Working around per-cache_dir
object limits by adding more UFS-based dirs on one disk does not
improve the situation.
That is a problem which will be affecting your Squid all the time,
and it possibly makes the source of the pauses worse.
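As a rough sketch only (the mount points, sizes and L1/L2 values below
are placeholders, not a recommendation for your hardware), one
cache_dir per physically separate disk looks like:

# first HDD, mounted on its own filesystem
cache_dir aufs /cache1/squid 300000 64 256
# second HDD, mounted on its own filesystem
cache_dir aufs /cache2/squid 400000 64 256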
From the description I believe it is garbage collection on the cache
directories. The pauses can be visible when garbage collecting any
cache over a few dozen GB. The Squid default "swap_high" and
"swap_low" values are 5 apart, and at minimum they can be 0 apart.
These are whole % points of the total cache size, being erased from
disk in a somewhat random-access style across the cache area. I did
mention uncommon disk I/O patterns, right?
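To put a rough number on that (assuming the ten cache_dir lines you
posted and the default 5-point gap):

total configured cache ~= 9 x 90000 MB + 80000 MB ~= 890 GB
5% of 890 GB           ~= 44 GB

so a single garbage-collection pass may have to erase on the order of
44 GB of small objects scattered across the cache area.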
To be sure what it is, you can attach the "strace" tool to the Squid
worker process (the second PID in current stable Squids) and see what
it is doing during the pause. But given the hourly regularity and past
experience with others on similar cache sizes, I'm almost certain it's
the garbage collection.
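For example (only a sketch; replace 12345 with your worker's actual
PID and adjust the output file to taste):

# attach to the running worker around the time the pause happens;
# -tt adds timestamps, -o writes the trace to a file
strace -tt -p 12345 -o /tmp/squid-pause.trace

A burst of unlink() calls against the cache directories during the
30-second window would point at the disk cleanup.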
Amos
Hi Amos,
Thank you for your fast reply,
I have 2 HDDs (450 GB and 600 GB).
df -h shows that I have 357 GB and 505 GB available.
In my last test, my cache settings were:
cache_swap_low 90
cache_swap_high 95
maximum_object_size 512 MB
maximum_object_size_in_memory 20 KB
cache_dir aufs /cache1/squid 320000 480 256
cache_dir aufs /cache2/squid 480000 700 256
Is this OK?
Thank you
Elie Merhej