On 10/08/2013 12:24 p.m., Golden Shadow wrote:
Hello babajaga,
Thanks a lot for your help!
I understand from what you said that Squid does not know which cache_dir a requested object is located in (it does not search all of the cache_dirs for that object), and that this may result in the same object being stored in all cache_dirs, lowering the hit ratio. Did I get that right?
That does seem to be what he is saying. However, it is wrong.
Squid maintains an in-memory index of each cache_dir's contents. There
is only ever one location for a response object.
There may be duplicates on disk, but only for the time between a
replacement being added elsewhere in the cache and the old copy being
deleted or overwritten. If Squid is restarted during this window a
"DIRTY" disk scan will mention the duplicates being detected and
removed, while a "CLEAN" disk scan will ignore them and overwrite the
on-disk file when that disk space needs to be re-used.
With the least-load algorithm, new objects and replacements being added
to the cache are sorted into the dir with the most available space and
the least in-progress I/O, as far as Squid can tell (although with AUFS
threading that I/O is not calculated very well).
Best regards,
Firas
----- Original Message -----
From: babajaga
My suspicion is some problem here:
store_dir_select_algorithm least-load
# cache_dir aufs /mnt/cachedrive1 1342177 128 512
cache_dir aufs /mnt/cachedrive2 1426227 128 512
cache_dir aufs /mnt/cachedrive3 1426227 128 512
cache_dir aufs /mnt/cachedrive4 1426227 128 512
cache_dir aufs /mnt/cachedrive5 427008 128 256
I do not know the internal processing within Squid in this scenario,
but theoretically there MIGHT be a risk of having the same object
cached up to 4 times, because all the cache_dirs have equal storage
properties. That would reduce the overall number of cacheable objects
significantly, and MIGHT lead to unnecessary purging of cached objects
when a cache_dir is almost full.
A bigger issue with the cache garbage collection is that the watermark
limits are in whole %-points of cache_dir size (and the high watermark
has had some complaints about not working at all - yet to be
replicated). On caches this big that could mean 50 GB of data being
erased from disk "on the hour, every hour", usually in the form of a
great many very small files: one huge burst of disk I/O load.
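If those hourly purge bursts become a problem, one mitigation is to keep
the swap watermarks close together so each garbage-collection pass has
less to erase. The values below are only illustrative, not taken from
your config:

# Illustrative watermarks only - tune to your own disks and traffic.
# With ~1.4 TB cache_dirs each whole %-point is roughly 14 GB, so a
# narrow low/high gap limits how much any single pass deletes.
cache_swap_low 93
cache_swap_high 94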
For best performance I also recommend the suggestion below, with the
caveat that you should check where your traffic's object sizes peak so
you can tune the limits to those ...
So I would either use just one large cache_dir (which might then run
into the limit on the maximum number of cacheable objects in one dir,
2**24), or (the better solution) set disjoint limits on the size of
cacheable objects for the various cache_dirs, using
cache_dir aufs /mnt/cachedrive2 1426227 128 512 min-size=xxxx max-size=xxxx
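For illustration, a disjoint split across dirs like these might look as
follows. The byte boundaries are hypothetical placeholders, to be
replaced with values tuned to your own traffic profile:

# Placeholder boundaries only - chosen so the ranges do not overlap.
cache_dir aufs /mnt/cachedrive2 1426227 128 512 max-size=65535
cache_dir aufs /mnt/cachedrive3 1426227 128 512 min-size=65536 max-size=1048575
cache_dir aufs /mnt/cachedrive4 1426227 128 512 min-size=1048576 max-size=33554431
cache_dir aufs /mnt/cachedrive5 427008 128 256 min-size=33554432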
And move these lines
minimum_object_size 16 KB
A lot of modern Internet traffic is made up of small objects - to the
point where the average object size for some ISPs is 10-16 KB. Setting
your minimum cached object size up at 16 KB will make all of that
traffic a MISS and will be contributing to the low HIT ratio.
I recommend adding one or more Rock type cache_dir to store these small
objects.
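For example, something along these lines could hold the small objects
while the AUFS dirs keep the larger ones. The path and size here are
placeholders, not taken from your config:

# Placeholder path and size. A rock dir only holds small objects
# (up to 32 KB per object in the current releases), so cap it with
# max-size and let the AUFS dirs take everything above that.
cache_dir rock /mnt/cachedrive1/rock 65536 max-size=32768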
maximum_object_size 512 MB
on top of this one:
maximum_object_size_in_memory 300 KB
Um. I've yet to see any reason for doing that swap around within the
limits lines. However the strange bug in the recent releases does
require the cache_dir lines to be placed below the *_object_size_*
limits, or each cache_dir picks up whatever those limits were at the
time its line was read in from the config.
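In other words, the ordering that avoids the bug looks like this
(sizes copied from the config being discussed):

# Keep the *_object_size_* limits above the cache_dir lines ...
maximum_object_size 512 MB
maximum_object_size_in_memory 300 KB
# ... so each cache_dir picks up the intended limits when it is parsed.
cache_dir aufs /mnt/cachedrive2 1426227 128 512
cache_dir aufs /mnt/cachedrive3 1426227 128 512
cache_dir aufs /mnt/cachedrive4 1426227 128 512
cache_dir aufs /mnt/cachedrive5 427008 128 256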
Amos