Re: Re: Extremely Low Request Hit Ratio!

On 10/08/2013 12:24 p.m., Golden Shadow wrote:
Hello babajaga,

Thanks a lot for your help!

I understand from what you said that Squid does not know which cache_dir a requested object is located in (it does not search all the cache_dirs for that object), and that this may result in the same object being stored in every cache_dir, lowering the hit ratio. Did I get that right?

That does seem to be what he is saying. However, it is wrong.

Squid maintains an in-memory index of each cache_dir's contents. There is only ever one location for a response object.

There may be duplicates on disk, but only for the time between a replacement being added elsewhere in the cache and the old object being deleted or overwritten. If Squid is restarted during this time, a "DIRTY" disk scan will report the duplicates being detected and removed, while a "CLEAN" disk scan will ignore them and overwrite the on-disk file when that disk space needs to be re-used.

With the least-load algorithm, new objects and replacements being added to the cache are sorted into the dir with the most available space and the least in-progress I/O, as far as Squid can tell (although with AUFS threading that I/O is not very well calculated).


Best regards,
Firas


----- Original Message -----
From: babajaga

My suspicion is some problem here:

store_dir_select_algorithm least-load

# cache_dir aufs /mnt/cachedrive1 1342177 128 512
cache_dir aufs /mnt/cachedrive2 1426227 128 512
cache_dir aufs /mnt/cachedrive3 1426227 128 512
cache_dir aufs /mnt/cachedrive4 1426227 128 512
cache_dir aufs /mnt/cachedrive5 427008 128 256


I don't know the internal processing within Squid in this scenario, but theoretically there MIGHT be a risk of having the same object cached up to 4 times, because all the cache_dirs have equal storage properties. That reduces the overall number of cachable objects significantly, and MIGHT lead to unnecessary purging of cached objects when a cache_dir is almost full.

A bigger issue with the cache garbage collection is that the watermark limits are set in whole %-points of the cache_dir size (and there have been some complaints, not yet replicated, about the high watermark not working at all). On caches this big that could mean 50GB of data being erased from disk "on the hour, every hour", usually in the form of a great many very small files: one huge burst of disk I/O load.
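
If you want to shrink that burst, the relevant knobs are the cache_swap_low/cache_swap_high watermarks; for example (the values below are only an illustration, not tuning advice for this particular setup):

# narrow the gap between the watermarks so each purge cycle
# erases a smaller slice of the cache in one go
cache_swap_low  94
cache_swap_high 95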

For best performance I also recommend the suggestion below, with a caveat: check where your traffic's object sizes peak so you can tune the limits to those ...

So I would either use just one large cache_dir (which might then run into the limit on the max. number of cachable objects in one dir, 2**24), or (the better solution) set disjoint limits on the size of cachable objects for the various cache_dirs, using
cache_dir aufs /mnt/cachedrive2 1426227 128 512 min-size=xxxx  max-size=xxxx
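
As an illustration of that idea (the byte boundaries below are placeholders only; choose them from your own traffic's object-size profile):

# each object-size band matches exactly one cache_dir
# boundaries here: 1MB, 16MB, 128MB (min-size/max-size are in bytes)
cache_dir aufs /mnt/cachedrive2 1426227 128 512 max-size=1048576
cache_dir aufs /mnt/cachedrive3 1426227 128 512 min-size=1048576 max-size=16777216
cache_dir aufs /mnt/cachedrive4 1426227 128 512 min-size=16777216 max-size=134217728
cache_dir aufs /mnt/cachedrive5 427008 128 256 min-size=134217728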

And move these lines
minimum_object_size 16 KB

A lot of modern Internet traffic is made up of small objects, to the point where the average object size for some ISPs is 10-16KB. Setting your minimum cached object size at 16 KB will make all of that traffic a MISS and will be contributing to the low HIT ratio.

I recommend adding one or more Rock type cache_dir to store these small objects.
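
A minimal sketch of that (the path and size here are assumptions, and note that rock dirs in current releases only hold objects up to 32KB):

# a rock store dedicated to the small objects
cache_dir rock /mnt/cachedrive1/rock 64000 max-size=32768
# and give the AUFS dirs min-size=32768 so the two do not overlap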


maximum_object_size 512 MB
on top of this one:
maximum_object_size_in_memory 300 KB

Um. I've yet to see any reason for doing that swap-around within the limits lines. However, the strange bug in recent releases does require the cache_dir lines to be placed below the *_object_size_* limits, or each cache_dir picks up whatever those limits were at the time its line is read in from the config.
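
In squid.conf terms the safe ordering is simply this (a sketch re-using the values quoted above):

# size limits first ...
maximum_object_size 512 MB
maximum_object_size_in_memory 300 KB
# ... then the cache_dir lines, so they pick up the limits set above
cache_dir aufs /mnt/cachedrive2 1426227 128 512
cache_dir aufs /mnt/cachedrive3 1426227 128 512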

Amos




