Re: Re: Extremely Low Request Hit Ratio!

On 10/08/2013 12:24 p.m., Golden Shadow wrote:
Hello babajaga,

Thanks a lot for your help!

I understand from what you said that Squid does not know which cache_dir a requested object is located in (it does not search all the cache_dirs for that object), and that this may result in the same object being stored in every cache_dir, lowering the hit ratio. Did I get that right?

That does seem to be what he is saying. However, it is wrong.

Squid maintains an in-memory index of each cache_dir's contents. There is only ever one location for a response object.

There may be duplicates on disk, but only for the time between a replacement being added elsewhere in the cache and the old object being deleted or overwritten. If Squid is restarted during this time, a "DIRTY" disk scan will report the duplicates being detected and removed, while a "CLEAN" disk scan will ignore them and overwrite the on-disk file when that disk space needs to be re-used.

With the least-load algorithm, new objects and replacements being added to the cache are sorted into the dir with the most available space and the least in-progress I/O, as far as Squid can tell (although with AUFS threading that I/O is not very well calculated).


Best regards,
Firas


----- Original Message -----
From: babajaga

My suspicion is some problem here:

store_dir_select_algorithm least-load

# cache_dir aufs /mnt/cachedrive1 1342177 128 512
cache_dir aufs /mnt/cachedrive2 1426227 128 512
cache_dir aufs /mnt/cachedrive3 1426227 128 512
cache_dir aufs /mnt/cachedrive4 1426227 128 512
cache_dir aufs /mnt/cachedrive5 427008 128 256


I don't know the internal processing within Squid in this scenario, but theoretically there MIGHT be a risk of having the same object cached up to 4 times, because all the cache_dirs have equal storage properties. That reduces the overall number of cachable objects significantly, and MIGHT lead to unnecessary purging of cached objects when a cache_dir is almost full.

A bigger issue with the cache garbage collection is that the watermark limits are set in whole %-points of the cache_dir size (and there have been some complaints, not yet replicated, about the high watermark not working at all). On caches this big that could mean 50GB of data being erased from disk "on the hour, every hour", usually in the form of a great many very small files: one huge burst of disk I/O load.
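
If you want to shrink that burst, the relevant knobs are the cache_swap_low/cache_swap_high watermarks; for example (the values below are only an illustration, not tuning advice for this particular setup):

# narrow the gap between the watermarks so each purge cycle
# erases a smaller slice of the cache in one go
cache_swap_low  94
cache_swap_high 95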

For best performance I also recommend the suggestion below, with a caveat: check where your traffic's object sizes peak so you can tune the limits to those ...

So I would either use just one large cache_dir (which might then run into the limit on the max. number of cachable objects in one dir, 2**24), or (the better solution) set disjoint limits on the size of cachable objects for the various cache_dirs, using
cache_dir aufs /mnt/cachedrive2 1426227 128 512 min-size=xxxx  max-size=xxxx
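
As an illustration of that idea (the byte boundaries below are placeholders only; choose them from your own traffic's object-size profile):

# each object-size band matches exactly one cache_dir
# boundaries here: 1MB, 16MB, 128MB (min-size/max-size are in bytes)
cache_dir aufs /mnt/cachedrive2 1426227 128 512 max-size=1048576
cache_dir aufs /mnt/cachedrive3 1426227 128 512 min-size=1048576 max-size=16777216
cache_dir aufs /mnt/cachedrive4 1426227 128 512 min-size=16777216 max-size=134217728
cache_dir aufs /mnt/cachedrive5 427008 128 256 min-size=134217728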

And move these lines
minimum_object_size 16 KB

A lot of modern Internet traffic is made up of small objects, to the point where the average object size for some ISPs is 10-16KB. Setting your minimum cached object size at 16 KB will make all of that traffic a MISS and will be contributing to the low HIT ratio.

I recommend adding one or more Rock type cache_dir to store these small objects.
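
A minimal sketch of that (the path and size here are assumptions, and note that rock dirs in current releases only hold objects up to 32KB):

# a rock store dedicated to the small objects
cache_dir rock /mnt/cachedrive1/rock 64000 max-size=32768
# and give the AUFS dirs min-size=32768 so the two do not overlap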


maximum_object_size 512 MB
on top of this one:
maximum_object_size_in_memory 300 KB

Um. I've yet to see any reason for doing that swap-around within the limits lines. However, the strange bug in recent releases does require the cache_dir lines to be placed below the *_object_size_* limits, or each cache_dir picks up whatever those limits were at the time its line is read in from the config.
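
In squid.conf terms the safe ordering is simply this (a sketch re-using the values quoted above):

# size limits first ...
maximum_object_size 512 MB
maximum_object_size_in_memory 300 KB
# ... then the cache_dir lines, so they pick up the limits set above
cache_dir aufs /mnt/cachedrive2 1426227 128 512
cache_dir aufs /mnt/cachedrive3 1426227 128 512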

Amos




