On 1/05/2013 10:21 a.m., babajaga wrote:
Amos,
although a bit off topic:
It does not work the way you seem to think. 2x 200GB cache_dir entries
have just as much space as 1x 400GB. Using two cache_dir allows Squid to
balance the I/O loading on the disks while simultaneously removing all
processing overheads from RAID. <
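(For concreteness, the two setups being compared would look roughly like
this in squid.conf; the paths and sizes here are just examples, not a
recommendation:

cache_dir aufs /cache1 200000 16 256
cache_dir aufs /cache2 200000 16 256

versus a single

cache_dir aufs /cache 400000 16 256
)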
Am I correct in the following:
The selection of one of the 2 cache_dirs is not deterministic for the same
URL at different times, both for round-robin and for least-load.
This might have the consequence of generating a MISS, although the object
is cached in the other cache_dir.
Or, in other words: there is a finite possibility that a cached object is
stored in one cache_dir, and because of the result of the selection
algorithm when the object is fetched, checking the wrong cache_dir
generates a MISS.
If this is correct, one 400GB cache would have a higher HIT rate per se.
In addition, it would avoid double caching, increasing the effective
cache space and therefore raising the HIT rate even further.
So having one JBOD instead of multiple cache_dirs (one cache_dir per disk)
would result in better performance, assuming an even distribution of
(hashed) URLs.
Parallel access to the disks in the JBOD is handled at a lower level
instead of by multiple aufs cache_dirs, so this should not create a real
handicap.
You are not.
Your whole chain of logic above depends on the storage areas (cache_dir)
being separate entities. That is a false assumption. They are only
separate from the operating system's point of view. In Squid's memory they
are merged into a collective "cache" index model: a single lookup in this
unified store index finds the object no matter where it is (disk or local
memory), so the HIT/MISS result depends only on whether the object exists
*anywhere* in at least one of the storage areas.
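To make that concrete, here is a minimal sketch in C++ of the idea - not
Squid's actual code, and every name in it is invented for illustration.
Each object sits in one map together with a note of which cache_dir holds
it, so one lookup decides HIT/MISS regardless of where the object landed:

// Minimal sketch (not Squid's real implementation) of a unified store
// index: HIT/MISS is decided by a single lookup, and the entry itself
// records which cache_dir the object lives in.
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

struct IndexEntry {
    int dirIndex;        // which cache_dir holds the object
    std::string dirPath; // e.g. "/cache1" (hypothetical path)
};

class StoreIndex {
    std::unordered_map<std::string, IndexEntry> entries; // keyed by URL
public:
    void add(const std::string& key, IndexEntry e) { entries[key] = std::move(e); }

    std::optional<IndexEntry> lookup(const std::string& key) const {
        auto it = entries.find(key);
        if (it == entries.end())
            return std::nullopt;  // MISS: the object is in no cache_dir at all
        return it->second;        // HIT: the entry says where to read it from
    }
};

int main() {
    StoreIndex idx;
    idx.add("http://example.com/a", {0, "/cache1"});
    idx.add("http://example.com/b", {1, "/cache2"});

    // One lookup answers HIT/MISS no matter which cache_dir was selected
    // when the object was stored.
    if (auto hit = idx.lookup("http://example.com/a"))
        std::cout << "HIT from " << hit->dirPath << "\n";
    else
        std::cout << "MISS\n";
}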
It takes the same amount of time to search through N index entries for
one giant cache_dir as it does for the same N index entries spread over M
cache_dirs. The difference is that when Squid is aware of the individual
disks' I/O loading and sizes, it can calculate accurate load values and
optimize read/write latency on each disk.
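A second sketch of that point, again purely illustrative with invented
names rather than Squid's real selection algorithm: when each disk is its
own cache_dir, the store can send each new write to the least-loaded disk
that still has room, which is exactly the per-disk awareness a single
JBOD cache_dir hides from Squid.

// Illustrative least-load write selection across per-disk cache_dirs
// (not Squid's actual algorithm; the load metric here is made up).
#include <iostream>
#include <string>
#include <vector>

struct CacheDir {
    std::string path;
    double ioLoad;  // e.g. pending I/O operations on this disk
    long freeMB;    // remaining space in this cache_dir
};

// Pick the cache_dir with the lowest current I/O load that can fit the object.
const CacheDir* selectForWrite(const std::vector<CacheDir>& dirs, long objMB) {
    const CacheDir* best = nullptr;
    for (const auto& d : dirs) {
        if (d.freeMB < objMB)
            continue;                       // no room on this disk
        if (!best || d.ioLoad < best->ioLoad)
            best = &d;                      // currently least-loaded candidate
    }
    return best;  // nullptr means no cache_dir can take the object
}

int main() {
    std::vector<CacheDir> dirs = {
        {"/cache1", 0.8, 150000},
        {"/cache2", 0.2, 120000},
    };
    if (const CacheDir* d = selectForWrite(dirs, 10))
        std::cout << "write to " << d->path << "\n";
}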
Amos