On 13/05/2013 7:27 a.m., Alex Domoradov wrote:
On Wed, May 1, 2013 at 12:42 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
On 1/05/2013 10:21 a.m., babajaga wrote:
Amos,
although a bit off topic:
It does not work the way you seem to think. 2x 200GB cache_dir entries
have just as much space as 1x 400GB. Using two cache_dir allows Squid to
balance the I/O loading on the disks while simultaneously removing all
processing overheads from RAID.
Am I correct in the following:
The selection of one of the 2 cache_dirs is not deterministic for the same
URL at different times, both for round-robin and for least-load.
Which might have the consequence of generating a MISS, although the object
is cached in the other cache_dir.
Or, in other words: there is a finite possibility that a cached object is
stored in one cache_dir, and, because of the result of the selection
algorithm, the wrong cache_dir is checked when the object should be
fetched, generating a MISS.
In case this is correct, one 400GB cache would have a higher HIT rate per
se. AND, it would avoid double caching, therefore increasing effective
cache space, resulting in an increase in HIT rate even more.
So, having one JBOD instead of multiple cache_dirs (one cache_dir per disk)
would result in better performance, assuming an even distribution of
(hashed) URLs.
Parallel access to the disks in the JBOD is handled at a lower level
instead of with multiple aufs, so this should not create a real handicap.
You are not.
Your whole chain of logic above depends on the storage areas (cache_dir)
being separate entities. This is a false assumption. They are only separate
to the operating system. They are merged into a collective "cache" index
model in Squid memory - a single lookup to this unified store indexing
system finds the object no matter where it is (disk or local memory) with
the same HIT/MISS result based on whether it exists *anywhere* in at least
one of the storage areas.
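To illustrate the point above, here is a minimal conceptual sketch (not Squid's actual code; all class and key names are hypothetical) of a single unified index over several storage areas, where a lookup is a HIT no matter which cache_dir holds the object:

```python
# Conceptual sketch only: Squid keeps ONE in-memory index covering ALL
# storage areas, so HIT/MISS does not depend on which disk holds the object.

class UnifiedStoreIndex:
    def __init__(self):
        # url -> name of the storage area holding the object (e.g. "ssd1")
        self._index = {}

    def add(self, url, storage_area):
        self._index[url] = storage_area

    def lookup(self, url):
        # One lookup covers every cache_dir: HIT if the object exists in
        # at least one storage area, MISS only if it exists nowhere.
        area = self._index.get(url)
        return ("HIT", area) if area is not None else ("MISS", None)

index = UnifiedStoreIndex()
index.add("http://example.com/a.png", "ssd1")
index.add("http://example.com/b.css", "ssd2")

print(index.lookup("http://example.com/a.png"))  # ('HIT', 'ssd1')
print(index.lookup("http://example.com/b.css"))  # ('HIT', 'ssd2')
print(index.lookup("http://example.com/c.js"))   # ('MISS', None)
```

The key property is that there is no per-cache_dir lookup at all, so the "wrong cache_dir checked" scenario described earlier cannot occur.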
It takes the same amount of time to search through N index entries for one
giant cache_dir as it does for the same N index entries spread across M
cache_dirs. The difference comes from Squid being aware of the individual
disk I/O loading and sizes: it can calculate accurate loading values to
optimize read/write latency on individual disks.
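A hedged sketch of what such per-disk load awareness could look like for write placement; the load formula here is purely illustrative and is not Squid's real heuristic:

```python
# Hypothetical least-load selection for storing a new object across
# several cache_dirs. The weighting is an arbitrary example.

def pick_cache_dir(dirs):
    """dirs: list of dicts with 'name', 'pending_io', 'used', 'capacity'.
    Returns the name of the dir with the lowest combined load score."""
    def load(d):
        fullness = d["used"] / d["capacity"]     # 0.0 .. 1.0
        return d["pending_io"] + 10 * fullness   # illustrative weighting

    return min(dirs, key=load)["name"]

dirs = [
    {"name": "ssd1", "pending_io": 8, "used": 150_000, "capacity": 200_000},
    {"name": "ssd2", "pending_io": 2, "used": 180_000, "capacity": 200_000},
]
print(pick_cache_dir(dirs))  # ssd2: its lower pending I/O outweighs fullness
```

With a single JBOD volume this kind of per-disk decision is impossible, because the operating system hides which physical disk each block lands on.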
Amos
And what would happen if we have 2 cache_dirs:
cache_dir aufs /var/spool/squid/ssd1 200000 16 256
cache_dir aufs /var/spool/squid/ssd2 200000 16 256
/var/spool/squid/ssd1 - /dev/sda
/var/spool/squid/ssd2 - /dev/sdb
User1 downloads a big PSD file and Squid saves the file on /dev/sda (ssd1).
Then sda fails and user2 tries to download the same file. What would
happen in that situation? Does Squid download the file again, place it on
/dev/sdb, and then rebuild the "cache" index in memory?
Unfortunately, when a UFS cache_dir dies, Squid halts. This happens
whether or not RAID is used. The exception is RAID-1 (but not RAID-10),
which provides a bit more protection than Squid does at present.
With multiple directories, though, you are in a position to quickly remove
the dead cache_dir and restart Squid with the second cache_dir while you
work on a fix; with RAID 0, 10, or 5 you are forced to rebuild the disk
structure while Squid is either offline or running without *any* disk cache.
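As a sketch of that recovery path, assuming the paths from the example above and that /dev/sda (ssd1) is the failed disk, the edited squid.conf could look like this, with only the dead cache_dir disabled before restarting:

```
# After /dev/sda fails: comment out only the dead cache_dir, then restart
# Squid; it keeps serving HITs from ssd2 while sda is replaced.
#cache_dir aufs /var/spool/squid/ssd1 200000 16 256
cache_dir aufs /var/spool/squid/ssd2 200000 16 256
```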
Amos