On Wed, May 1, 2013 at 12:42 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 1/05/2013 10:21 a.m., babajaga wrote:
>>
>> Amos,
>>
>> although a bit off topic:
>>
>>> It does not work the way you seem to think. 2x 200GB cache_dir entries
>> have just as much space as 1x 400GB. Using two cache_dir allows Squid to
>> balance the I/O loading on the disks while simultaneously removing all
>> processing overheads from RAID. <
>>
>> Am I correct in the following:
>> The selection of one of the 2 cache_dirs is not deterministic for the same
>> URL at different times, both for round-robin and least-load.
>> This might have the consequence of generating a MISS, although the object
>> is cached in the other cache_dir.
>> Or, in other words: there is a finite possibility that a cached object is
>> stored in one cache_dir, and, because of the result of the selection
>> algorithm when the object is to be fetched, the decision to check the
>> wrong cache_dir generates a MISS.
>> In case this is correct, one 400GB cache would have a higher HIT rate per
>> se. AND it would avoid double caching, therefore increasing effective
>> cache space and raising the HIT rate even more.
>>
>> So, having one JBOD instead of multiple cache_dirs (one cache_dir per
>> disk) would result in better performance, assuming even distribution of
>> (hashed) URLs.
>> Parallel access to the disks in the JBOD is handled at a lower level,
>> instead of with multiple aufs, so this should not create a real handicap.
>
>
> You are not.
>
> Your whole chain of logic above depends on the storage areas (cache_dir)
> being separate entities. This is a false assumption. They are only separate
> to the operating system. They are merged into a collective "cache" index
> model in Squid memory - a single lookup to this unified store indexing
> system finds the object no matter where it is (disk or local memory) with
> the same HIT/MISS result based on whether it exists *anywhere* in at least
> one of the storage areas.
>
> It takes the same amount of time to search through N index entries for one
> giant cache_dir as it does for the same N index entries for M cache_dir.
> The difference comes when Squid is aware of the individual disk I/O loading
> and sizes: it can calculate accurate loading values to optimize read/write
> latency on individual disks.
>
> Amos
>

And what would happen if we have 2 cache_dir entries:

cache_dir aufs /var/spool/squid/ssd1 200000 16 256
cache_dir aufs /var/spool/squid/ssd2 200000 16 256

/var/spool/squid/ssd1 - /dev/sda
/var/spool/squid/ssd2 - /dev/sdb

User1 downloads a big PSD file and Squid saves it on /dev/sda (ssd1). Then
sda fails and user2 tries to download the same file. What happens in that
situation? Does Squid download the file again, place it on /dev/sdb, and
then rebuild the "cache" index in memory?
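
For reference, the round-robin vs. least-load choice discussed above is set
with the store_dir_select_algorithm directive. A minimal squid.conf sketch of
the two-SSD layout from this thread (same paths and sizes as quoted; the
directive line is not from the original messages, and least-load is Squid's
default anyway) might look like:

# Selection policy for writes across multiple cache_dirs:
# least-load (default) or round-robin.
store_dir_select_algorithm least-load

cache_dir aufs /var/spool/squid/ssd1 200000 16 256
cache_dir aufs /var/spool/squid/ssd2 200000 16 256

Whichever policy is used when storing an object, lookups still go through the
single in-memory store index, as Amos describes above, so the HIT/MISS result
does not depend on which cache_dir the object landed in.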