On Wed, May 1, 2013 at 12:42 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 1/05/2013 10:21 a.m., babajaga wrote:
>>
>> Amos,
>>
>> although a bit off topic:
>>
>>> It does not work the way you seem to think. 2x 200GB cache_dir entries
>> have just as much space as 1x 400GB. Using two cache_dir allows Squid to
>> balance the I/O loading on the disks while simultaneously removing all
>> processing overheads from RAID. <
>>
>> Am I correct in the following:
>> The selection of one of the 2 cache_dirs is not deterministic for the same
>> URL at different times, both for round-robin and least-load.
>> This might have the consequence of generating a MISS, although the object
>> is cached in the other cache_dir.
>> Or, in other words: there is a finite possibility that a cached object is
>> stored in one cache_dir, and, because of the result of the selection
>> algorithm when the object is to be fetched, the decision to check the
>> wrong cache_dir generates a MISS.
>> In case this is correct, one 400GB cache would have a higher HIT rate per
>> se. AND it would avoid double caching, therefore increasing effective
>> cache space and raising the HIT rate even more.
>>
>> So, having one JBOD instead of multiple cache_dirs (one cache_dir per
>> disk) would result in better performance, assuming even distribution of
>> (hashed) URLs.
>> Parallel access to the disks in the JBOD is handled at a lower level,
>> instead of with multiple aufs, so this should not create a real handicap.
>
>
> You are not.
>
> Your whole chain of logic above depends on the storage areas (cache_dir)
> being separate entities. This is a false assumption. They are only separate
> to the operating system. They are merged into a collective "cache" index
> model in Squid memory - a single lookup to this unified store indexing
> system finds the object no matter where it is (disk or local memory) with
> the same HIT/MISS result based on whether it exists *anywhere* in at least
> one of the storage areas.
>
> It takes the same amount of time to search through N index entries for one
> giant cache_dir as it does for the same N index entries for M cache_dir.
> The difference comes when Squid is aware of the individual disk I/O loading
> and sizes: it can calculate accurate loading values to optimize read/write
> latency on individual disks.
>
> Amos
>

And what would happen if we have 2 cache_dir entries:

cache_dir aufs /var/spool/squid/ssd1 200000 16 256
cache_dir aufs /var/spool/squid/ssd2 200000 16 256

/var/spool/squid/ssd1 - /dev/sda
/var/spool/squid/ssd2 - /dev/sdb

User1 downloads a big PSD file and Squid saves it on /dev/sda (ssd1). Then
sda fails and user2 tries to download the same file. What happens in that
situation? Does Squid download the file again, place it on /dev/sdb, and
then rebuild the "cache" index in memory?
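
For reference, the round-robin vs. least-load choice discussed above is set
with the store_dir_select_algorithm directive. A minimal squid.conf sketch of
the two-SSD layout from this thread (same paths and sizes as quoted; the
directive line is not from the original messages, and least-load is Squid's
default anyway) might look like:

# Selection policy for writes across multiple cache_dirs:
# least-load (default) or round-robin.
store_dir_select_algorithm least-load

cache_dir aufs /var/spool/squid/ssd1 200000 16 256
cache_dir aufs /var/spool/squid/ssd2 200000 16 256

Whichever policy is used when storing an object, lookups still go through the
single in-memory store index, as Amos describes above, so the HIT/MISS result
does not depend on which cache_dir the object landed in.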