One should also consider the difference between
simple RAID and extremely advanced RAID disk systems
(e.g. EMC and other arrays).
External disk arrays like EMC's, with internal RAID5, are simply faster
than a JBOD of internal disks.
How many write cycles does EMC use to back up data for each
write cycle the system issues?
How many CPU cycles does EMC spend figuring out which disk the
file slice is located on, _after_ squid has already hashed the file
location to figure out which disk the file is located on?
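The question describes two lookups stacked on top of each other: the application chooses a disk for an object, and then the array chooses a spindle for each block. Below is a minimal sketch of both steps. It assumes a hypothetical MD5-based `pick_cache_dir`, which is not Squid's actual selection code. It also assumes a simple RAID-5 layout in which parity rotates by one disk per stripe; real arrays use their own, often more elaborate, layouts.

```python
import hashlib

def pick_cache_dir(key: str, n_dirs: int) -> int:
    """HYPOTHETICAL app-level step: hash the object key to choose a cache directory."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_dirs

def raid5_locate(block: int, n_disks: int) -> tuple[int, int]:
    """HYPOTHETICAL array-level step: map a logical block to (data_disk, parity_disk)
    in a RAID-5 layout whose parity disk rotates by one disk per stripe."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    data_per_stripe = n_disks - 1           # one chunk per stripe holds parity
    stripe = block // data_per_stripe
    parity_disk = stripe % n_disks          # rotate parity across the disks
    idx = block % data_per_stripe           # position among the data chunks
    data_disk = idx if idx < parity_disk else idx + 1   # skip over the parity disk
    return data_disk, parity_disk

if __name__ == "__main__":
    d = pick_cache_dir("http://example.com/logo.png", 4)
    print(d, raid5_locate(3, 4))
```

With four disks, `raid5_locate(0, 4)` gives `(1, 0)` and `raid5_locate(3, 4)` gives `(0, 1)`. Each lookup is only a little integer arithmetic. What matters is whether doing it twice costs anything measurable.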
Regardless of speed, unless you can provide a RAID system which has
less than one hardware disk I/O read/write per system disk I/O
read/write, you hit these theoretical limits.
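The "hardware disk I/O per system disk I/O" ratio can be made concrete with the textbook write costs for common RAID levels. Below is a minimal sketch that ignores controller caching except for the full-stripe case. The function name and the default of four disks are illustrative assumptions. RAID-5 small writes are modeled as classic read-modify-write.

```python
def backend_ios_per_write(level: str, n_disks: int = 4, full_stripe: bool = False) -> float:
    """Physical disk I/Os issued per logical block written (illustrative model)."""
    if level == "raid0":
        return 1.0                          # the block is written once
    if level == "raid1":
        return 2.0                          # both mirror copies are written
    if level == "raid5":
        if n_disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        if full_stripe:
            # n-1 data blocks plus 1 parity block are written together
            return n_disks / (n_disks - 1)
        # small write: read old data, read old parity, write data, write parity
        return 4.0
    raise ValueError(f"unknown level: {level}")

if __name__ == "__main__":
    for lvl, full in (("raid0", False), ("raid1", False), ("raid5", False), ("raid5", True)):
        print(lvl, "full-stripe" if full else "small", backend_ios_per_write(lvl, 4, full))
```

While no disk has failed, reads cost one physical I/O per logical read at every level here. The full-stripe case shows why a large write-back cache, like the ones on the arrays discussed below, can pull RAID-5 from 4 I/Os per write down toward n/(n-1).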
I can't quote disk cycle numbers, but I know that our fiber-connected HP
EVA8000s (with ginormous caches and LUNs spread over 72 spindles, even
at RAID5) are one hell of a lot faster than the local disks. The 2 Gbps
fiber connection is the limiting factor for most of our high-bandwidth
apps. In our shop, squid is pretty low bandwidth by comparison. We
normally hover around 100 req/sec with occasional peaks at 200 req/sec.
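Here is a back-of-the-envelope check of "low bandwidth by comparison". It compares the raw line rate of a 2 Gbps link with what 100 to 200 req/sec could demand. The average object size below is an assumption for illustration, not a figure from this shop.

```python
LINK_GBPS = 2.0        # fibre link speed quoted in the post
AVG_OBJECT_KB = 32.0   # ASSUMPTION: average object size, chosen for illustration

def link_ceiling_mb_s(gbps: float) -> float:
    """Raw line rate in MB/s (decimal units); encoding overhead lowers the usable rate."""
    return gbps * 1000 / 8

def squid_load_mb_s(req_per_sec: float, avg_kb: float) -> float:
    """Throughput if every request read or wrote one average-sized object on disk."""
    return req_per_sec * avg_kb / 1000

if __name__ == "__main__":
    ceiling = link_ceiling_mb_s(LINK_GBPS)
    for rps in (100, 200):
        load = squid_load_mb_s(rps, AVG_OBJECT_KB)
        print(f"{rps} req/sec: {load:.1f} MB/s, {100 * load / ceiling:.1f}% of {ceiling:.0f} MB/s")
```

Under these assumptions the peak load is a few percent of the link. The object size would have to be many times larger before squid competed with the high-bandwidth applications for the fibre.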
But it's not so much a problem of human-noticeable absolute time; the
problem of underlying duplicated disk I/O cycles, processor I/O cycles,
and processor delays remains.
For now the CPU half of the problem gets masked by the
single-threadedness of squid (never thought you'd see that being a
major benefit, eh?). If squid begins using all the CPU threads, the OS
will lose its spare CPU cycles on dual-core machines, and RAID
may become a noticeable problem there.
Your arguments are valid for software RAID, but not for hardware RAID.
Most higher-end systems have a dedicated disk controller with its own
processor that handles nothing but the onboard RAID. A fiber-connected
disk array is conceptually similar, but with more horsepower. The host CPU
never has to deal with the RAID overhead in this case. Perhaps for these
scenarios, squid could use a config flag that tells it to put everything
on one "disk" (as it sees it) and not bother imposing any of its own
overhead for operations that will already be done by the array controller.
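Most of the per-write work that a dedicated RAID processor takes off the host CPU is parity arithmetic. Below is a minimal sketch of RAID-5 parity as byte-wise XOR, including the read-modify-write update for a single-block change. It uses pure Python for clarity. Real controllers and software RAID drivers use optimized or hardware XOR routines instead.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID-5 parity for one stripe)."""
    if not blocks or any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("need at least one block, all the same length")
    out = bytearray(blocks[0])
    for block in blocks[1:]:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write update: P_new = P_old ^ D_old ^ D_new, without reading other disks."""
    return xor_blocks([old_parity, old_data, new_data])

if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # three data blocks
    parity = xor_blocks(stripe)
    # Any one lost block is the XOR of the surviving blocks plus parity.
    assert xor_blocks([stripe[0], stripe[2], parity]) == stripe[1]
    # Read-modify-write gives the same parity as recomputing the whole stripe.
    new_block = b"\xaa\xbb"
    assert update_parity(parity, stripe[1], new_block) == xor_blocks([stripe[0], new_block, stripe[2]])
    print(parity.hex())   # prints "ee22"
```

The update identity explains the RAID-5 small-write cost. The old data and old parity must be read before the new data and new parity are written, giving two reads and two writes. With hardware RAID that XOR runs on the controller, not the host.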
Ben Hollingsworth
Systems Programmer, Information Technology
BryanLGH Health System
http://www.bryanlgh.org