Re: RAID is good

The reason I started this discussion is that the statement in the wiki,
"Do not use RAID under any circumstances", is at least outdated.

Most companies will trade performance for reliability, because they depend
on internet access for their business and cannot afford 2-48 hours
of unavailability.

Everybody knows that EMC and HP systems are much more expensive than
a JBOD, but that is not a valid reason to say "Never use RAID".
"Never use RAID" implies that RAID is *BAD*, which is simply not true.

From my point of view, the wiki should say something like this (a rough
squid.conf sketch of the JBOD and RAID cases follows the list):

If you want the cheapest setup, with modest performance and no availability guarantees, use JBOD.
If you want a cheap setup with modest performance and availability, use RAID1/RAID5 without
a sophisticated disk array (preferably with a RAID card that has
128+ MB of battery-backed write cache).
If you want the cheapest availability, use RAID5 without a sophisticated disk array.
If you want extreme performance and availability and are willing to pay for it, use a sophisticated disk array.
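
The mount points and sizes in this sketch are made up; the point is only that
JBOD means one cache_dir per physical disk (Squid spreads objects across them
itself), while RAID1/RAID5 presents a single volume and therefore a single
cache_dir:

  # JBOD: one cache_dir per physical disk
  cache_dir aufs /cache1 60000 16 256
  cache_dir aufs /cache2 60000 16 256
  cache_dir aufs /cache3 60000 16 256

  # RAID1/RAID5: the controller presents one volume to Squid
  cache_dir aufs /cache 120000 16 256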

-Marcus


Adrian Chadd wrote:
And I'd completely agree with you, because you're comparing $EXPENSIVE
attached storage (which is generally run as RAID) to $NOT_SO_EXPENSIVE
local storage which doesn't have ... well, all the fruit.

The EMC disk arrays, when treated as JBODs, won't be faster. They're faster
because you're rolling massive caches on top of RAID5+striping, or RAID1+0,
etc.

The trouble is this: none of us has access to high-end storage kit,
so developing solutions that'll work there is just not going to happen.

I've just acquired a 14-disk Compaq StorageWorks array, so at least I have
$MANY disks to benchmark against, but it's still effectively direct-attach
JBOD rather than hardware RAID.

Want this fixed? Partner with someone who can, or do the benchmarks yourself
and publish some results. My experience with hardware RAID5 cards attached
to disk arrays (i.e. dumb shelves, *not* intelligent disk arrays like EMC, etc.)
is that RAID5 is somewhat slower for Squid's I/O patterns. I'd repeat that test,
but I don't have a U320-enabled RAID5 card here to talk to this shelf.




Adrian

On Tue, Mar 25, 2008, Ben Hollingsworth wrote:
One should also consider the difference between
simple RAID and extremely advanced RAID disk systems
(i.e. EMC and other arrays).
External disk arrays like EMC with internal RAID5 are simply faster
than a JBOD of internal disks.
How many write cycles does EMC use to back up data after one system-issued write cycle? How many CPU cycles does EMC spend figuring out which disk the file slice is located on, _after_ Squid has already hashed the file location to figure out which disk the file is on?

Regardless of speed, unless you can provide a RAID system that needs less than one hardware disk I/O (read/write) per system disk I/O (read/write), you hit these theoretical limits.
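
To put rough numbers on that duplication (the standard textbook write
penalties, not measurements from any particular array):

  RAID5 small write (read-modify-write):
    read old data  + read old parity   = 2 physical reads
    write new data + write new parity  = 2 physical writes
    -> about 4 physical I/Os per logical write
  RAID1 write:
    write to both mirrors              -> 2 physical I/Os per logical write

Caches and full-stripe writes can hide some of this, but the duplicated work is real.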
I can't quote disk cycle numbers, but I know that our fiber-connected HP EVA8000s (with ginormous caches and LUNs spread over 72 spindles, even at RAID5) are one hell of a lot faster than the local disks. The 2 Gbps fiber connection is the limiting factor for most of our high-bandwidth apps. In our shop, Squid is pretty low bandwidth by comparison. We normally hover around 100 req/sec with occasional peaks at 200 req/sec.

But it's not so much a problem of human-noticeable absolute time; the underlying problem of duplicated disk I/O cycles, processor I/O cycles, and processor delays remains.

For now the CPU half of the problem gets masked by the single-threadedness of Squid (never thought you'd see that called a major benefit, eh?). If Squid begins using all the CPU threads, the OS will lose out on its spare CPU cycles on dual-core machines, and RAID may become a noticeable problem there.
Your arguments are valid for software RAID, but not for hardware RAID. Most higher-end systems have a dedicated disk controller with its own processor that handles nothing but the onboard RAID. A fiber-connected disk array is conceptually similar, but with more horsepower. The host CPU never has to worry about that overhead. Perhaps for these scenarios, Squid could use a config flag that tells it to put everything on one "disk" (as it sees it) and not bother imposing any of its own overhead for operations that will already be done by the array controller.
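
For what it's worth, a single large cache_dir on the array LUN already gets you
most of the way there, since Squid then has no disk-selection work of its own
to do. A hypothetical example (the path and size are only illustrative):

  # one cache_dir on the array volume; the controller handles striping and redundancy
  cache_dir aufs /squid/array-cache 200000 64 256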





