On Wed, 5 Mar 2008, Ian G Batten wrote:

> On 05 Mar 08, at 1549, Simon Matter wrote:
>
>>> On Tue, 4 Mar 2008, Ian G Batten wrote:
>>>
>>>> software RAID5 is a performance disaster area at the best of times
>>>> unless it can take advantage of intimate knowledge of the intent log
>>>> in the filesystem (RAID-Z does this),
>>>
>>> actually, unless you have top-notch hardware raid controllers, software
>>> raid 5
>>>
>> I can only second that. I'm still wondering what "top-notch hardware
>> raid controllers" are. From my experience the only decent "controllers"
>> you can get are those in the heavy priced SAN equipments with gigs of
>> cache on the SAN controllers and tens or hundreds of spindles behind it.
>
> Sorry, that's what I was comparing it to: my experience of software
> RAID5 is horrid (5+1 assemblages on various small Suns with disksuite)
> and I probably live a life of luxury with hardware RAID (100-spindle
> Pillar with 24G of RAM, assorted 50--100 spindle EMC and DotHill arrays
> with several GB of RAM).  I've rarely used PCI-slot RAID controllers:
> thinking back, I used PCI-card controllers indirectly once upon a time
> --- Auspex used commodity RAID controllers in their later, doomed,
> non-VME machines --- and they were horrid.
>
> But I use software RAID 0+1 all the time, both with Solaris Disksuite
> (or whatever it's called this week) and ZFS.  Our Cyrus meta-data
> partitions, for example, sit in this zpool:
>
>         NAME            STATE     READ WRITE CKSUM
>         onboard         ONLINE       0     0     0
>           mirror        ONLINE       0     0     0
>             c0t0d0s4    ONLINE       0     0     0
>             c1t0d0s4    ONLINE       0     0     0
>           mirror        ONLINE       0     0     0
>             c0t1d0s4    ONLINE       0     0     0
>             c1t1d0s4    ONLINE       0     0     0
>
> and are perfectly happy.  The message store comes in over NAS from a
> 20-disk stripe consisting of 4 5+1 RAID5 assemblages spread over four
> RAID controllers fronted with ~10GB of RAM cache, however...
>
> Returning to the topic at hand, though, I can't for the life of me see
> why anyone would want to use RAID5 in 2008 _without_ tens or hundreds of
> spindles and gigs of cache.  Why not just use RAID 0+1?

A couple of reasons:

Because RAID 0+1 can have you lose everything if you lose the wrong two
disks; RAID 6 lets you lose any two disks and keep going.

Because RAID 5 only needs one extra drive, and RAID 6 only needs two extra
drives, while RAID 0+1 needs 2x the drives.  There are physical limits to
what can fit in a case that can make this a factor (completely ignoring
power limits).

Not everyone needs the fastest performance; most people are making a
tradeoff between performance, cost and space, and as a result there are
many options that are reasonable in different environments.
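To put rough numbers on that second point, here's a quick back-of-the-envelope
sketch (plain /bin/sh; the 500GB drive size is just the spindle size used in
your examples, and it assumes a single array with no hot spares and no
splitting into multiple RAID sets):

    #!/bin/sh
    # drives needed for D drives' worth of usable space in one array,
    # ignoring hot spares: raid5 = D+1, raid6 = D+2, raid 0+1 = 2*D
    SZ=500   # illustrative drive size in GB
    for D in 2 3 4; do
        echo "$(( D * SZ ))GB usable: raid5 $(( D + 1 )) drives," \
             "raid6 $(( D + 2 )) drives, raid 0+1 $(( D * 2 )) drives"
    done

The three lines it prints correspond to the 4-vs-3, 6-vs-4 and 8-vs-5 (or 6)
disk comparisons that come up further down.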
> When I've got ~40TB in my Pillar, the difference between RAID5 and RAID
> 0+1 is a large chunk of change: it's the difference between 104 500GB
> spindles (16 5+1 volumes, 8 hot spares) and 160 500GB spindles plus
> however much hot-sparing is prudent.  An extra sixty spindles, plus the
> space, controllers, power supplies, cabling, metalwork is a non-trivial
> amount of money and heat.  Likewise in the 96-spindle DotHill stack and
> the ~80-spindle EMC: those would require dozens of extra disks and a lot
> of electronics.
>
> And so long as you can handle burst-writes inside the cache memory,
> there's little read and no write performance benefit for going from
> RAID5 to RAID 0+1: in both cases reads are serviced from a large stripe
> and writes go to a write-back cache.  Massively sustained writes may
> benefit from 0+1 because it is easier to do than 5, but outside video
> editing and ultra-high-end VTL that's a rare workload.

You've just outlined good reasons to use RAID 5 (or 6).  Smaller budgets
are sensitive to the same issues on smaller arrays.

> But at the low end?  Why piss about with something as complex, messy and
> error-prone as RAID5 when RAID 0+1 is going to cost you a couple of
> extra spindles and save you a RAID controller?  If you have four SATA
> ports on your machine, just put four 500GB SATA spindles on and you have
> 1TB of 0+1.  Use ZFS and you can turn on compression if you want, too,
> which is fast enough to be worth the saving in spindle access relative
> to the CPU load for a lot of workloads (I have ~10TB under ZFS
> compression for replication, and another couple of TB for tape staging).
> RAID5 is worthwhile to reduce 160 disks to 100; it's not worth it to
> reduce 4 disks to 3.

ZFS is not available everywhere, and it is not suitable for all workloads
(specifically database-type workloads, which is a fair approximation for
Cyrus).

You say it's not worth reducing 4 disks to 3, but what about 6 disks to 4?
(Using your example of a machine with 4 SATA drives, that's the difference
between using the machine you have and buying a new one.)  If that's not
enough, what about 8 disks to 5?  (6 if you do RAID 6 or want a hot
spare.)

At what point would you consider the difference worthwhile?

David Lang

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html