On 05 Mar 08, at 1549, Simon Matter wrote:

>> On Tue, 4 Mar 2008, Ian G Batten wrote:
>>
>>> software RAID5 is a performance disaster area at the best of times
>>> unless it can take advantage of intimate knowledge of the intent
>>> log in the filesystem (RAID-Z does this),
>>
>> actually, unless you have top-notch hardware raid controllers,
>> software raid 5
>>
> I can only second that. I'm still wondering what "top-notch hardware
> raid controllers" are. From my experience the only decent
> "controllers" you can get are those in the heavily priced SAN
> equipment with gigs of cache on the SAN controllers and tens or
> hundreds of spindles behind it.

Sorry, that's what I was comparing it to: my experience of software
RAID5 is horrid (5+1 assemblages on various small Suns with Disksuite),
and I probably live a life of luxury with hardware RAID (a 100-spindle
Pillar with 24G of RAM, assorted 50--100 spindle EMC and DotHill arrays
with several GB of RAM). I've rarely used PCI-slot RAID controllers:
thinking back, I used PCI-card controllers indirectly once upon a time
--- Auspex used commodity RAID controllers in their later, doomed,
non-VME machines --- and they were horrid.

But I use software RAID 0+1 all the time, both with Solaris Disksuite
(or whatever it's called this week) and ZFS. Our Cyrus meta-data
partitions, for example, sit in this zpool:

        NAME          STATE     READ WRITE CKSUM
        onboard       ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s4  ONLINE       0     0     0
            c1t0d0s4  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s4  ONLINE       0     0     0
            c1t1d0s4  ONLINE       0     0     0

and are perfectly happy. The message store comes in over NAS from a
20-disk stripe consisting of 4 5+1 RAID5 assemblages spread over four
RAID controllers fronted with ~10GB of RAM cache, however...

Returning to the topic at hand, though, I can't for the life of me see
why anyone would want to use RAID5 in 2008 _without_ tens or hundreds
of spindles and gigs of cache. Why not just use RAID 0+1? When I've
got ~40TB in my Pillar, the difference between RAID5 and RAID 0+1 is a
large chunk of change: it's the difference between 104 500GB spindles
(16 5+1 volumes, 8 hot spares) and 160 500GB spindles plus however
much hot-sparing is prudent. An extra sixty spindles, plus the space,
controllers, power supplies, cabling and metalwork, is a non-trivial
amount of money and heat. Likewise in the 96-spindle DotHill stack and
the ~80-spindle EMC: those would require dozens of extra disks and a
lot of electronics.

And so long as you can handle burst writes inside the cache memory,
there's little read and no write performance benefit in going from
RAID5 to RAID 0+1: in both cases reads are serviced from a large
stripe and writes go to a write-back cache. Massively sustained writes
may benefit from 0+1, because mirrored writes avoid RAID5's
read-modify-write parity overhead, but outside video editing and
ultra-high-end VTL that's a rare workload.

But at the low end? Why piss about with something as complex, messy
and error-prone as RAID5 when RAID 0+1 is going to cost you a couple
of extra spindles and save you a RAID controller? If you have four
SATA ports on your machine, just put four 500GB SATA spindles on and
you have 1TB of 0+1. Use ZFS and you can turn on compression if you
want, too, which for a lot of workloads is fast enough that the saving
in spindle accesses is worth the CPU load (I have ~10TB under ZFS
compression for replication, and another couple of TB for tape
staging). RAID5 is worthwhile to reduce 160 disks to 100; it's not
worth it to reduce 4 disks to 3.
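For anyone who wants to try the four-SATA-spindle 0+1 setup described
above, here's a minimal sketch of the ZFS commands; the pool name
"tank" and the device names are illustrative, not from any real box:

    # Create a pool of two two-way mirrors striped together
    # (0+1-style), giving 1TB usable from four 500GB disks.
    zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0

    # Verify the layout; the output should look much like the
    # "onboard" pool shown earlier in this message.
    zpool status tank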
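And turning on compression, as mentioned, is a one-liner (again,
"tank" is just a placeholder pool name):

    # Enable compression for the whole pool (lzjb by default on
    # Solaris of this vintage) and see how well it's doing.
    zfs set compression=on tank
    zfs get compressratio tank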
ian

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html