On 4/7/2012 12:10 PM, Joe Landman wrote:
> On 04/07/2012 12:50 PM, Peter Grandi wrote:
>
>> * Your storage layer does not seem to deliver parallel
>> operations: as the ~100MB/s overall 'ext4' speed and the
>> seek graphs show, in effect your 4+2 RAID6 performs in this
>> case as if it were a single drive with a single arm.
>
> This is what leapt out at me. I retried a very similar test (pulled
> IcedTea 2.1, compiled it, tarred it, measured untar on our boxen). I
> was getting a fairly consistent 4 +/- delta seconds.

That's an interesting point. I guess I'd chalked the low throughput up
to heavy seeking.

> 100MB/s on some supposedly fast drives with a RAID card indicates that
> either the RAID is badly implemented, the RAID layout is suspect, or
> similar. He should be getting closer to N(data disks) * BW(single disk)
> for something "close" to a streaming operation.

This thread suggests you're onto something, Joe:

http://h30499.www3.hp.com/t5/System-Administration/Extremely-slow-io-on-cciss-raid6/td-p/4214888

Add this to the mix:

"The HP Smart Array P400 is HP's first PCI-Express (PCIe) serial
attached SCSI (SAS) RAID controller"

That's from:

http://h18000.www1.hp.com/products/servers/proliantstorage/arraycontrollers/smartarrayp400/index.html

First-generation products aren't always duds, but the odds of one are
much higher. Everyone posting in that thread is getting low throughput,
and most of them are testing streaming reads/writes, not massively
random IO as in Stefan's case.

> This isn't suggesting that he didn't hit some bug which happens to over
> specify use of ag=0, but he definitely had a weak RAID system (at best).
>
> If he retries with a more capable system, or one with a saner RAID
> layout (16k chunk size? For spinning rust? Seriously? Short stroking
> DB layout?), an agcount of 32 or higher, and still sees similar issues,
> then I'd be more suspicious of a bug.

Or merely a weak/old product. The P400 was an entry-level RAID HBA,
HP's first PCIe/SAS RAID card, and it was discontinued quite some time
ago. Its use of DDR2/533 memory suggests its design stage started
somewhere around 2004, 8 years ago.

Now that I've researched the P400, and assuming Stefan has the card
firmware optimally configured, I'd bet this workload is simply
overwhelming the RAID ASIC. To confirm this, configure each drive as a
single-drive RAID0 array so that all 6 drives are exported as
individual block devices, build an md RAID6 from them, and rerun the
workload. Be sure to change the Linux elevator to noop first, since
you're relying on the card's hardware write cache:

$ echo noop > /sys/block/sdX/queue/scheduler

Execute this six times, once for each of the 6 drives, substituting
the actual device name for sdX. This is not a persistent change.

The gap between EXT4 and XFS will likely still exist, but the overall
numbers should jump substantially northward if the problem is indeed a
slow RAID ASIC.

--
Stan
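
P.S. If it saves some typing, here's a minimal sketch of that test
setup, assuming the six exported drives appear as /dev/sda through
/dev/sdf (under the older cciss driver they would show up as
/dev/cciss/c0dX instead, so adjust the names to whatever your system
reports):

# WARNING: mdadm --create destroys any existing data on these drives.

# Set noop on each drive's queue (not persistent across reboots).
for d in sd{a..f}; do
    echo noop > /sys/block/$d/queue/scheduler
done

# Build the 4+2 md RAID6 from the exported drives and put XFS on it.
# mdadm's default chunk size applies unless you pass --chunk.
mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[a-f]
mkfs.xfs /dev/md0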