On 1/20/2013 9:19 PM, Linda Walsh wrote:

> 2) Downloaded+online media+SW. RAID5: 4-data spindles using 2TB (1.819TiB)
>    Hitachi Ultrastar 7.2K SATAs (note, the disks in #3 & #4 are the same
>    type).
>
> 3) Main data+devel disk: RAID50, 12-data spindles in 3 groups of 4.
>    NOTE: I tried and benched RAID60 but wasn't happy with the performance,

The 2108 ROC ASIC in the 9280 doesn't have sufficient horsepower for good
performance with dual parity arrays, but that pales in comparison to the
performance drop due to the RMW induced seek latency.

> not to mention the diskspace hit. RAID10 would be a bit too decadent for
> my usage/budget.

When one perceives the capacity overhead of RAID1/10 as an intolerable cost,
instead of a benefit, one is forever destined to suffer from the poor
performance of parity RAID schemes.

...

> On #3 currently using 12.31TB in 20 partitions
...
> Note -- I generally like the RAID50's
...
> Cards, 1 internal: Dell PERC 6/i (serving #1 & #2 above -- all internal)
>                  1 LSI MR9280DE-8e (serving #3+4)
> 2 Enclosures: LSI DE1600-SAS (12x 3.5" ea)

So you have 24x 2TB 7.2K SATA drives total in two 630Js, correct?

> 3 years from now? Ha! Let's just say that the dollar dropping as fast as
> disk prices over the past 4 years has flamboozled any normal planning.

I feel ya.

> I was mostly interested in how increasing the number of spindles in a
> RAID50 would help parallelism.
...
> My thoughts on that were that since each member of a RAID0 can be read or
> written independently of any other member (as there is no parity to
> check), IF I wanted to increase parallelism (while hurting maximum
> throughput AND disk space), I **could** reconfigure to... well, the
> extreme would be 5 groups of 2-data/3-disk RAID5's. That would, I think,
> theoretically (and if the controller is up to it, which I think it is),
> allow *up_to* 5 separate reads/writes to be served in parallel, vs. now,
> where I think it should be 3.
...
> It was, I thought, a fairly simple question, but I have a history of
> sometimes thinking things will be easier than they are, proportional to
> how far away (in future, or someone else doing it! ;-)) something is...

The answer is simple too: parity RAID sucks. If you want anything more than
a trivial increase in performance, you need to ditch parity RAID.

Given the time and effort involved in rearranging all of your disks to get
one or two more RAID5 arrays with fewer disks per array into a RAID50, it
doesn't make sense to do so when you can simply create one large RAID10 and
be done monkeying around and second guessing. You'll have the performance
you're seeking. Actually far, far more.

> My **GENERAL** plan, if prices had cooperated, was to move to 3TB SATAs
> and **maybe** a 3rd enclosure -- I sorta like the LSI ones..

Using 3TB (or larger) drives simply increases the probability of losing an
entire RAID5 array if you ever have to rebuild a dead drive. See the
linux-raid archives for the past few days for a good discussion on the topic.
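To put a rough number on that rebuild risk, here's a back-of-envelope sketch.
The 1-in-10^15 bits unrecoverable read error rate is the class of spec
Hitachi publishes for the Ultrastar line; the exact figure for your drives
is an assumption on my part, and real-world rates vary, so treat the output
as ballpark only:

    import math

    def rebuild_ure_risk(drives, drive_tb, ure_rate_bits=1e15):
        # Rebuilding one dead member of an N-drive RAID5 forces a read of
        # every bit on the N-1 survivors. One unrecoverable read error
        # during that pass and the rebuild (i.e. the array) is toast.
        bits_read = (drives - 1) * drive_tb * 1e12 * 8   # decimal TB -> bits
        # P(at least one URE) = 1 - (1 - 1/rate)^bits_read
        return -math.expm1(bits_read * math.log1p(-1.0 / ure_rate_bits))

    print(rebuild_ure_risk(4, 2))         # 4x 2TB RAID5          -> ~0.05
    print(rebuild_ure_risk(4, 3))         # 4x 3TB RAID5          -> ~0.07
    print(rebuild_ure_risk(4, 3, 1e14))   # consumer-class drives -> ~0.51

Roughly a 1-in-14 chance of losing the whole array on every 3TB rebuild,
before you even factor in the hours of thrashing the rebuild inflicts on the
surviving drives.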
LSI sold the Engenio division to NetApp in early 2011, only about a year
after they started selling the 620/630 in the channel -- a very short product
life. The DE1600 (LSI 630J) 12 bay enclosure you have is still available from
NetApp, but I doubt you'd want to pay NetApp's price, if they'd even sell it
bare without drives, or without a mandatory service contract.

> they seem pretty solid. Have tried a few others and generally found them
> not as good, but have looked on the economical side since this is for a
> home office^h^h^h^h^h^hlab^h^h^hplay setup....

Norco's units have a decent rep, especially given the price point. They use
LSI's SAS expander ASIC:

http://www.newegg.com/Product/Product.aspx?Item=N82E16816133047

No dual hot swap PSUs or expander module slots compared to the LSI, but
probably 2:1 lower price. I didn't list the 12 bay unit as it's only $230
less, $1369 vs $1139.

> Consider this -- my max read and write (both) on my large array is 1GB/s.
> There's no way I could get that with a RAID10 setup without a much larger
> number of disks.

On the contrary. The same disks in RAID10 will walk all over your RAID50
setup. Let's discuss practical use and performance instead of peak optimums,
shall we? Note that immediately below I'm simply educating you, not
recommending a 12 drive RAID10. Recommendations come later.

In this one array you have 12 drives, 3x 4 drive RAID5 arrays in a RAID50,
for 9 effective data spindles. An equivalent 12 drive RAID10 would yield 6
data spindles.

For a pure streaming read workload with all drives evenly in play, the
RAID50 might be ~50% faster. For a purely random read workload, about the
same, although in both cases 50x or more slower than the streaming read case
due to random seeks. With a pure streaming allocation write workload with
perfect stripe filling and no RMW, the RAID50 will be faster, but by less
than the 50% above due to the parity calcs in the ASIC.

Now it gets interesting. With a purely random, non-aligned, non-allocation
write workload on the RAID50, RMW cycles will abound, driving seek latency
through the roof while the ASIC performs a parity calc on each stripe
update. Throughput here will be in the low tens of MB/s, tops. RAID10 simply
writes each sector -- done. Throughput will be in the high tens to hundreds
of MB/s. So in this scenario RAID10 will be anywhere from 5-10x or more
faster, depending on the distribution of the writes across the drives.

Another factor here is that RMW reads from the disks go into the LSI cache
for parity recalculation, eating cache bandwidth and capacity and decreasing
the writeback efficiency. With RAID10 you get full cache bandwidth for
sinking incoming writes and performing flush scheduling, both being
extremely important for random write workloads. Food for thought: a random
write workload of ~500MB with RAID10 will complete almost instantly after
the controller cache consumes it. With RAID50 you have to go through the
hundreds or thousands of RMW cycles on the disks, so the same operation will
take many minutes.

Let's look at more real-world scenarios. Take your example of the nightly
background processes kicking in. This constitutes a mixed random read and
write workload. In this situation every RMW can create 3 seeks per drive
write: read, write, parity write. Add a seek for a pending read operation
and you're at 4 seeks. But the problem isn't just the seeks, it is the
inter-seek latency due to the slow 7.2K RPM platters having to spin under
the head for the next read or write. This scenario makes scheduling in the
controller and the drives themselves very difficult, adding more latency.
With RAID10 in this scenario you simply have write/read/write/read/etc.
You're not performing 2 extra seeks for each write, so you're not incurring
that latency between operations, nor the scheduling complexity, thus driving
throughput much higher.
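If it helps to see the arithmetic, here's a deliberately crude model of the
write amplification alone. The ~75 IOPS per 7.2K drive figure is a
round-number assumption, not a measurement, and the model ignores the extra
rotational latency of writing back to the sectors just read, the parity
calcs, and the cache contention described above -- all of which widen the
gap further in the RAID10's favor:

    # Physical disk ops generated per logical small random write:
    #   RAID5/50 (RMW): read old data, read old parity,
    #                   write new data, write new parity  -> 4 ops
    #   RAID10:         one write to each mirror half     -> 2 ops
    PER_DRIVE_IOPS = 75        # rough assumption for a 7.2K SATA drive

    def array_write_iops(drives, ops_per_logical_write):
        # All spindles seeking independently and evenly loaded.
        return drives * PER_DRIVE_IOPS / ops_per_logical_write

    raid50 = array_write_iops(12, 4)    # ~225 logical writes/s
    raid10 = array_write_iops(12, 2)    # ~450 logical writes/s

    for kb in (4, 64):
        print("%dKB writes: RAID50 ~%.0f MB/s, RAID10 ~%.0f MB/s"
              % (kb, raid50 * kb / 1024.0, raid10 * kb / 1024.0))

Even this best case for the parity array -- no misaligned strips, no cache
starvation -- already has it doing twice the disk work per write; pile the
effects above on top and you arrive at the gaps described here.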
In this scenario, the 6 disk RAID10 may be 10x to as much as 50x faster than
the RAID50, depending on the access/seek patterns.

I've obviously not covered this in much technical detail, as storage
behavior is quite complex. I've attempted to give you a high level overview
of the behavioral differences between parity and non-parity RAID, the
potential performance differences with various workloads, and the
differences between "peak" performance and actual performance. While your
RAID50 may have greater theoretical peak streaming performance, the RAID10
will typically, literally, run circles around it with most day-to-day mixed
IO workloads. While the RAID50 may have a peak throughput of ~1GB/s, it may
only attain that 1-10% of the time. The RAID10 may have a peak throughput of
"only" ~700MB/s, but will likely achieve that more than 60% of the time. And
as a result its performance degradation will be much more graceful with
concurrent workloads, due to the dramatically lower IO completion latencies.

> Though I admit, concurrency would rise... but I generate most of my
> workload, so usually I don't have too many things going on at the same
> time... a few maybe...

But I'd guess it's at times like this when you bog down the RAID50 with
mixed workloads and become annoyed. You typically don't see that with
non-parity arrays.

> When an xfs_fsr kicks in and starts swallowing disk-cache, *ahem*, and the
> daily backup kicks in, AND the daily 'rsync' to create a static
> snapshot... things can slow down a bit.. but rare am I up at those hours...

And this is one scenario where the RAID10 would run circles around the
RAID50.

> The most intensive is the xfs_fsr, partly due to it swallowing up disk
> cache (it runs at nice -19 ionice -c3, and I can still feel it!)...

As Dave and possibly other devs have stated, cron'ing xfs_fsr is not
recommended. While it defrags your files it fragments free space, and
fragmented free space tends to kill performance more than file fragmentation
does. It also puts extra wear & tear on your drives, especially when using
parity RAID, due to the things mentioned above.

> I might play more with putting it in its own blkio cgroup and just
> limiting the overall disk transactions... (not to mention fixing that
> disk-buffer usage issue)...

On the off chance that xfs_fsr has completion timers or some such, I'd ask
around before doing that. Sufficiently limiting its IO rate may have
unintended consequences.

>> You'll need more drives to maintain the same usable capacity,
> ---
> (oh, a minor detail! ;^))...

Well, how much space do you really need in a one person development
operation plus home media/etc storage system? 10TB, 24TB, 48TB? Assuming you
have both 630Js filled with 24x 2TB drives, that's 48TB raw. If you have
6x 4 drive RAID5s in multiple RAID50 spans, you have 18x 2TB = 36TB of
capacity. Your largest array is 12 drives with 9 effective spindles of
throughput.

You've split up your arrays for different functions, limiting some workloads
to fewer spindles of performance, and leaving spindles sitting idle that
could otherwise be actively adding performance to active workloads. You've
created partitions directly on the array disk devices, with various LVM
devices and filesystems on top of those for various purposes, again limiting
some filesystems to less performance than your total spindles can give you.

The change I recommend you consider is to do something similar to what we do
with SAN storage consolidation: create a single large spindle count
non-parity array on the LSI. In this case that would be a 24 drive RAID10
with a strip (sunit) of 32KB, yielding a stripe width (swidth) of 384KB,
which should work very well with all of your filesystems and workloads,
giving a good probability of full stripe writes. You'd have ~24TB of usable
space.
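The geometry and capacity math, for the record (the ~120MB/s per-drive
streaming figure is my rough assumption for these 2TB 7.2K Hitachis, not a
measured number):

    # Sanity check on the proposed 24 drive RAID10 geometry.
    drives, drive_tb, mirror = 24, 2, 2
    data_spindles = drives // mirror            # 12 spindles doing unique work
    su_kb = 32                                  # controller strip ("sunit")
    swidth_kb = data_spindles * su_kb           # 12 * 32KB = 384KB full stripe
    usable_tb = drives * drive_tb / mirror      # 24TB usable
    peak_stream_mbs = data_spindles * 120       # ~1440 MB/s at ~120MB/s/drive
    print(swidth_kb, usable_tb, peak_stream_mbs)   # 384 24.0 1440

When you get to mkfs time, those are the same numbers you'd feed mkfs.xfs as
su=32k,sw=12 so XFS aligns its allocation to the full stripe.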
All of your workloads would have 12 spindles of non-parity performance, peak
streaming read/write of ~1.4GB/s, and random read/write mixed workload
throughput of a few hundred MB/s, simply stomping what you have now. You'd
be very hard pressed to bog down this 12 spindle non-parity array. Making a
conservative guesstimate, I'd say the mixed random IO throughput would be on
the order of 30x-50x that of your current RAID5/50 arrays combined.

In summary, you'd gain a staggering performance increase you simply wouldn't
have considered possible with your current hardware. You'd "sacrifice" 12TB
of your 48TB of raw space to achieve it. That 30-50x increase in random IOPS
is exactly why many folks gladly "waste money on extra drives". After you
see the dramatic performance increase you'll wonder why you ever considered
spending money on high RPM SAS drives to reduce RAID5 latency. Put these 24
7.2K SATA drives in this RAID10 up against 24 15K SAS drives in a 6x4
RAID50. Your big slow Hitachis will best the nimble 15K SAS drives in random
IOPS, probably by a wide margin, simply due to RMW. Yes, RMW will hammer 15K
drives that much. RMW hammers all spinning rust -- everything but SSDs.

> Don't spend much time on this.. (well if you read it, that might be too
> much already! ;-))... As I said it's not THAT important... and was mostly
> about the effect of groups in a RAID50 relating to performance tradeoffs.

Optimizing the spindle count of the constituent RAID5s in a RAID50 to gain
performance is akin to a downhill skier manically waxing his skis every day,
hoping to shave 2 seconds off a 2 minute course.

> Thanks for any insights... (I'm always open to learning how wrong I am!
> ;-))...

If nothing else, I hopefully got the point across as to how destructive
parity RAID read-modify-write operations are to performance. It's simply
impossible to get good mixed IO performance from parity RAID unless one's
workloads always fit in controller write cache, or one has SSD storage.

--
Stan

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs