On 5/22/2012 2:29 AM, David Brown wrote:

> But in general, it's important to do some real-world testing to
> establish whether or not there really is a bottleneck here.  It is
> counter-productive for Stan (or anyone else) to advise against raid10 or
> raid5/6 because of a single-thread bottleneck if it doesn't actually
> slow things down in practice.

Please reread precisely what I stated earlier:

"Neil pointed out quite some time ago that the md RAID 1/5/6/10 code
runs as a single kernel thread.  Thus when running heavy IO workloads
across many rust disks or a few SSDs, the md thread becomes CPU bound,
as it can only execute on a single core, just as with any other single
thread."

Note "heavy IO workloads".

The real-world testing upon which I based my recommendation is in this
previous thread on linux-raid, in which I was a participant.  Mark
Delfman did the testing which revealed this md RAID thread scalability
problem, using 4 PCIe enterprise SSDs:

http://marc.info/?l=linux-raid&m=131307849530290&w=2

> On the other hand, if it /is/ a hinder to
> scaling, then it is important for Neil and other experts to think about
> how to change the architecture of md raid to scale better.  And

More thorough testing and identification of the problem is definitely
required.  Apparently few people are currently running md RAID 1/5/6/10
across multiple ultra-high-performance SSDs, i.e. people who actually
need every single ounce of IOPS from each device in the array.  But
this trend will increase.  I'd guess those currently building md
1/5/6/10 arrays with many SSDs simply don't *need* every ounce of IOPS,
or more of them would already be complaining about the single-core
thread limit.

> somewhere in between there can be guidelines to help users - something
> like "for an average server, single-threading will saturate raid5
> performance at 8 disks, raid6 performance at 6 disks, and raid10 at 10
> disks, beyond which you should use raid0 or linear striping over two or
> more arrays".

This isn't feasible due to the myriad possible combinations of
hardware.  And you simply won't see this problem with SRDs (spinning
rust disks) until you have hundreds of them in a single array.  It
takes over 200 15K SRDs in RAID 10 (at roughly 150 random IOPS per
drive) to generate only 30K random IOPS.  Just about any single x86
core can handle that, probably even a 1.6GHz Atom.  This issue mainly
affects SSD arrays, where even 8 midrange consumer SATA3 SSDs in RAID
10 can generate over 400K IOPS: 200K of real data plus another 200K of
mirror writes.

> Of course, to do such testing, someone would need a big machine with
> lots of disks, which is not otherwise in use!

It shouldn't require anything that heavy.  I would guess one could
reveal the thread bottleneck with a low-frequency dual-core desktop
system, an HBA such as the LSI 9211-8i (rated around 320K IOPS), and 8
Sandforce 2200 based SSDs at roughly 40K write IOPS each.  A rough
sketch of such a test is below.

--
Stan
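
For anyone who wants to see the single md write thread for themselves:
each array gets one kernel thread named mdX_raidY, and you can watch
its CPU usage directly.  The commands below are only a sketch; the
array name /dev/md0 and the raid10 personality are assumptions, so
substitute your own.

  # list the per-array md kernel thread and the core it last ran on
  ps -eLo pid,psr,pcpu,comm | grep md0_raid

  # watch that thread live (top's -H flag shows individual threads)
  top -H -p $(pgrep md0_raid10)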
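
And a load generator roughly matching the 8-SSD setup described above.
Again, this is only a sketch, not a tuned benchmark: the device names,
block size, queue depth and run time are guesses on my part, fio with
the libaio engine is assumed to be installed, and the fio run will
destroy any data on the array.

  # 8 SSDs on the HBA in RAID 10 (device names assumed)
  mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/sd[b-i]

  # hammer the array with 4k random writes from 8 jobs; while this
  # runs, watch whether md0_raid10 pegs one core in top -H
  fio --name=mdtest --filename=/dev/md0 --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=8 \
      --runtime=60 --time_based --group_reporting

If the md0_raid10 thread sits at ~100% of one core while the SSDs are
still well below their rated IOPS, that's the single-thread ceiling.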