Hmm, could anyone explain how a 'multi-threaded' version of raid1 could
be implemented? For example, how would it scale, and why would the new
implementation scale better?

2012/5/21 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>:
> On 5/21/2012 10:20 AM, CoolCold wrote:
>> On Sat, May 12, 2012 at 2:28 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>>> On 5/11/2012 3:16 AM, Daniel Pocock wrote:
>>>
>> [snip]
>>> That's the one scenario where I abhor using md raid, as I mentioned. At
>>> least, a boot raid 1 pair. Using layered md raid 1 + 0, or 1 + linear
>>> is a great solution for many workloads. Ask me why I say raid 1 + 0
>>> instead of raid 10.
>> So, I'm asking - why?
>
> Neil pointed out quite some time ago that the md RAID 1/5/6/10 code runs
> as a single kernel thread. Thus when running heavy IO workloads across
> many rust disks or a few SSDs, the md thread becomes CPU bound, as it
> can only execute on a single core, just as with any other single thread.
>
> This issue is becoming more relevant as folks move to the latest
> generation of server CPUs that trade clock speed for higher core count.
> Imagine the surprise of the op who buys a dual socket box with 2x 16
> core AMD Interlagos 2.0GHz CPUs, 256GB RAM, and 32 SSDs in md RAID 10,
> only to find he can only get a tiny fraction of the SSD throughput.
> Upon investigation he finds a single md thread peaking one core while
> the rest are relatively idle but for the application itself.
>
> As I understand Neil's explanation, the md RAID 0 and linear code don't
> run as separate kernel threads, but merely pass offsets to the block
> layer, which is fully threaded. Thus, by layering md RAID 0 over md
> RAID 1 pairs, the striping load is spread over all cores. Same with
> linear, avoiding the single thread bottleneck.
>
> This layering can be done with any md RAID level, creating RAID50s and
> RAID60s, or concatenations of RAID5/6, as well as of RAID 10.
>
> And it shouldn't take anywhere near 32 modern SSDs to saturate a single
> 2GHz core with md RAID 10. It's likely less than 8 SSDs, which yield
> ~400K IOPS, but I haven't done verification testing myself at this point.
>
> --
> Stan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
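
To make the "merely pass offsets" point in the quoted explanation concrete,
here is a minimal sketch of the kind of arithmetic a striped (RAID 0) layer
has to do. It is a simplified model that assumes equal-size members; the
names raid0_map, chunk_sectors and n_members are made up for illustration
and this is not the md driver's code. The mapping is a pure function with
no shared state, which fits the explanation that this work does not need a
dedicated thread.

```python
def raid0_map(sector, chunk_sectors, n_members):
    """Map a logical sector of a striped array to (member, member_sector).

    Simplified model: all members are the same size and chunks are
    distributed round-robin. In a layered 1 + 0 setup each "member"
    would itself be a RAID 1 pair.
    """
    # Which chunk the sector falls in, and where inside that chunk.
    chunk_index, offset_in_chunk = divmod(sector, chunk_sectors)
    # Chunks go round-robin across members; 'stripe' counts full rounds.
    stripe, member = divmod(chunk_index, n_members)
    # On the chosen member, skip one chunk per completed stripe.
    member_sector = stripe * chunk_sectors + offset_in_chunk
    return member, member_sector


if __name__ == "__main__":
    # Hypothetical geometry: 4 members, 128-sector chunks
    # (64 KiB with 512-byte sectors).
    for logical in (0, 128, 512, 700):
        print(logical, raid0_map(logical, 128, 4))
```

Each call depends only on its arguments, so any number of submitters could
compute their own mapping at the same time; the mirroring work then happens
inside whichever member the request lands on.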