Re: md RAID with enterprise-class SATA or SAS drives

Just to understand (I haven't thought about an implementation yet):

What could be done to 'multi-thread' md RAID 1, 10, 5 and 6?

I don't understand why this is a problem. I would think the only CPU
time needed is the time to work out which disk, and at what position,
each I/O request should be read from or written to (roughly the kind
of mapping sketched below).

I'm only considering the normal read/write path here, with no resync,
check, read/write error handling, or other management features running.
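
To make concrete what I mean by "which disk and what position", here is
a toy user-space sketch of that per-request arithmetic, for a plain
striped layout with invented names (it is only an illustration of the
idea, not the actual md code):

/*
 * Toy illustration only: map a logical sector to a
 * (member disk, sector offset) pair for a simple striped layout
 * with nr_disks members and chunk_sectors sectors per chunk.
 */
#include <stdio.h>

static void map_sector(unsigned long long lba,
                       unsigned int chunk_sectors, unsigned int nr_disks,
                       unsigned int *disk, unsigned long long *offset)
{
        unsigned long long chunk    = lba / chunk_sectors;  /* which stripe chunk */
        unsigned long long in_chunk = lba % chunk_sectors;  /* offset inside it   */

        *disk   = chunk % nr_disks;                         /* which member disk  */
        *offset = (chunk / nr_disks) * chunk_sectors + in_chunk;
}

int main(void)
{
        unsigned int disk;
        unsigned long long off;

        map_sector(123456ULL, 128, 4, &disk, &off);  /* 64 KiB chunks, 4 disks */
        printf("disk %u, sector %llu\n", disk, off);
        return 0;
}

That per-request cost looks tiny to me, which is why I am asking where
the CPU time actually goes under heavy load.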


2012/5/23 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>:
> On 5/22/2012 2:29 AM, David Brown wrote:
>
>> But in general, it's important to do some real-world testing to
>> establish whether or not there really is a bottleneck here.  It is
>> counter-productive for Stan (or anyone else) to advise against raid10 or
>> raid5/6 because of a single-thread bottleneck if it doesn't actually
>> slow things down in practice.
>
> Please reread precisely what I stated earlier:
>
> "Neil pointed out quite some time ago that the md RAID 1/5/6/10 code
> runs as a single kernel thread.  Thus when running heavy IO workloads
> across many rust disks or a few SSDs, the md thread becomes CPU bound,
> as it can only execute on a single core, just as with any other single
> thread."
>
> Note "heavy IO workloads".  The real world testing upon which I based my
> recommendation is in this previous thread on linux-raid, of which I was
> a participant.
>
> Mark Delfman did the testing which revealed this md RAID thread
> scalability problem using 4 PCIe enterprise SSDs:
>
> http://marc.info/?l=linux-raid&m=131307849530290&w=2
>
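
(To check that I follow the structure being described: is it roughly
the shape below?  This is only a toy user-space model with invented
names, not the real md code; the point is just that every request for
one array is drained by a single worker loop, so however cheap the
per-request work is, it all runs on at most one core at a time.)

/*
 * Toy model only (invented names, user-space pthreads): one worker
 * thread per array pops requests off a single queue and does the
 * per-request bookkeeping, so that work is serialized on one core
 * no matter how many disks or submitting threads there are.
 */
#include <pthread.h>
#include <stddef.h>

struct request {
        struct request *next;
        /* target disk, offset, buffer, ... */
};

struct array_conf {
        pthread_mutex_t lock;
        pthread_cond_t  more;
        struct request *queue;
};

static void handle_request(struct request *r)
{
        (void)r;                        /* map, copy/XOR, submit to members */
}

static void *array_worker(void *arg)    /* the single per-array thread */
{
        struct array_conf *conf = arg;

        for (;;) {
                pthread_mutex_lock(&conf->lock);
                while (conf->queue == NULL)
                        pthread_cond_wait(&conf->more, &conf->lock);
                struct request *r = conf->queue;
                conf->queue = r->next;
                pthread_mutex_unlock(&conf->lock);

                handle_request(r);      /* all of this runs on one core */
        }
        return NULL;
}
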
> >> On the other hand, if it /is/ a hindrance to
>> scaling, then it is important for Neil and other experts to think about
>> how to change the architecture of md raid to scale better.  And
>
> More thorough testing and identification of the problem is definitely
> required.  Apparently few people are currently running md RAID 1/5/6/10
> across multiple ultra high performance SSDs, people who actually need
> every single ounce of IOPS out of each device in the array.  But this
> trend will increase.  I'd guess those currently building md 1/5/6/10
> arrays w/ many SSDs simply don't *need* every ounce of IOPS, or more
> would be complaining about single core thread limit already.
>
>> somewhere in between there can be guidelines to help users - something
>> like "for an average server, single-threading will saturate raid5
>> performance at 8 disks, raid6 performance at 6 disks, and raid10 at 10
>> disks, beyond which you should use raid0 or linear striping over two or
>> more arrays".
>
> This isn't feasible due to the myriad possible combinations of hardware.
>  And you simply won't see this problem with SRDs (spinning rust disks)
> until you have hundreds of them in a single array.  It requires over 200
> 15K SRDs in RAID 10 to generate only 30K random IOPS.  Just about any
> single x86 core can handle that, probably even a 1.6GHz Atom.  This
> issue mainly affects SSD arrays, where even 8 midrange consumer SATA3
> SSDs in RAID 10 can generate over 400K IOPS, 200K real and 200K mirror data.
>
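
(Sanity-checking those figures for myself, with round numbers that are
my own assumptions: a 15K spinning disk manages on the order of 150
random IOPS, so 200 drives x ~150 is about 30K device IOPS for the
whole array.  For the SSD case, 8 consumer SATA3 SSDs at roughly 50K
random IOPS each give ~400K device IOPS, and since RAID 10 mirrors
every write, ~200K application write IOPS become ~400K IOPS that the
single md thread has to dispatch.)
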
>> Of course, to do such testing, someone would need a big machine with
>> lots of disks, which is not otherwise in use!
>
> Shouldn't require anything that heavy.  I would guess that one should be
> able to reveal the thread bottleneck with a low freq dual core desktop
> system with an HBA such as the LSI 9211-8i @320K IOPS, and 8 Sandforce
> 2200 based SSDs @40K write IOPS each.
>
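
(If it helps, I would guess a load like that could be generated with
fio against the md device; the parameters below are only my guess at
something that pushes enough parallel random 4K writes to expose the
bottleneck, and of course it must only be run on a scratch array whose
contents can be destroyed:

fio --name=md-randwrite --filename=/dev/md0 --direct=1 \
    --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
    --numjobs=8 --runtime=60 --time_based --group_reporting

and then watch in top whether the array's md kernel thread, mdX_raidY,
pins a single core.)
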
> --
> Stan



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

