i think linux can do this job without problems, md code is very mature.
the problem here is: what size/speed of cpu/ram/network/disk should we
use?

slow disks: use raid0
mirroring: use raid1
raid 4,5,6 are cpu intensive, maybe a problem at very high speed (if
you have money, buy more cpu and no problems)

2011/3/18 NeilBrown <neilb@xxxxxxx>:
> On Fri, 18 Mar 2011 10:43:43 -0500 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
> wrote:
>
>> Christoph Hellwig put forth on 3/18/2011 9:05 AM:
>>
>> Thanks for the confirmations and explanations.
>>
>> > The kernel is pretty smart in placement of user and page cache data,
>> > but it can't really second-guess your intention. With the numactl
>> > tool you can help it do the proper placement for your workload. Note
>> > that the choice isn't always trivial - a numa system tends to have
>> > memory on multiple nodes, so you'll either have to find a good
>> > partitioning of your workload or live with off-node references. I
>> > don't think partitioning NFS workloads is trivial, but then again
>> > I'm not a networking expert.
>>
>> Bringing mdraid back into the fold, I'm wondering what kind of load
>> the mdraid threads would place on a system of the caliber needed to
>> push 10GB/s NFS.
>>
>> Neil, I spent quite a bit of time yesterday spec'ing out what I believe
>
> Addressing me directly in an email that wasn't addressed to me directly
> seems a bit ... odd. Maybe that is just me.
>
>> is the bare minimum AMD64-based hardware needed to push 10GB/s NFS.
>> This includes:
>>
>> 4 LSI 9285-8e 8-port SAS 800MHz dual-core PCIe x8 HBAs
>> 3 NIAGARA 32714 PCIe x8 Quad Port Fiber 10 Gigabit Server Adapters
>>
>> This gives us 32 6Gb/s SAS ports and 12 10GbE ports total, for a raw
>> hardware bandwidth of 20GB/s SAS and 15GB/s ethernet.
>>
>> I made the assumption that RAID 10 would be the only suitable RAID
>> level for a few reasons:
>>
>> 1. The workload, being 50+ large-file NFS reads at an aggregate
>> 10GB/s, yields a massive random IO workload at the disk head level.
>>
>> 2. We'll need 384 15k SAS drives to service a 10GB/s random IO load.
>>
>> 3. We'll need multiple "small" arrays enabling multiple mdraid
>> threads, assuming a single 2.4GHz core isn't enough to handle
>> something like 48 or 96 mdraid disks.
>>
>> 4. Rebuild times for parity RAID schemes would be unacceptably high
>> and would eat all of the CPU the rebuild thread runs on.
>>
>> To get the bandwidth we need, and to make sure we don't run out of
>> controller chip IOPS, my calculations show we'd need 16 x 24-drive
>> mdraid 10 arrays. Thus, ignoring all other considerations momentarily,
>> a dual AMD 6136 platform with 16 2.4GHz cores seems suitable, with one
>> mdraid thread per core, each managing a 24-drive RAID 10. Would we
>> then want to layer a --linear array across the 16 RAID 10 arrays? If
>> we did this, would the linear thread bottleneck instantly as it runs
>> on only one core? How many additional memory copies (interconnect
>> transfers) are we going to be performing per mdraid thread for each
>> block read before the data is picked up by the nfsd kernel threads?
>>
>> How much of each core's cycles will we consume with normal random read
>
> For RAID10, the md thread plays no part in reads. Whichever thread
> submitted the read submits it all the way down to the relevant member
> device. If the read fails, the thread comes into play.
>
> For writes, the thread is used primarily to make sure the writes are
> properly ordered w.r.t. bitmap updates. I could probably remove that
> requirement if a bitmap was not in use...
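
good point about reads - only the thread that submits the io pays the
cost. if you want to measure what the bitmap ordering costs on writes,
the write-intent bitmap can be removed and re-added online. a sketch
(the array name is only an example, not from stan's spec):

  mdadm --grow /dev/md0 --bitmap=none      # drop the write-intent bitmap
  mdadm --grow /dev/md0 --bitmap=internal  # put it back after the test

without a bitmap a crash means a full resync, so with 384 drives i
would keep it and accept the small write overhead.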
>> operations assuming 10GB/s of continuous aggregate throughput? Would
>> the mdraid threads consume sufficient cycles that, when combined with
>> network stack processing and interrupt processing, 16 cores at 2.4GHz
>> would be insufficient? If so, would bumping the two sockets up to 24
>> cores at 2.1GHz be enough for the total workload? Or would we need to
>> move to a 4-socket system with 32 or 48 cores?
>>
>> Is this possibly a situation where mdraid just isn't suitable due to
>> the CPU, memory, and interconnect bandwidth demands, making hardware
>> RAID the only real option?
>
> I'm sorry, but I don't do resource usage estimates or comparisons with
> hardware raid. I just do software design and coding.
>
>> And if it does require hardware RAID, would it be possible to stick
>> 16 block devices together in a --linear mdraid array and maintain the
>> 10GB/s performance? Or would the single --linear array be processed
>> by a single thread? If so, would a single 2.4GHz core be able to
>> handle an mdraid --linear thread managing 8 devices at 10GB/s
>> aggregate?
>
> There is no thread for linear or RAID0.
>
> If you want to share load over a number of devices, you would normally
> use RAID0. However, if the load had a high thread count and the
> filesystem distributed IO evenly across the whole device space, then
> linear might work for you.
>
> NeilBrown
>
>> Unfortunately I don't currently work in a position allowing me to
>> test such a system, and I certainly don't have the personal financial
>> resources to build it. My rough estimate of the hardware cost is
>> $150-200K USD. The 384 Hitachi 15k SAS 146GB drives at $250 each
>> wholesale are a little over $90k.
>>
>> It would be really neat to have a job that allowed me to set up and
>> test such things. :)

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
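
PS: to illustrate neil's point about raid0 vs linear, either of these
would glue the 16 raid10 arrays into one device (device names here are
hypothetical, not from stan's spec):

  # raid0 stripes every io across all members at chunk granularity
  mdadm --create /dev/md/big --level=raid0 --chunk=256 \
        --raid-devices=16 /dev/md/r10_{0..15}

  # linear only concatenates; each io goes to whichever member holds
  # that offset
  mdadm --create /dev/md/big --level=linear \
        --raid-devices=16 /dev/md/r10_{0..15}

neither level has a dedicated md thread, so the single-core bottleneck
stan worries about does not exist for them.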