i think linux can do this job without problems, md code is very mature.
the problem here is: what size/speed of cpu/ram/network/disk should we
use?

slow disks: use raid0
mirroring: use raid1
raid 4,5,6 are cpu intensive, maybe a problem at very high speed (if
you have money, buy more cpu and no problems)

2011/3/18 NeilBrown <neilb@xxxxxxx>:
> On Fri, 18 Mar 2011 10:43:43 -0500 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
> wrote:
>
>> Christoph Hellwig put forth on 3/18/2011 9:05 AM:
>>
>> Thanks for the confirmations and explanations.
>>
>> > The kernel is pretty smart in placement of user and page cache data,
>> > but it can't really second-guess your intention. With the numactl
>> > tool you can help it do the proper placement for your workload. Note
>> > that the choice isn't always trivial - a numa system tends to have
>> > memory on multiple nodes, so you'll either have to find a good
>> > partitioning of your workload or live with off-node references. I
>> > don't think partitioning NFS workloads is trivial, but then again
>> > I'm not a networking expert.
>>
>> Bringing mdraid back into the fold, I'm wondering what kind of load
>> the mdraid threads would place on a system of the caliber needed to
>> push 10GB/s NFS.
>>
>> Neil, I spent quite a bit of time yesterday spec'ing out what I believe
>
> Addressing me directly in an email that wasn't addressed to me directly
> seems a bit ... odd. Maybe that is just me.
>
>> is the bare minimum AMD64-based hardware needed to push 10GB/s NFS.
>> This includes:
>>
>> 4 LSI 9285-8e 8-port SAS 800MHz dual-core PCIe x8 HBAs
>> 3 NIAGARA 32714 PCIe x8 Quad Port Fiber 10 Gigabit Server Adapters
>>
>> This gives us 32 6Gb/s SAS ports and 12 10GbE ports total, for a raw
>> hardware bandwidth of 20GB/s SAS and 15GB/s ethernet.
>>
>> I made the assumption that RAID 10 would be the only suitable RAID
>> level for a few reasons:
>>
>> 1. The workload, being 50+ large-file NFS reads at an aggregate
>> 10GB/s, yields a massive random IO workload at the disk head level.
>>
>> 2. We'll need 384 15k SAS drives to service a 10GB/s random IO load.
>>
>> 3. We'll need multiple "small" arrays enabling multiple mdraid
>> threads, assuming a single 2.4GHz core isn't enough to handle
>> something like 48 or 96 mdraid disks.
>>
>> 4. Rebuild times for parity RAID schemes would be unacceptably high
>> and would eat all of the CPU the rebuild thread runs on.
>>
>> To get the bandwidth we need, and to make sure we don't run out of
>> controller chip IOPS, my calculations show we'd need 16 x 24-drive
>> mdraid 10 arrays. Thus, ignoring all other considerations momentarily,
>> a dual AMD 6136 platform with 16 2.4GHz cores seems suitable, with one
>> mdraid thread per core, each managing a 24-drive RAID 10. Would we
>> then want to layer a --linear array across the 16 RAID 10 arrays? If
>> we did this, would the linear thread bottleneck instantly as it runs
>> on only one core? How many additional memory copies (interconnect
>> transfers) are we going to be performing per mdraid thread for each
>> block read before the data is picked up by the nfsd kernel threads?
>>
>> How much of each core's cycles will we consume with normal random read
>
> For RAID10, the md thread plays no part in reads. Whichever thread
> submitted the read submits it all the way down to the relevant member
> device. If the read fails, the thread comes into play.
>
> For writes, the thread is used primarily to make sure the writes are
> properly ordered w.r.t. bitmap updates. I could probably remove that
> requirement if a bitmap was not in use...
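
good point about reads - only the thread that submits the io pays the
cost. if you want to measure what the bitmap ordering costs on writes,
the write-intent bitmap can be removed and re-added online. a sketch
(the array name is only an example, not from stan's spec):

  mdadm --grow /dev/md0 --bitmap=none      # drop the write-intent bitmap
  mdadm --grow /dev/md0 --bitmap=internal  # put it back after the test

without a bitmap a crash means a full resync, so with 384 drives i
would keep it and accept the small write overhead.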
>> operations assuming 10GB/s of continuous aggregate throughput? Would
>> the mdraid threads consume sufficient cycles that, when combined with
>> network stack processing and interrupt processing, 16 cores at 2.4GHz
>> would be insufficient? If so, would bumping the two sockets up to 24
>> cores at 2.1GHz be enough for the total workload? Or would we need to
>> move to a 4-socket system with 32 or 48 cores?
>>
>> Is this possibly a situation where mdraid just isn't suitable due to
>> the CPU, memory, and interconnect bandwidth demands, making hardware
>> RAID the only real option?
>
> I'm sorry, but I don't do resource usage estimates or comparisons with
> hardware raid. I just do software design and coding.
>
>> And if it does require hardware RAID, would it be possible to stick
>> 16 block devices together in a --linear mdraid array and maintain the
>> 10GB/s performance? Or would the single --linear array be processed
>> by a single thread? If so, would a single 2.4GHz core be able to
>> handle an mdraid --linear thread managing 8 devices at 10GB/s
>> aggregate?
>
> There is no thread for linear or RAID0.
>
> If you want to share load over a number of devices, you would normally
> use RAID0. However, if the load had a high thread count and the
> filesystem distributed IO evenly across the whole device space, then
> linear might work for you.
>
> NeilBrown
>
>> Unfortunately I don't currently work in a position allowing me to
>> test such a system, and I certainly don't have the personal financial
>> resources to build it. My rough estimate of the hardware cost is
>> $150-200K USD. The 384 Hitachi 15k SAS 146GB drives at $250 each
>> wholesale are a little over $90k.
>>
>> It would be really neat to have a job that allowed me to set up and
>> test such things. :)

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
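
PS: to illustrate neil's point about raid0 vs linear, either of these
would glue the 16 raid10 arrays into one device (device names here are
hypothetical, not from stan's spec):

  # raid0 stripes every io across all members at chunk granularity
  mdadm --create /dev/md/big --level=raid0 --chunk=256 \
        --raid-devices=16 /dev/md/r10_{0..15}

  # linear only concatenates; each io goes to whichever member holds
  # that offset
  mdadm --create /dev/md/big --level=linear \
        --raid-devices=16 /dev/md/r10_{0..15}

neither level has a dedicated md thread, so the single-core bottleneck
stan worries about does not exist for them.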