Did you contact Texas SSD Solutions? I don't know how much you would
have to pay for that setup, but it's a nice solution...

2011/3/18 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>:
> Christoph Hellwig put forth on 3/18/2011 9:05 AM:
>
> Thanks for the confirmations and explanations.
>
>> The kernel is pretty smart in the placement of user and page cache
>> data, but it can't really second-guess your intentions. With the
>> numactl tool you can help it do the proper placement for your
>> workload. Note that the choice isn't always trivial - a NUMA system
>> tends to have memory on multiple nodes, so you'll either have to
>> find a good partitioning of your workload or live with off-node
>> references. I don't think partitioning NFS workloads is trivial,
>> but then again I'm not a networking expert.
>
> Bringing mdraid back into the fold, I'm wondering what kind of load
> the mdraid threads would place on a system of the caliber needed to
> push 10GB/s NFS.
>
> Neil, I spent quite a bit of time yesterday spec'ing out what I
> believe is the bare minimum AMD64-based hardware needed to push
> 10GB/s NFS. This includes:
>
> 4 LSI 9285-8e 8-port SAS 800MHz dual-core PCIe x8 HBAs
> 3 Niagara 32714 PCIe x8 quad-port fiber 10 Gigabit server adapters
>
> This gives us 32 6Gb/s SAS ports and 12 10GbE ports total, for a raw
> hardware bandwidth of 20GB/s SAS and 15GB/s Ethernet.
>
> I made the assumption that RAID 10 would be the only suitable RAID
> level, for a few reasons:
>
> 1. The workload consists of 50+ large-file NFS reads at an aggregate
> 10GB/s, yielding a massive random IO workload at the disk head
> level.
>
> 2. We'll need 384 15k SAS drives to service a 10GB/s random IO load.
>
> 3. We'll need multiple "small" arrays enabling multiple mdraid
> threads, assuming a single 2.4GHz core isn't enough to handle
> something like 48 or 96 mdraid disks.
>
> 4. Rebuild times for parity RAID schemes would be unacceptably high
> and would eat all of the CPU the rebuild thread runs on.
>
> To get the bandwidth we need while making sure we don't run out of
> controller chip IOPS, my calculations show we'd need 16 x 24-drive
> mdraid RAID 10 arrays. Thus, ignoring all other considerations
> momentarily, a dual AMD 6136 platform with 16 2.4GHz cores seems
> suitable, with one mdraid thread per core, each managing a 24-drive
> RAID 10. Would we then want to layer a --linear array across the 16
> RAID 10 arrays (sketched below)? If we did this, would the linear
> thread bottleneck instantly since it runs on only one core? How many
> additional memory copies (interconnect transfers) are we going to be
> performing per mdraid thread for each block read before the data is
> picked up by the nfsd kernel threads?
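>
> For concreteness, the layering I have in mind would look something
> like this (a sketch only, with hypothetical device names -- not a
> tested recipe):
>
>   # One of the sixteen 24-drive RAID 10 arrays; repeat with the
>   # appropriate member drives for /dev/md1 through /dev/md15.
>   mdadm --create /dev/md0 --level=10 --raid-devices=24 /dev/sd[a-x]
>
>   # Concatenate the sixteen RAID 10 arrays into one linear device.
>   mdadm --create /dev/md16 --level=linear --raid-devices=16 \
>       /dev/md{0..15}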
>
> How much of each core's cycles will we consume with normal random
> read operations, assuming 10GB/s of continuous aggregate throughput?
> Would the mdraid threads consume sufficient cycles that, combined
> with network stack processing and interrupt processing, 16 cores at
> 2.4GHz would be insufficient? If so, would bumping the two sockets
> up to 24 cores at 2.1GHz be enough for the total workload? Or would
> we need to move to a 4-socket system with 32 or 48 cores?
>
> Is this possibly a situation where mdraid just isn't suitable due to
> the CPU, memory, and interconnect bandwidth demands, making hardware
> RAID the only real option? And if it does require hardware RAID,
> would it be possible to stick 16 block devices together in a
> --linear mdraid array and maintain the 10GB/s performance? Or would
> the single --linear array be processed by a single thread? If so,
> would a single 2.4GHz core be able to handle an mdraid --linear
> thread managing 8 devices at 10GB/s aggregate?
>
> Unfortunately I don't currently work in a position that allows me to
> test such a system, and I certainly don't have the personal
> financial resources to build it. My rough estimate of the hardware
> cost is $150-200K USD. The 384 Hitachi 15k SAS 146GB drives alone,
> at $250 each wholesale, come to a little over $90k.
>
> It would be really neat to have a job that allowed me to set up and
> test such things. :)
>
> --
> Stan

--
Roberto Spadim
Spadim Technology / SPAEmpresarial