Re: high throughput storage server?

On 15/02/2011 13:29, Stan Hoeppner wrote:
> Matt Garman put forth on 2/14/2011 5:59 PM:
>
>> The requirement is basically this: around 40 to 50 compute machines
>> act as basically an ad-hoc scientific compute/simulation/analysis
>> cluster.  These machines all need access to a shared 20 TB pool of
>> storage.  Each compute machine has a gigabit network connection, and
>> it's possible that nearly every machine could simultaneously try to
>> access a large (100 to 1000 MB) file in the storage pool.  In other
>> words, a 20 TB file store with bandwidth upwards of 50 Gbps.
>
> If your description of the requirement is accurate, then what you need is a
> _reliable_ high performance NFS server backed by many large/fast spindles.
>
>> I was wondering if anyone on the list has built something similar to
>> this using off-the-shelf hardware (and Linux of course)?
>
> My thoughtful, considered, recommendation would be to stay away from a DIY build
> for the requirement you describe, and stay away from mdraid as well, but not
> because mdraid isn't up to the task.  I get the feeling you don't fully grasp
> some of the consequences of a less than expert level mdraid admin being
> responsible for such a system after it's in production.  If multiple drives are
> kicked off line simultaneously (posts of such seem to occur multiple times/week
> here), downing the array, are you capable of bringing it back online intact,
> successfully, without outside assistance, in a short period of time?  If you
> lose the entire array due to a typo'd mdadm parm, then what?


This brings up an important point - no matter what sort of system you get (home-made, mdadm RAID, or whatever), you will want to run some tests and drills in replacing failed drives. Also make sure everything is well documented and well labelled. When mdadm sends you an email telling you drive sdx has failed, you want to be /very/ sure you know which physical drive is sdx before you pull it out!
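
As a rough sketch of the kind of drill I mean (the device names /dev/md0 and /dev/sdx here are just placeholders - try this on a scratch array, not the production one):

  # First make sure you can map sdx to a physical drive - the serial
  # number from /dev/disk/by-id or smartctl should match the label on
  # the drive tray:
  ls -l /dev/disk/by-id/ | grep sdx
  smartctl -i /dev/sdx

  # Practise a failure: mark the disk failed, remove it from the array,
  # swap the drive, then add the replacement back:
  mdadm /dev/md0 --fail /dev/sdx
  mdadm /dev/md0 --remove /dev/sdx
  mdadm /dev/md0 --add /dev/sdx

  # Watch the rebuild progress:
  cat /proc/mdstat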



You also want to consider your RAID setup carefully. RAID 10 has been mentioned here several times - it is often a good choice, but not necessarily the right one here. RAID 10 gives you fast recovery, and at best it can survive the loss of half your disks - but at worst a loss of just two disks will bring down the whole set. It is also very inefficient in space. If you use SSDs, it may not be worth double the price to get RAID 10; if you use hard disks, it may not give you sufficient safety.

I haven't built a RAID array of anything like this size, so my comments here are only based on my imperfect understanding of the theory - I'm learning too.

RAID 10 has the advantage of good read speed (close to RAID 0), at the cost of poorer write speed and poor space efficiency. RAID 5 and RAID 6 are space efficient, and fast for most purposes, but slow for rebuilds and slow for small writes.
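
To put rough numbers on the space trade-off (purely illustrative figures, not a recommendation for any particular drive count), with 12 x 1 TB disks you would get roughly:

  RAID 10:  6 TB usable, survives from 1 up to 6 failures depending on which disks go
  RAID 6:  10 TB usable, survives any 2 failures
  RAID 5:  11 TB usable, survives any single failure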

You are not much bothered about write performance, and most of your writes are large anyway.

How about building the array as a two-tier RAID 6+5 setup? Take 7 x 1 TB disks as a RAID 6, giving 5 TB of space. Five such sets combined as a RAID 5 give you your 20 TB across 35 drives. This will survive any four failed disks, and often more depending on the combination. If you are careful about how the disks are distributed across controllers, it will also survive a failing controller card.
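
A minimal sketch of how that could be put together with mdadm (the device names are made up for illustration, and I have left out chunk sizes and the like - check everything against the man page before relying on it):

  # Five inner RAID 6 arrays of 7 disks each; only the first is shown,
  # repeat for md2..md5 with the remaining disks:
  mdadm --create /dev/md1 --level=6 --raid-devices=7 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

  # The outer RAID 5 over the five inner arrays, with a write-intent
  # bitmap so that a member which is removed and later re-added only
  # needs a short catch-up resync:
  mdadm --create /dev/md10 --level=5 --raid-devices=5 --bitmap=internal \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5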

If a disk fails, you could remove that whole set from the outer array (which should have a write-intent bitmap) - the rebuild of the inner set will then go at maximal speed, while the outer array's performance will not be so badly affected. Once the rebuild is complete, put the set back in the outer array. Since you are not doing many writes, it will not take long to catch up.
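
In mdadm terms, and again only as a sketch using the made-up device names from above, the dance for a failed disk in (say) the second set might look like this:

  # Take the degraded inner array out of the outer array so its rebuild
  # does not have to compete with normal array traffic:
  mdadm /dev/md10 --fail /dev/md2
  mdadm /dev/md10 --remove /dev/md2

  # Replace the failed disk inside md2 and let it rebuild at full speed,
  # then put the set back; with the write-intent bitmap only the blocks
  # written in the meantime need to be resynced:
  mdadm /dev/md10 --re-add /dev/md2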

It is probably worth having a small array of SSDs (RAID 1 or RAID 10) to hold the write-intent bitmap, the journal for your main file system, and of course your OS. Maybe one of those absurdly fast PCI Express flash drives would be a good choice.
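
As a sketch of that last part (assuming the SSD mirror ends up as /dev/md20 and you use ext4 on the big array - substitute your own devices and file system):

  # Create an external journal on the SSD mirror and an ext4 file
  # system on the big array that uses it:
  mke2fs -O journal_dev /dev/md20
  mkfs.ext4 -J device=/dev/md20 /dev/md10

mdadm can also keep the write-intent bitmap in an external file rather than internally; check the man page for the restrictions on where that file is allowed to live.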




