On 14/02/2011 23:59, Matt Garman wrote:
[...]
The requirement is basically this: around 40 to 50 compute machines
act as an ad-hoc scientific compute/simulation/analysis cluster.
These machines all need access to a shared 20 TB pool of
storage. Each compute machine has a gigabit network connection, and
it's possible that nearly every machine could simultaneously try to
access a large (100 to 1000 MB) file in the storage pool. In other
words, a 20 TB file store with bandwidth upwards of 50 Gbps.
I'd recommend you analyse that requirement more closely. Yes, you have
50 compute machines with GigE connections, so it's possible they could
all demand data from the file store at once, but in actual use, would they?
For example, if these machines were each to demand a 100MB file, how
long would they spend computing their results from it? If it's only 1
second, each node is pulling close to a full gigabit, so you would
indeed need an aggregate bandwidth of around 50Gbps[1]. If it's 20
seconds of processing per file, each node needs only about a twentieth
of that, so your filer only needs an aggregate bandwidth of around
2.5Gbps.
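To put rough numbers on it, here's a back-of-the-envelope sketch (plain
Python; the node count, file size and processing time are just the
figures from your mail, so plug in your real ones):

    # Rough estimate of the aggregate filer bandwidth needed, assuming
    # every node continuously pulls one file per compute cycle.
    nodes = 50              # compute machines
    file_size_mb = 100      # MB fetched per compute cycle
    compute_seconds = 20    # time spent processing each file

    per_node_mbps = file_size_mb * 8 / compute_seconds   # megabits/s
    aggregate_gbps = per_node_mbps * nodes / 1000.0

    print("per node: %.0f Mbit/s, aggregate: %.1f Gbit/s"
          % (per_node_mbps, aggregate_gbps))

    # 20s of processing works out to ~40 Mbit/s per node and ~2 Gbit/s
    # aggregate; the 2.5Gbps figure above simply rounds 100MB/s up to a
    # full GigE link per node.  At 1s of processing you're back to ~40
    # Gbit/s aggregate, i.e. close to the 50 x GigE worst case.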
So I'd recommend you first work out how much data the compute machines
can actually chew through and size the filer up from there, rather than
starting from what their network connections could stream and working
down.
Cheers,
John.
[1] I'm assuming the compute nodes fetch the data for the next compute
cycle while they're working on the current one; if they're not, you're
likely making unnecessarily bursty demands on your filer while leaving
your compute nodes idle during each fetch.
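For what it's worth, that overlap is cheap to get with a one-deep
prefetch. A minimal sketch (Python again; fetch_file() and process()
are hypothetical stand-ins for however your jobs actually read and
crunch the data):

    import concurrent.futures

    def fetch_file(path):
        # stand-in: read one input file from the shared store
        with open(path, "rb") as f:
            return f.read()

    def process(data):
        # stand-in: whatever the compute job actually does
        pass

    def run(paths):
        # One-deep prefetch: start fetching file N+1 while computing on
        # file N, so the fetch hides behind the compute time and the
        # node never sits idle waiting on the filer.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            pending = pool.submit(fetch_file, paths[0])
            for nxt in paths[1:]:
                data = pending.result()                 # current file is ready
                pending = pool.submit(fetch_file, nxt)  # kick off the next one
                process(data)                           # compute while it fetches
            process(pending.result())                   # last file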