GlusterFS suitability in an ad-hoc cluster

David Flynn <davidf@xxxxxxxxxxxx> · Fri, 5 Oct 2007 08:01:52 +0000

Hi,

I have an ad-hoc cluster of seven machines that are used for batch
processed computations.  Each machine has either a 1TB or 2TB local
array.  I'm currently investigating methods to gain cluster-wide
visibility of all the local arrays on all nodes.  However, there are a
few complications:

 - Each node should have read-only access to the rest of the cluster.
 - Any writes should only be done to the local array.
 - I need to support disconnected operation of any node.
 - Hot add/remove of a node.

I think this also requires that there are no metadata servers.

Some more background: we are performing batch image processing
operations on large [constant] data sets.  Currently we divide the data
up and replicate it across the machines; then follows the nightmare of
trying to assign work to the correct node with the correct portion of
the source data.  Having all nodes see the whole [distributed] data set
would be of great benefit.

When a node processes the data, the result needs to be stored on its
local array, since the machine may then be disconnected from the network
to play back the video (at rates of up to 400MB/sec).  This also
requires that the disconnected node can read the filesystem without
assistance from the rest of the cluster.

Interconnect between the nodes is 1000BASE-T Ethernet.

A final spanner in the works: it `would be nice' if accesses to
identical (by name) files appearing on separate nodes could be load
balanced in some way.
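
To illustrate the sort of load balancing I have in mind, here is a
minimal sketch in Python.  The file names, node names, the `locations`
table and the `pick_node` helper are all hypothetical, made up for
illustration; none of it is anything GlusterFS provides as far as I
know.  It simply rotates reads over the nodes holding a copy of a file:

```python
import itertools

# Hypothetical table: file name -> nodes holding an identical copy.
locations = {
    "frame_0001.raw": ["node1", "node4"],
    "frame_0002.raw": ["node2"],
}

# One endless round-robin iterator per file name.
_cycles = {name: itertools.cycle(nodes) for name, nodes in locations.items()}


def pick_node(name):
    """Return the next node to read `name` from, rotating over its copies."""
    return next(_cycles[name])
```

Successive calls to pick_node() for a file held on two nodes alternate
between them; a file held on one node always comes from that node.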

Is any of this achievable with GlusterFS?  Is any more achievable with
modification?

..david