On 5/20/12 5:55 PM, Ramon Diaz-Uriarte wrote:
> I might have to look at DRBD more carefully, but I do not think it
> fits my needs: I need both nodes to be working (and thus doing I/O) at
> the same time. These are basically number crunching nodes and data
> needs to be accessible from both nodes (e.g., some jobs will use MPI
> over the CPUs/cores of both nodes ---assuming both nodes are up, of
> course ;-).

DRBD will let you do read/write on both nodes, but it requires a clustered filesystem such as GFS2 or OCFS2 on top of it. You are also limited to a max of two nodes.

> But from the docs and the mailing list I get the impression that
> replication has severe performance penalties when writing and some
> penalties when reading. And with a two-node setup, it is unclear to me
> that, even with replication, if one node fails, gluster will continue to
> work (i.e., the other node will continue to work). I've not been able to
> find what is the recommended procedure to continue working, with
> replicated volumes, when one of the two nodes fails. So that is why I am
> wondering what would replication really give me in this case.

Gluster doing replication requires writes to hit both nodes, which may slow you down a lot if there is significant latency between the two.

I run a replicated configuration, and have had nodes down for extended periods - Gluster will repair the missing data from the brick on the failed node during self-heal, so it is transparent. I've never had to shut down applications in order for gluster to fix something first.

David