On Mon, 21 May 2012 13:47:52 +0000, John Jolet <jjolet at drillinginfo.com> wrote:

> I suspect that an rsync with the proper arguments will be fine before
> starting glusterd on the recovered node.

Thanks. R.
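
[For concreteness, a rough, untested sketch of what "an rsync with the proper
arguments" before restarting glusterd might look like. The hostnames (node1,
node2) and the brick path (/export/brick) are made up, and this is not an
official Gluster procedure; self-heal bookkeeping lives in extended attributes
on the bricks, so try it on scratch data first.

    # On the recovered node, with glusterd still stopped, pull the brick
    # contents from the surviving node.  --delete propagates deletions that
    # happened while this node was offline; -X copies extended attributes.
    rsync -aHX --delete node1:/export/brick/ /export/brick/

    # Then bring the gluster daemon back up on the recovered node (the init
    # script name varies by distribution: glusterd or glusterfs-server).
    service glusterd start

Whether copying the gluster-internal extended attributes this way plays nicely
with self-heal is exactly the open question in this thread.]
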
> On May 21, 2012, at 7:58 AM, Ramon Diaz-Uriarte wrote:
>
> > On Mon, 21 May 2012 00:46:33 +0000, John Jolet <jjolet at drillinginfo.com> wrote:
> >
> >> On May 20, 2012, at 4:55 PM, Ramon Diaz-Uriarte wrote:
> >>
> >>> On Sun, 20 May 2012 20:38:02 +0100, Brian Candler <B.Candler at pobox.com> wrote:
> >>>> On Sun, May 20, 2012 at 01:26:51AM +0200, Ramon Diaz-Uriarte wrote:
> >>>>> Questions:
> >>>>> ==========
> >>>>>
> >>>>> 1. Is using GlusterFS an overkill? (I guess the alternative would be
> >>>>> to use NFS from one of the nodes to the other)
> >>>
> >>>> In my opinion, the other main option you should be looking at is DRBD
> >>>> (www.drbd.org). This works at the block level, unlike glusterfs, which
> >>>> works at the file level. Using this you can mirror your disk remotely.
> >>>
> >>> Brian, thanks for your reply.
> >>>
> >>> I might have to look at DRBD more carefully, but I do not think it fits
> >>> my needs: I need both nodes to be working (and thus doing I/O) at the
> >>> same time. These are basically number-crunching nodes, and data needs
> >>> to be accessible from both nodes (e.g., some jobs will use MPI over the
> >>> CPUs/cores of both nodes ---assuming both nodes are up, of course ;-).
> >>>
> >>>> If you are doing virtualisation then look at Ganeti: this is an
> >>>> environment which combines LVM plus DRBD and allows you to run VMs on
> >>>> either node and live-migrate them from one to the other.
> >>>> http://docs.ganeti.org/ganeti/current/html/
> >>>
> >>> I am not doing virtualisation. I should have said that explicitly.
> >>>
> >>>> If a node fails, you just restart the VMs on the other node and away
> >>>> you go.
> >>>
> >>>>> 2. I plan on using a dedicated partition from each node as a brick.
> >>>>> Should I use replicated or distributed volumes?
> >>>
> >>>> A distributed volume will only increase the size of storage available
> >>>> (e.g. combining two 600GB drives into one 1.2TB volume - as long as no
> >>>> single file is too large). If this is all you need, you'd probably be
> >>>> better off buying bigger disks in the first place.
> >>>
> >>>> A replicated volume allows you to have a copy of every file on both
> >>>> nodes simultaneously, kept in sync in real time, and gives you
> >>>> resilience against one of the nodes failing.
> >>>
> >>> But from the docs and the mailing list I get the impression that
> >>> replication has severe performance penalties when writing and some
> >>> penalties when reading. And with a two-node setup, it is unclear to me
> >>> whether, even with replication, gluster will keep working if one node
> >>> fails (i.e., whether the surviving node will continue to work). I have
> >>> not been able to find the recommended procedure for continuing to work,
> >>> with replicated volumes, when one of the two nodes fails. That is why I
> >>> am wondering what replication would really give me in this case.
> >>>
> >> Replicated volumes have a performance penalty on the client. For
> >> instance, I have a replicated volume with one replica on each of two
> >> nodes, and I'm front-ending it with an Ubuntu box running Samba for CIFS
> >> sharing. If my Windows client sends 100MB to the CIFS server, the CIFS
> >> server will send 100MB to each node in the replica set. As for what you
> >> have to do to continue working if a node goes down, I have tested this.
> >> Not on purpose, but one of my nodes was accidentally downed. My client
> >> saw no difference. However, running 3.2.x, in order to get the client to
> >> use the downed node after it was brought back up, I had to remount the
> >> share on the CIFS server. This is supposedly fixed in 3.3.
> >
> > OK, great. Thanks for the info. It is clear, then, that several of you
> > report that this will work just fine.
> >
> >> It's important to note that self-healing will create files created
> >> while the node was offline, but does not DELETE files deleted while the
> >> node was offline. Not sure what the official line is there, but my use
> >> is archival, so it doesn't matter enough to me to run down (if they'd
> >> delete files, I wouldn't need gluster..)
> >
> > That is good to know, but it is not something I'd want. Is there any way
> > to get files to be deleted? Maybe rsync'ing or similar before
> > self-healing starts? Or will that lead to chaos?
> >
> > Best,
> >
> > R.
> >
> >>> Best,
> >>>
> >>> R.
> >>>
> >>>> Regards,
> >>>>
> >>>> Brian.

--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz
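
[For reference, a minimal sketch of the two-node replicated setup and the
3.2-era manual self-heal trigger discussed in the quoted thread above. The
hostnames (node1, node2), brick path (/export/brick), volume name (myvol) and
mount point are made up; the backupvolfile-server mount option may not be
available in every release, so check mount.glusterfs on your version.

    # Two-node replica 2 volume (run on node1, with glusterd running on both nodes).
    gluster peer probe node2
    gluster volume create myvol replica 2 node1:/export/brick node2:/export/brick
    gluster volume start myvol

    # Native-client mount on whichever machine does the I/O (e.g. the CIFS
    # front end); backupvolfile-server lets the mount succeed even when the
    # first server happens to be the node that is down.
    mkdir -p /mnt/myvol
    mount -t glusterfs -o backupvolfile-server=node2 node1:/myvol /mnt/myvol

    # After a downed node comes back, walking the mounted volume makes the
    # client check (and re-sync) every file it touches; this is the 3.2-era
    # way to trigger self-heal by hand.
    find /mnt/myvol -noleaf -print0 | xargs --null stat > /dev/null

As discussed above, this only re-creates or repairs files; it does not remove
files that were deleted while a node was offline.]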