Re: solutions for split brain situation

Michael Cassaniti <m.cassaniti@xxxxxxxxx> · Tue, 15 Sep 2009 10:25:59 +1000

2009/9/15 Stephan von Krawczynski <skraw@xxxxxxxxxx>

On Mon, 14 Sep 2009 21:20:49 +0200

"Steve" <steeeeeveee@xxxxxxx> wrote:

>

> -------- Original Message --------
> > Date: Mon, 14 Sep 2009 21:14:32 +0200
> > From: Stephan von Krawczynski <skraw@xxxxxxxxxx>
> > To: Anand Avati <avati@xxxxxxxxxxx>
> > CC: gluster-devel@xxxxxxxxxx
> > Subject: Re: solutions for split brain situation

>

> > On Mon, 14 Sep 2009 21:44:12 +0530

> > Anand Avati <avati@xxxxxxxxxxx> wrote:

> >

> > > > Our "split brain" is no real split brain and looks like this: Logfiles are
> > > > written every 5 mins. If you add a secondary server that has 14-day-old
> > > > logfiles on it you notice that about half of your data vanishes while no
> > > > successful self heal is performed, because the old logfiles read from the
> > > > secondary server overwrite the new logfiles on your primary while new data
> > > > is added to them.

> > >

> > > Have you been using favorite-child option?

> >

> > No, the option was not used.

> >

> > > Auto resolving of

> > > split-brain is bound to make you lose data of one of the subvolumes.

> > > If you had indeed specified favorite-child option, and the

> > > favorite-child option happens to be the server which had 14day old

> > > logs, what just happened was exactly what was in the elaborate warning

> > > log.

> > >

> > > Now what is more interesting to me is the sequence of taking down
> > > and bringing up the servers that led to the split brain. Was it really

> > > just taking one server (any of them) down and bringing it back up? Did

> > > you face a split brain with just this? Can you please describe the

> > > minimal steps necessary to reproduce your issue?

> >

> > Take 2 servers and one client. Use a minimal replicate setup but do _not_ add
> > the second server. Copy some data onto the first server via glusterfs, then
> > rsync that data to the second server directly from the first server
> > (glusterfsd not yet active there). Now change some of the data to have files
> > that are really newer than your rsync cycle. Then start glusterfsd on the
> > second server. Your client will add it. Then open the newer files r/w on the
> > client. You will notice the split brain messages in the client logs and find
> > that every other file indeed gets read in from the second (outdated) server
> > fileset. Write it back and your newer files on the first server are gone.
> > As said, no favorite child option set.

> >

> You just rsynced, but did you sync the extended attributes as well?

No, we explicitly did not sync the extended attributes. But your question
should be put more generally: if I have a working glusterfs server, must all
data be backed up including extended attributes?

Why should it be lethal not to back them up, when I can get data online that
has not been touched by glusterfsd before simply by starting to export it via
glusterfsd? (Think of a first-time export: you have some data and
install glusterfs for the very first time. Your data is of course exported
without any trouble.) What is the difference from an rsync backup with no
extended attributes?

Can we all read the "Understanding AFR translator" document in relation to extended attributes and the cluster/replicate translator?

Also, if you want more reliable restores directly to a single storage brick (rather than restoring onto a replicate translator), I would suggest a backup system that handles extended attributes. I am using Bacula for this purpose, but you may find other solutions that fit.
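
If you want a quick check of whether a restored or rsynced copy kept its extended attributes, something along the lines of the sketch below might help. Treat it as a rough illustration, not a tested tool: the function names are my own, it relies on Python's Linux-only os.listxattr and os.getxattr, and attributes in the trusted.* namespace (which, as far as I know, is where the replicate translator keeps its metadata) are only visible when it runs as root. Also note that rsync only copies extended attributes when asked to with -X/--xattrs.

```python
import os


def read_xattrs(path):
    """Return {name: value} for the extended attributes visible on path.

    Returns an empty dict if the platform or filesystem has no xattr support.
    """
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError:
        return {}
    attrs = {}
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name, follow_symlinks=False)
        except OSError:
            # Attribute vanished or is not readable by this user; skip it.
            continue
    return attrs


def xattr_mismatches(source_root, backup_root):
    """Map relative path -> (source attrs, backup attrs) where they differ.

    A path missing from the backup is reported with None as its backup attrs.
    """
    mismatches = {}
    for dirpath, dirnames, filenames in os.walk(source_root):
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.lexists(dst):
                mismatches[rel] = (read_xattrs(src), None)
                continue
            src_attrs = read_xattrs(src)
            dst_attrs = read_xattrs(dst)
            if src_attrs != dst_attrs:
                mismatches[rel] = (src_attrs, dst_attrs)
    return mismatches
```

Calling xattr_mismatches("/data/export", "/mnt/restore") (both paths are made up for the example) returns an empty dict when every file in the first tree has the same visible attributes in the second; anything else is worth a look before the copy is put behind glusterfsd.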

Regards,
Michael Cassaniti