Split brain; which file to choose for repair?

Thanks!

 

Unfortunately the md5sums don't match...  but the sizes & timestamps do. 

These binary files from VMs are quite difficult to work with. That's why I
was after some ideas on WHICH file should be preferred.

 

I'm actually quite concerned about how on earth we manage to trigger
"split brain" conditions on a regular basis.

 

Are there any good "DON'Ts" for server updates/maintenance? Our current
procedure is: shut down one server, upgrade/install it, bring up the
Gluster daemons (server & client), mount the file system, and check that
everything is running. Only then do we start on the next server.

So a pure sequential approach. Thought this would minimise the risk of
getting the files in a twist?!?

 

Best, Martin

 

From: Anand Avati [mailto:anand.avati at gmail.com] 
Sent: Wednesday, May 04, 2011 2:55 PM
To: Martin Schenker
Cc: gluster-users at gluster.org
Subject: Re: Split brain; which file to choose for repair?

 

A split-brain situation occurs only under a specific sequence of events and
modifications, after which the filesystem cannot decide which of the two
copies of the file is up to date. It may happen that the two changes were
actually the same "change", in which case the two copies of your file will
have matching md5sums (and you can delete one arbitrarily). If not, you need
to know how your application works and decide, by inspecting the content,
which copy of the file is more appropriate to delete.
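
[Editor's sketch, not part of the original reply.] The checksum comparison
described above can be scripted for large VM images. This is a minimal
sketch that assumes both brick copies are readable from one host (for
example over a read-only mount or after copying one of them). The function
names are hypothetical. Besides the md5sum, it lists the offsets of the
blocks that differ, which may help when inspecting the content by hand.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_blocks(path_a, path_b, block_size=1024 * 1024):
    """Return the byte offsets of the blocks in which the two files differ."""
    offsets = []
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(block_size)
            block_b = file_b.read(block_size)
            if not block_a and not block_b:
                break  # both files are exhausted
            if block_a != block_b:
                offsets.append(offset)
            offset += block_size
    return offsets
```

If file_md5() returns the same digest for both copies, either one can be
removed. Otherwise the offsets from differing_blocks() show where to look
inside the images before deciding.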

 

Avati

On Wed, May 4, 2011 at 5:54 PM, Martin Schenker
<martin.schenker at profitbricks.com> wrote:

Hi all!

Is there anybody who can give some pointers regarding which file to choose
in a "split brain" condition?

What tests do I need to run?

What does the hex AFR code actually show? Is there a way to pinpoint the
"better/worse" file for deletion?

On pserver12:


# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-5=0x3f0000010000000000000000

On pserver13:


# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-4=0xd70000010000000000000000
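
[Editor's sketch, not part of the original message.] The hex values above
can be split into counters. This assumes the usual replicate (AFR) changelog
layout of three big-endian 32-bit pending counters, in the order data,
metadata, entry; please verify this against the GlusterFS version in use.
The function name is hypothetical.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its three pending counters.

    Assumes 12 bytes: data, metadata and entry counters, each a
    big-endian unsigned 32-bit integer.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output above.
on_pserver12 = decode_afr_changelog("0x3f0000010000000000000000")
on_pserver13 = decode_afr_changelog("0xd70000010000000000000000")
print(on_pserver12)
print(on_pserver13)
```

Under that assumption, each copy records pending data operations against the
other brick's client (client-5 on pserver12, client-4 on pserver13), while
the metadata and entry counters are zero. Both sides blaming each other is
what makes this a split brain; the counters alone do not say which copy is
"better".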

These are test files, but I'd like to know what to do in a LIVE situation,
which will be just around the corner.

The timestamps show the same values, so I'm a bit puzzled HOW to choose a
file.

pserver12:

0 root@de-dc1-c1-pserver12:~ # ls -al /mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40 /mnt/gluster/brick0/storage/pserver3-19

0 root@de-dc1-c1-pserver12:~ # ls -alu /mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18 /mnt/gluster/brick0/storage/pserver3-19

pserver13:

0 root@de-dc1-c1-pserver13:~ # ls -al /mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40 /mnt/gluster/brick0/storage/pserver3-19

0 root@de-dc1-c1-pserver13:~ # ls -alu /mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18 /mnt/gluster/brick0/storage/pserver3-19

Best, Martin

