Possible split-brain

aroberts at domicilium.com (Aaron Roberts) · Thu, 11 Nov 2010 16:00:12 +0000

Hi all,
I have four servers running glusterd, serving a single GlusterFS volume.  The volume was created with the gluster command line, with no changes from the defaults.  The same four machines also mount the volume using the native glusterfs client:

[root@localhost ~]# gluster volume create datastore replica 2 transport tcp 192.168.253.1:/glusterfs/primary 192.168.253.3:/glusterfs/secondary 192.168.253.2:/glusterfs/primary 192.168.253.4:/glusterfs/secondary 192.168.253.3:/glusterfs/primary 192.168.253.1:/glusterfs/secondary 192.168.253.4:/glusterfs/primary 192.168.253.2:/glusterfs/secondary

[root@localhost ~]# cat /etc/fstab

...
/dev/cciss/c0d0p6       /glusterfs/primary      ext4    defaults,noatime 1 2
/dev/cciss/c0d1p6       /glusterfs/secondary    ext4    defaults,noatime 1 2
192.168.253.1:/datastore /mnt/datastore	    glusterfs defaults,_netdev 0 0

[root@localhost ~]# gluster volume info

Volume Name: datastore
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/glusterfs/primary
Brick2: 192.168.253.3:/glusterfs/secondary
Brick3: 192.168.253.2:/glusterfs/primary
Brick4: 192.168.253.4:/glusterfs/secondary
Brick5: 192.168.253.3:/glusterfs/primary
Brick6: 192.168.253.1:/glusterfs/secondary
Brick7: 192.168.253.4:/glusterfs/primary
Brick8: 192.168.253.2:/glusterfs/secondary 
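
For reference, a minimal sketch of how the brick order maps to replica sets, assuming that with "replica 2" each consecutive pair of bricks in the create command forms one replica set.  The function name replica_sets and the "replicate-N" labels are hypothetical; that they line up with the datastore-replicate-N names in the logs is an assumption, not something confirmed here.

```shell
# Group an ordered brick list into replica sets.
# $1 is the replica count; the remaining arguments are bricks in creation order.
# Assumption: consecutive bricks are grouped, and sets are numbered from 0.
replica_sets() (
    replica=$1
    shift
    count=0
    index=0
    members=""
    for brick in "$@"; do
        members="$members $brick"
        count=$((count + 1))
        # Once a full set has been collected, print it and start the next one.
        if [ $((count % replica)) -eq 0 ]; then
            echo "replicate-$index:$members"
            index=$((index + 1))
            members=""
        fi
    done
)
```

Called with 2 followed by the eight bricks in the order shown above, the first set printed contains 192.168.253.1:/glusterfs/primary and 192.168.253.3:/glusterfs/secondary.  If the naming assumption holds, those would be the two bricks behind datastore-replicate-0.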

The platform is not yet carrying production data, and I have been testing the redundancy of the setup (pulling cables etc.).  All my servers are now logging the following messages roughly once a minute:

[2010-11-11 14:18:49.636327] I [afr-common.c:672:afr_lookup_done] datastore-replicate-0: split brain detected during lookup of /.
[2010-11-11 14:18:49.636388] I [afr-common.c:716:afr_lookup_done] datastore-replicate-0: background  meta-data data self-heal triggered. path: /
[2010-11-11 14:18:49.636863] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] datastore-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2010-11-11 14:18:49.637080] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] datastore-replicate-0: background  meta-data data self-heal completed on /
[2010-11-11 14:18:49.637561] I [afr-common.c:672:afr_lookup_done] datastore-replicate-0: split brain detected during lookup of /.
[2010-11-11 14:18:49.637588] I [afr-common.c:716:afr_lookup_done] datastore-replicate-0: background  meta-data data self-heal triggered. path: /
[2010-11-11 14:18:49.638064] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] datastore-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2010-11-11 14:18:49.638265] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] datastore-replicate-0: background  meta-data data self-heal completed on /
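
Since the error concerns permissions/ownership of '/', one way to compare the backend directories is sketched below.  This is only an illustration, assuming GNU stat is available; the helper name brick_root_meta is hypothetical, and the getfattr line in the comments assumes the attr tools are installed and that it is run as root.

```shell
# Print the numeric owner, group and octal mode of a brick's root directory,
# so the output from each server can be compared side by side.
brick_root_meta() {
    stat -c '%u:%g %a' "$1"
}

# Example, to be run on each server for each brick directory:
#   brick_root_meta /glusterfs/primary
#   brick_root_meta /glusterfs/secondary
# The extended attributes of the same directory can be dumped with:
#   getfattr -d -m . -e hex /glusterfs/primary
```

If two bricks that replicate each other report different owners or modes for their root directory, that difference is a candidate for what the self-heal is unable to reconcile.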

Can anyone tell me what I need to do to fix this?

Thanks,
	Aaron