Split brain errors

mohitanchlia at gmail.com (Mohit Anchlia) · Thu, 28 Apr 2011 14:17:05 -0700

I got some help and fixed these by setting the xattr, e.g. by changing
"I" to "A" using setfattr.

But my next question is: why did this happen in the first place, and
what needs to be done so that it doesn't happen again? It keeps
happening even if I start clean: stop the volume, delete the volume,
delete the contents, and re-create the volumes. Also, some of the
bricks don't have the "stress-volume" attr.

getfattr -dm - /data1/gluster
getfattr: Removing leading '/' from absolute path names
# file: data1/gluster
trusted.afr.stress-volume-client-8=0sAAAAAAIAAAAAAAAA
trusted.afr.stress-volume-client-9=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAAqqqqqP////g==
trusted.glusterfs.test="working\000"
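
For reading the trusted.afr values above, a small decoding sketch may
help. It assumes that the "0s" prefix in getfattr output means base64,
and that each AFR changelog value is 12 bytes holding three big-endian
32-bit pending counters in the order data, metadata, entry. The
function name decode_afr_changelog is made up for this example. Under
those assumptions only the metadata word of client-8 is nonzero, which
would fit the permissions/ownership self-heal message in the log.

```python
import base64
import struct


def decode_afr_changelog(value):
    """Decode a trusted.afr.* value as printed by `getfattr -d`.

    Assumptions (not confirmed in this thread): the value is base64
    with a "0s" prefix, and the 12 raw bytes are three big-endian
    unsigned 32-bit counters: data, metadata, entry.
    """
    if not value.startswith("0s"):
        raise ValueError("expected a base64 value with the '0s' prefix")
    raw = base64.b64decode(value[2:])
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Values copied from the getfattr output above.
    xattrs = {
        "trusted.afr.stress-volume-client-8": "0sAAAAAAIAAAAAAAAA",
        "trusted.afr.stress-volume-client-9": "0sAAAAAAAAAAAAAAAA",
    }
    for name, value in xattrs.items():
        counters = decode_afr_changelog(value)
        # Any nonzero counter means something is recorded as pending.
        state = "pending" if any(counters.values()) else "clean"
        print(name, counters, state)
```

Changing the "I" to "A" with setfattr, as described above, turns the
client-8 value into all zero bytes, the same as client-9.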

On Thu, Apr 28, 2011 at 9:24 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
> I create 30K directories in the client mountpoint. I've done this
> test with mkfs -I 256 and with the default 128-byte inode size (Red
> Hat 5.6). Only when I create the filesystem with mkfs -I 256 do I see
> these errors. That looks like the reason for the failure, because
> everything else is the same: same number of bricks, servers, user
> (root), etc.
>
> I run the stress test and the client mount logs are full of these
> errors for every subvolume. It looks like it's happening for every
> file that's being written.
>
> On Thu, Apr 28, 2011 at 9:20 AM, Amar Tumballi <amar at gluster.com> wrote:
>> I see that the directory sizes are different here. Let me confirm
>> whether we also check that the sizes are the same (for directories that
>> is not needed). In that case this log makes sense, but it is surely a
>> false positive.
>> -Amar
>>
>> On Thu, Apr 28, 2011 at 9:44 PM, Mohit Anchlia <mohitanchlia at gmail.com>
>> wrote:
>>>
>>> Yes, they are the same. It looks like this problem appears only when I
>>> use -I 256 when running mkfs. Why would that be?
>>>
>>> [root@dsdb1 ~]# ls -ltr /data/
>>> total 5128
>>> drwx------     2 root root   16384 Apr 27 16:57 lost+found
>>> drwxr-xr-x 30003 root root 4562944 Apr 27 17:15 mnt-stress
>>> drwxr-xr-x 30003 root root  598016 Apr 27 17:15 gluster
>>> [root@dsdb1 ~]# ls -ltr /data1/
>>> total 572
>>> drwx------     2 root root  16384 Apr 27 16:59 lost+found
>>> drwxr-xr-x 30003 root root 561152 Apr 27 17:15 gluster
>>>
>>> [root@dsdb2 ~]# ls -ltr /data
>>> total 588
>>> drwx------     2 root root  16384 Apr 27 16:52 lost+found
>>> drwxr-xr-x     2 root root   4096 Apr 27 17:09 mnt-stress
>>> drwxr-xr-x 30003 root root 573440 Apr 27 17:15 gluster
>>> [root@dsdb2 ~]# ls -ltr /data1
>>> total 592
>>> drwx------     2 root root  16384 Apr 27 16:54 lost+found
>>> drwxr-xr-x 30003 root root 581632 Apr 27 17:15 gluster
>>>
>>>
>>> On Wed, Apr 27, 2011 at 11:18 PM, Amar Tumballi <amar at gluster.com> wrote:
>>> >>
>>> >> [2011-04-27 17:11:29.13142] E
>>> >> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix]
>>> >> 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership
>>> >> of '/' (possible split-brain). Please fix the file on all backend
>>> >> volumes
>>> >>
>>> >> Can someone please help me find the reason for this problem?
>>> >>
>>> >> gluster volume info all
>>> >>
>>> >> Volume Name: stress-volume
>>> >> Type: Distributed-Replicate
>>> >> Status: Started
>>> >> Number of Bricks: 8 x 2 = 16
>>> >> Transport-type: tcp
>>> >> Bricks:
>>> >> Brick1: dsdb1:/data/gluster
>>> >> Brick2: dsdb2:/data/gluster
>>> >
>>> > Did you check the permission/ownership of these exports? Please make
>>> > sure that they are the same.
>>> > Regards,
>>> > Amar
>>> >
>>> >>
>>> >> Brick3: dsdb3:/data/gluster
>>> >> Brick4: dsdb4:/data/gluster
>>> >> Brick5: dsdb5:/data/gluster
>>> >> Brick6: dsdb6:/data/gluster
>>> >> Brick7: dslg1:/data/gluster
>>> >> Brick8: dslg2:/data/gluster
>>> >> Brick9: dsdb1:/data1/gluster
>>> >> Brick10: dsdb2:/data1/gluster
>>> >> Brick11: dsdb3:/data1/gluster
>>> >> Brick12: dsdb4:/data1/gluster
>>> >> Brick13: dsdb5:/data1/gluster
>>> >> Brick14: dsdb6:/data1/gluster
>>> >> Brick15: dslg1:/data1/gluster
>>> >> Brick16: dslg2:/data1/gluster
>>> >
>>> >
>>> >
>>
>>
>
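
Amar's question about the permission/ownership of the exports could be
checked with something like the sketch below. It is only an
illustration: export_signature and group_by_signature are hypothetical
helper names, the script has to be run on each server for its local
bricks (here /data/gluster and /data1/gluster), and the printed lines
then have to be compared across servers by hand. It looks at mode, uid
and gid only, not at directory size.

```python
import os
import stat
import sys
from collections import defaultdict


def export_signature(path):
    """Return (permission bits, uid, gid) for a brick export directory."""
    st = os.stat(path)
    return (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)


def group_by_signature(signatures):
    """Group names by identical (mode, uid, gid).

    `signatures` maps a brick name to its signature tuple. More than
    one group in the result means the exports disagree.
    """
    groups = defaultdict(list)
    for name, sig in signatures.items():
        groups[sig].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical usage: python check_exports.py /data/gluster /data1/gluster
    sigs = {path: export_signature(path) for path in sys.argv[1:]}
    for (mode, uid, gid), names in group_by_signature(sigs).items():
        print("mode=%04o uid=%d gid=%d: %s" % (mode, uid, gid, ", ".join(names)))
```

If every brick of a replica pair prints the same mode, uid and gid, a
mismatch in ownership is unlikely to be what the self-heal message is
complaining about.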