All I do is stop the volume, delete the volume, remove the mount dir, create the mount dir, and create the volume again, and this happens. I have also tried stopping glusterd after deleting the volume and then starting it before creating the volume again, but this consistently happens. Also, when I add multiple bricks on the same server, some of the servers don't have all the xattrs.
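For reference, this is roughly the sequence I run each time; the paths and volume name below just stand in for my setup, and the full brick list is the one in "gluster volume info" further down:

gluster volume stop stress-volume
gluster volume delete stress-volume
rm -rf /data/gluster /data1/gluster      # remove the old directories
mkdir -p /data/gluster /data1/gluster    # recreate them empty
gluster volume create stress-volume replica 2 transport tcp \
    dsdb1:/data/gluster dsdb2:/data/gluster ... dslg2:/data1/gluster
gluster volume start stress-volume

On Fri, Apr 29, 2011 at 10:25 AM, Pranith Kumar. Karampuri
<pranithk at gluster.com> wrote:
> Hi Mohit,
>         Do you know what exact steps are leading to this problem?
>
> Pranith.
>
> ----- Original Message -----
> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
> To: "Amar Tumballi" <amar at gluster.com>, gluster-users at gluster.org
> Sent: Friday, April 29, 2011 9:49:33 PM
> Subject: Re: Split brain errors
>
> Can someone from dev please help reply? Should I open a bug?
>
> On Thu, Apr 28, 2011 at 2:17 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>> I got some help and fixed these by setting the xattrs. For example, I
>> changed "I" to "A" using setfattr.
>>
>> But now my next question is why did this happen in the first place, and
>> what measures need to be taken so that this doesn't happen? It keeps
>> happening even if I start clean: stop vol, delete vol, delete
>> contents, re-create vols. Also, some of the bricks don't have the
>> "stress-volume" attr.
>>
>> getfattr -dm - /data1/gluster
>> getfattr: Removing leading '/' from absolute path names
>> # file: data1/gluster
>> trusted.afr.stress-volume-client-8=0sAAAAAAIAAAAAAAAA
>> trusted.afr.stress-volume-client-9=0sAAAAAAAAAAAAAAAA
>> trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
>> trusted.glusterfs.dht=0sAAAAAQAAAAAqqqqqP////g==
>> trusted.glusterfs.test="working\000"
>>
>> On Thu, Apr 28, 2011 at 9:24 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>> I create 30K directories in the client mountpoint. But I've done this
>>> test with mkfs -I 256 and with the default 128-byte inode size (Red Hat
>>> 5.6). Only when I run mkfs with -I 256 do I see these errors. That looks
>>> like the reason for the failure, because otherwise everything else is
>>> the same: same number of bricks, servers, user (root), etc.
>>>
>>> I run the stress test and the client mount logs are full of these errors
>>> for every subvolume. It looks like it's happening for every file that's
>>> being written.
>>>
>>> On Thu, Apr 28, 2011 at 9:20 AM, Amar Tumballi <amar at gluster.com> wrote:
>>>> I am seeing the directory sizes to be different here. Let me confirm
>>>> whether we are also checking for the size to be the same (for
>>>> directories that should not be needed). In that case this log makes
>>>> sense, but surely it is a false positive.
>>>> -Amar
>>>>
>>>> On Thu, Apr 28, 2011 at 9:44 PM, Mohit Anchlia <mohitanchlia at gmail.com>
>>>> wrote:
>>>>>
>>>>> Yes, they are the same. It looks like this problem appears only when I
>>>>> use -I 256 when running mkfs. Why would that be?
>>>>>
>>>>> [root at dsdb1 ~]# ls -ltr /data/
>>>>> total 5128
>>>>> drwx------     2 root root   16384 Apr 27 16:57 lost+found
>>>>> drwxr-xr-x 30003 root root 4562944 Apr 27 17:15 mnt-stress
>>>>> drwxr-xr-x 30003 root root  598016 Apr 27 17:15 gluster
>>>>> [root at dsdb1 ~]# ls -ltr /data1/
>>>>> total 572
>>>>> drwx------     2 root root  16384 Apr 27 16:59 lost+found
>>>>> drwxr-xr-x 30003 root root 561152 Apr 27 17:15 gluster
>>>>>
>>>>> [root at dsdb2 ~]# ls -ltr /data
>>>>> total 588
>>>>> drwx------     2 root root  16384 Apr 27 16:52 lost+found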
>>>>> drwxr-xr-x     2 root root   4096 Apr 27 17:09 mnt-stress
>>>>> drwxr-xr-x 30003 root root 573440 Apr 27 17:15 gluster
>>>>> [root at dsdb2 ~]# ls -ltr /data1
>>>>> total 592
>>>>> drwx------     2 root root  16384 Apr 27 16:54 lost+found
>>>>> drwxr-xr-x 30003 root root 581632 Apr 27 17:15 gluster
>>>>>
>>>>> On Wed, Apr 27, 2011 at 11:18 PM, Amar Tumballi <amar at gluster.com> wrote:
>>>>> >>
>>>>> >> [2011-04-27 17:11:29.13142] E
>>>>> >> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix]
>>>>> >> 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership
>>>>> >> of '/' (possible split-brain). Please fix the file on all backend
>>>>> >> volumes
>>>>> >>
>>>>> >> Can someone please help me with the reason for this problem?
>>>>> >>
>>>>> >> gluster volume info all
>>>>> >>
>>>>> >> Volume Name: stress-volume
>>>>> >> Type: Distributed-Replicate
>>>>> >> Status: Started
>>>>> >> Number of Bricks: 8 x 2 = 16
>>>>> >> Transport-type: tcp
>>>>> >> Bricks:
>>>>> >> Brick1: dsdb1:/data/gluster
>>>>> >> Brick2: dsdb2:/data/gluster
>>>>> >
>>>>> > Did you check the permissions/ownership of these exports? Please make
>>>>> > sure that they are the same.
>>>>> > Regards,
>>>>> > Amar
>>>>> >
>>>>> >>
>>>>> >> Brick3: dsdb3:/data/gluster
>>>>> >> Brick4: dsdb4:/data/gluster
>>>>> >> Brick5: dsdb5:/data/gluster
>>>>> >> Brick6: dsdb6:/data/gluster
>>>>> >> Brick7: dslg1:/data/gluster
>>>>> >> Brick8: dslg2:/data/gluster
>>>>> >> Brick9: dsdb1:/data1/gluster
>>>>> >> Brick10: dsdb2:/data1/gluster
>>>>> >> Brick11: dsdb3:/data1/gluster
>>>>> >> Brick12: dsdb4:/data1/gluster
>>>>> >> Brick13: dsdb5:/data1/gluster
>>>>> >> Brick14: dsdb6:/data1/gluster
>>>>> >> Brick15: dslg1:/data1/gluster
>>>>> >> Brick16: dslg2:/data1/gluster
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users