Yes, I did check, and that's how I fixed it: I reset the values to all "A"s (i.e., all zeros) using setfattr. That's one problem. Also, as I mentioned before, some bricks are missing xattr info. For example:

[root@dsdb1 gluster]# getfattr -dm - /data2/gluster
getfattr: Removing leading '/' from absolute path names
# file: data2/gluster
trusted.afr.stress-volume-client-16=0sAAAAAAAAAAAAAAAA
trusted.afr.stress-volume-client-17=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAAB////+lVVVUg==
trusted.glusterfs.test="working\000"

[root@dsdb3 ~]# ls /data2/gluster/12657/372657
/data2/gluster/12657/372657
[root@dsdb3 ~]# getfattr -dm - /data2/gluster
getfattr: Removing leading '/' from absolute path names
# file: data2/gluster
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAACVVVVTqqqqpw==
trusted.glusterfs.test="working\000"

[root@dsdb4 ~]# ls /data2/gluster/12657/372657
/data2/gluster/12657/372657
[root@dsdb4 ~]# getfattr -dm - /data2/gluster
getfattr: Removing leading '/' from absolute path names
# file: data2/gluster
trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.glusterfs.dht=0sAAAAAQAAAACVVVVTqqqqpw==
trusted.glusterfs.test="working\000"

On Fri, Apr 29, 2011 at 11:34 AM, Pranith Kumar. Karampuri
<pranithk at gluster.com> wrote:
> Please check if the outputs of getfattr -d -m "trusted*" on all the brick directories differ.
>
> Pranith.
> ----- Original Message -----
> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
> To: "Pranith Kumar. Karampuri" <pranithk at gluster.com>
> Cc: "Amar Tumballi" <amar at gluster.com>, gluster-users at gluster.org
> Sent: Friday, April 29, 2011 11:37:01 PM
> Subject: Re: Split brain errors
>
> All I do is:
> stop volume,
> delete volume,
> remove mount dir,
> create mount dir,
> create volume, and this happens.
>
> I have also tried stopping glusterd after deleting the volume and then
> starting it before creating the volume again, but this consistently happens.
>
> Also, when I add multiple bricks on the same server, some of the
> servers don't have all the xattrs.
>
> On Fri, Apr 29, 2011 at 10:25 AM, Pranith Kumar. Karampuri
> <pranithk at gluster.com> wrote:
>> hi Mohit,
>>         Do you know what exact steps are leading to this problem?
>>
>> Pranith.
>>
>> ----- Original Message -----
>> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
>> To: "Amar Tumballi" <amar at gluster.com>, gluster-users at gluster.org
>> Sent: Friday, April 29, 2011 9:49:33 PM
>> Subject: Re: Split brain errors
>>
>> Can someone from dev please help reply? Should I open a bug?
>>
>> On Thu, Apr 28, 2011 at 2:17 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>> I got some help and fixed these by setting the xattrs, e.g. changing "I"
>>> to "A" using setfattr.
>>>
>>> But now my next question is: why did this happen in the first place, and
>>> what needs to be done so that it doesn't happen again? It keeps
>>> happening even if I start clean: stop vol, delete vol, delete
>>> contents, re-create vols. Also, some of the bricks don't have the
>>> "stress-volume" attr.
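(For reference, the reset described above was along these lines; this is only a sketch, so take the exact trusted.afr.* names and brick paths from your own getfattr output on each brick:

# run as root on the brick whose changelog xattr shows a non-"A" (non-zero) value
setfattr -n trusted.afr.stress-volume-client-8 -v 0sAAAAAAAAAAAAAAAA /data1/gluster
setfattr -n trusted.afr.stress-volume-client-9 -v 0sAAAAAAAAAAAAAAAA /data1/gluster
# verify the values are back to all zeros
getfattr -d -m "trusted*" /data1/gluster
)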
>>>
>>> getfattr -dm - /data1/gluster
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: data1/gluster
>>> trusted.afr.stress-volume-client-8=0sAAAAAAIAAAAAAAAA
>>> trusted.afr.stress-volume-client-9=0sAAAAAAAAAAAAAAAA
>>> trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
>>> trusted.glusterfs.dht=0sAAAAAQAAAAAqqqqqP////g==
>>> trusted.glusterfs.test="working\000"
>>>
>>> On Thu, Apr 28, 2011 at 9:24 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>>> I create 30K directories in the client mountpoint. But I've done this
>>>> test both with mkfs -I 256 and with the default 128-byte inode size
>>>> (Red Hat 5.6). Only when I use mkfs -I 256 do I see these errors. That
>>>> looks like the reason for the failure, because otherwise everything
>>>> else is the same: same number of bricks, servers, user (root), etc.
>>>>
>>>> I run the stress test and the client mount logs are full of these errors
>>>> for every subvolume. It looks like it's happening for every file that's
>>>> being written.
>>>>
>>>> On Thu, Apr 28, 2011 at 9:20 AM, Amar Tumballi <amar at gluster.com> wrote:
>>>>> I am seeing that the directory sizes are different here. Let me confirm
>>>>> whether we also check that the size is the same (for directories that
>>>>> should not be needed). In that case this log makes sense, but it is
>>>>> surely a false positive.
>>>>> -Amar
>>>>>
>>>>> On Thu, Apr 28, 2011 at 9:44 PM, Mohit Anchlia <mohitanchlia at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Yes, they are the same. It looks like this problem appears only when I
>>>>>> use -I 256 with mkfs. Why would that be?
>>>>>>
>>>>>> [root@dsdb1 ~]# ls -ltr /data/
>>>>>> total 5128
>>>>>> drwx------     2 root root   16384 Apr 27 16:57 lost+found
>>>>>> drwxr-xr-x 30003 root root 4562944 Apr 27 17:15 mnt-stress
>>>>>> drwxr-xr-x 30003 root root  598016 Apr 27 17:15 gluster
>>>>>> [root@dsdb1 ~]# ls -ltr /data1/
>>>>>> total 572
>>>>>> drwx------     2 root root  16384 Apr 27 16:59 lost+found
>>>>>> drwxr-xr-x 30003 root root 561152 Apr 27 17:15 gluster
>>>>>>
>>>>>> [root@dsdb2 ~]# ls -ltr /data
>>>>>> total 588
>>>>>> drwx------     2 root root  16384 Apr 27 16:52 lost+found
>>>>>> drwxr-xr-x     2 root root   4096 Apr 27 17:09 mnt-stress
>>>>>> drwxr-xr-x 30003 root root 573440 Apr 27 17:15 gluster
>>>>>> [root@dsdb2 ~]# ls -ltr /data1
>>>>>> total 592
>>>>>> drwx------     2 root root  16384 Apr 27 16:54 lost+found
>>>>>> drwxr-xr-x 30003 root root 581632 Apr 27 17:15 gluster
>>>>>>
>>>>>> On Wed, Apr 27, 2011 at 11:18 PM, Amar Tumballi <amar at gluster.com> wrote:
>>>>>> >>
>>>>>> >> [2011-04-27 17:11:29.13142] E
>>>>>> >> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix]
>>>>>> >> 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership
>>>>>> >> of '/' (possible split-brain). Please fix the file on all backend
>>>>>> >> volumes
>>>>>> >>
>>>>>> >> Can someone please help me understand the reason for this problem?
>>>>>> >>
>>>>>> >> gluster volume info all
>>>>>> >>
>>>>>> >> Volume Name: stress-volume
>>>>>> >> Type: Distributed-Replicate
>>>>>> >> Status: Started
>>>>>> >> Number of Bricks: 8 x 2 = 16
>>>>>> >> Transport-type: tcp
>>>>>> >> Bricks:
>>>>>> >> Brick1: dsdb1:/data/gluster
>>>>>> >> Brick2: dsdb2:/data/gluster
>>>>>> >
>>>>>> > Did you check the permissions/ownership of these exports? Please make
>>>>>> > sure that they are the same.
>>>>>> > Regards,
>>>>>> > Amar
>>>>>> >
>>>>>> >> Brick3: dsdb3:/data/gluster
>>>>>> >> Brick4: dsdb4:/data/gluster
>>>>>> >> Brick5: dsdb5:/data/gluster
>>>>>> >> Brick6: dsdb6:/data/gluster
>>>>>> >> Brick7: dslg1:/data/gluster
>>>>>> >> Brick8: dslg2:/data/gluster
>>>>>> >> Brick9: dsdb1:/data1/gluster
>>>>>> >> Brick10: dsdb2:/data1/gluster
>>>>>> >> Brick11: dsdb3:/data1/gluster
>>>>>> >> Brick12: dsdb4:/data1/gluster
>>>>>> >> Brick13: dsdb5:/data1/gluster
>>>>>> >> Brick14: dsdb6:/data1/gluster
>>>>>> >> Brick15: dslg1:/data1/gluster
>>>>>> >> Brick16: dslg2:/data1/gluster
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
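To compare the trusted.* xattrs across all the bricks in one pass (the check Pranith suggested), something like the loop below should do it. The host and brick lists here are taken from the layout above, so adjust them to match your setup:

for host in dsdb1 dsdb2 dsdb3 dsdb4 dsdb5 dsdb6 dslg1 dslg2; do
  for brick in /data/gluster /data1/gluster /data2/gluster; do
    echo "== $host:$brick =="
    # --absolute-names keeps the full path in the "# file:" header
    ssh "$host" "getfattr --absolute-names -d -m 'trusted*' $brick"
  done
done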