Split brain errors

I create 30K directories in the client mountpoint. I have run this test
both with mkfs -I 256 and with the default 128-byte inode size (Red Hat
5.6), and I only see these errors when the filesystem is created with
mkfs -I 256. That looks like the cause of the failure, because
everything else is the same: same number of bricks, servers, user
(root), etc.
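For reference, the two on-disk layouts can be reproduced and compared on loopback image files without touching the bricks (a minimal sketch; the image paths and sizes are illustrative, and mkfs.ext3/tune2fs come from e2fsprogs):

```shell
# Create two small ext3 images: one pinned to the RHEL 5.6 default of
# 128-byte inodes, one with the 256-byte inodes used in the failing setup.
dd if=/dev/zero of=/tmp/fs128.img bs=1M count=16 2>/dev/null
dd if=/dev/zero of=/tmp/fs256.img bs=1M count=16 2>/dev/null
mkfs.ext3 -q -F -I 128 /tmp/fs128.img
mkfs.ext3 -q -F -I 256 /tmp/fs256.img
# Confirm what each filesystem was actually created with:
tune2fs -l /tmp/fs128.img | grep '^Inode size'
tune2fs -l /tmp/fs256.img | grep '^Inode size'
```

One relevant difference between the two layouts: larger inodes leave more in-inode room for extended attributes, and GlusterFS keeps its replication metadata in trusted.* xattrs on the bricks, so inode size is not a neutral knob here.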

When I run the stress test, the client mount logs fill up with these
errors for every subvolume. It looks like it is happening for every
file that is being written.

On Thu, Apr 28, 2011 at 9:20 AM, Amar Tumballi <amar at gluster.com> wrote:
> I see the directory sizes differing here. Let me confirm whether we also
> check that the sizes match (for directories that check is not needed). If
> so, this log makes sense, but it is surely a false positive.
> -Amar
>
> On Thu, Apr 28, 2011 at 9:44 PM, Mohit Anchlia <mohitanchlia at gmail.com>
> wrote:
>>
>> Yes, they are the same. The problem only appears when I create the
>> filesystem with mkfs -I 256. Why would that be?
>>
>> [root at dsdb1 ~]# ls -ltr /data/
>> total 5128
>> drwx------     2 root root   16384 Apr 27 16:57 lost+found
>> drwxr-xr-x 30003 root root 4562944 Apr 27 17:15 mnt-stress
>> drwxr-xr-x 30003 root root  598016 Apr 27 17:15 gluster
>> [root at dsdb1 ~]# ls -ltr /data1/
>> total 572
>> drwx------     2 root root  16384 Apr 27 16:59 lost+found
>> drwxr-xr-x 30003 root root 561152 Apr 27 17:15 gluster
>>
>> [root at dsdb2 ~]# ls -ltr /data
>> total 588
>> drwx------     2 root root  16384 Apr 27 16:52 lost+found
>> drwxr-xr-x     2 root root   4096 Apr 27 17:09 mnt-stress
>> drwxr-xr-x 30003 root root 573440 Apr 27 17:15 gluster
>> [root at dsdb2 ~]# ls -ltr /data1
>> total 592
>> drwx------     2 root root  16384 Apr 27 16:54 lost+found
>> drwxr-xr-x 30003 root root 581632 Apr 27 17:15 gluster
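Incidentally, the matching 30003 link counts in those listings are themselves a quick consistency check: on ext3 a directory's hard-link count is 2 plus the number of its immediate subdirectories, so equal link counts mean both replicas hold the same number of subdirectories even while their byte sizes differ. A small local illustration (temp paths standing in for the brick directories):

```shell
# A directory starts with 2 links ("." plus its entry in the parent);
# each immediate subdirectory adds one more via its ".." entry.
d=$(mktemp -d)
for i in 1 2 3 4 5; do mkdir "$d/sub$i"; done
stat -c %h "$d"    # link count = 2 + 5 subdirectories
```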
>>
>>
>> On Wed, Apr 27, 2011 at 11:18 PM, Amar Tumballi <amar at gluster.com> wrote:
>> >>
>> >> [2011-04-27 17:11:29.13142] E
>> >> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix]
>> >> 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership
>> >> of '/' (possible split-brain). Please fix the file on all backend
>> >> volumes
>> >>
>> Can someone please help me find the reason for this problem?
>>
>> gluster volume info all
>> >>
>> >> Volume Name: stress-volume
>> >> Type: Distributed-Replicate
>> >> Status: Started
>> >> Number of Bricks: 8 x 2 = 16
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: dsdb1:/data/gluster
>> >> Brick2: dsdb2:/data/gluster
>> >
>> > Did you check the permissions/ownership of these exports? Please make
>> > sure that they are the same.
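Amar's suggestion above can be scripted. A minimal sketch, assuming GNU stat; the temp directories here only stand in for the real export paths (in practice you would compare, say, dsdb1:/data/gluster against dsdb2:/data/gluster over ssh):

```shell
# A replica pair is healthy only if mode, owner, and group of the export
# directories match. stat fields: %a = octal mode, %u/%g = numeric uid/gid.
same_perms() {
  [ "$(stat -c '%a %u %g' "$1")" = "$(stat -c '%a %u %g' "$2")" ]
}

# Stand-in directories for the two brick exports:
a=$(mktemp -d); b=$(mktemp -d)
chmod 700 "$a"; chmod 755 "$b"
same_perms "$a" "$b" || echo "mismatch"   # modes differ
chmod 755 "$a"
same_perms "$a" "$b" && echo "match"      # now identical
```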
>> > Regards,
>> > Amar
>> >
>> >>
>> >> Brick3: dsdb3:/data/gluster
>> >> Brick4: dsdb4:/data/gluster
>> >> Brick5: dsdb5:/data/gluster
>> >> Brick6: dsdb6:/data/gluster
>> >> Brick7: dslg1:/data/gluster
>> >> Brick8: dslg2:/data/gluster
>> >> Brick9: dsdb1:/data1/gluster
>> >> Brick10: dsdb2:/data1/gluster
>> >> Brick11: dsdb3:/data1/gluster
>> >> Brick12: dsdb4:/data1/gluster
>> >> Brick13: dsdb5:/data1/gluster
>> >> Brick14: dsdb6:/data1/gluster
>> >> Brick15: dslg1:/data1/gluster
>> >> Brick16: dslg2:/data1/gluster
>> >
>> >
>> >
>
>

