Please check whether the outputs of getfattr -d -m "trusted*" differ across the brick directories.

Pranith.

----- Original Message -----
From: "Mohit Anchlia" <mohitanchlia at gmail.com>
To: "Pranith Kumar. Karampuri" <pranithk at gluster.com>
Cc: "Amar Tumballi" <amar at gluster.com>, gluster-users at gluster.org
Sent: Friday, April 29, 2011 11:37:01 PM
Subject: Re: Split brain errors

All I do is stop the volume, delete the volume, remove the mount dir,
create the mount dir, and create the volume, and this happens. I have
also tried stopping glusterd after deleting the volume and starting it
again before re-creating the volume, but it consistently happens.

Also, when I add multiple bricks on the same server, some of the
servers don't have all the xattrs.

On Fri, Apr 29, 2011 at 10:25 AM, Pranith Kumar. Karampuri <pranithk at gluster.com> wrote:
> Hi Mohit,
>         Do you know what exact steps are leading to this problem?
>
> Pranith.
>
> ----- Original Message -----
> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
> To: "Amar Tumballi" <amar at gluster.com>, gluster-users at gluster.org
> Sent: Friday, April 29, 2011 9:49:33 PM
> Subject: Re: Split brain errors
>
> Can someone from dev please reply? Should I open a bug?
>
> On Thu, Apr 28, 2011 at 2:17 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>> I got some help and fixed these by setting xattrs. For example, I
>> changed "I" to "A" using setfattr.
>>
>> But now my next question is why this happened in the first place, and
>> what measures need to be taken so that it doesn't happen again. It
>> keeps happening even if I start clean: stop vol, delete vol, delete
>> contents, re-create vols. Also, some of the bricks don't have the
>> "stress-volume" attr.
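Pranith's suggestion to compare the getfattr -d -m "trusted*" dumps from every brick can be scripted. The following is a minimal sketch, not a GlusterFS tool. The script name compare_xattrs.py and the functions parse_getfattr and diff_bricks are hypothetical. It assumes each brick's getfattr output for the same directory has been saved to its own text file. The script reports every attribute that is missing on some brick or whose value differs between bricks. Compare bricks of the same replica pair, because the trusted.glusterfs.dht layout ranges and the trusted.afr.* key names are expected to differ between pairs. With replica 2, the pairs are presumably consecutive bricks in "gluster volume info", for example Brick9 and Brick10 for the client-8/client-9 keys seen on dsdb1:/data1/gluster.

```python
import sys
from typing import Dict


def parse_getfattr(text: str) -> Dict[str, str]:
    """Parse the getfattr -d dump of ONE file/directory into {name: value}."""
    attrs: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, the "# file:" header and getfattr's
        # "Removing leading '/'" notice if it was captured too.
        if not line or line.startswith("#") or line.startswith("getfattr:"):
            continue
        # Split on the first '=' only: base64 values may end in '=='.
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def diff_bricks(dumps: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return {attr: {brick: value}} for each attribute that is missing on
    at least one brick or whose value is not identical on every brick."""
    parsed = {brick: parse_getfattr(text) for brick, text in dumps.items()}
    names = sorted(set().union(*(attrs.keys() for attrs in parsed.values())))
    report: Dict[str, Dict[str, str]] = {}
    for name in names:
        values = {brick: attrs.get(name, "<missing>") for brick, attrs in parsed.items()}
        if len(set(values.values())) > 1:
            report[name] = values
    return report


if __name__ == "__main__":
    # Hypothetical usage: python compare_xattrs.py brick9.txt brick10.txt
    # where each file holds one brick's getfattr -d -m "trusted*" output.
    dumps = {}
    for path in sys.argv[1:]:
        with open(path) as f:
            dumps[path] = f.read()
    for name, values in diff_bricks(dumps).items():
        print(name)
        for brick, value in values.items():
            print(f"    {brick}: {value}")
```

An attribute absent on one brick shows up as "<missing>". This would make the "some of the bricks don't have the stress-volume attr" symptom visible at a glance.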
>>
>> getfattr -dm - /data1/gluster
>> getfattr: Removing leading '/' from absolute path names
>> # file: data1/gluster
>> trusted.afr.stress-volume-client-8=0sAAAAAAIAAAAAAAAA
>> trusted.afr.stress-volume-client-9=0sAAAAAAAAAAAAAAAA
>> trusted.gfid=0sAAAAAAAAAAAAAAAAAAAAAQ==
>> trusted.glusterfs.dht=0sAAAAAQAAAAAqqqqqP////g==
>> trusted.glusterfs.test="working\000"
>>
>> On Thu, Apr 28, 2011 at 9:24 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>> I create 30K directories in the client mount point. I've run this
>>> test both with mkfs -I 256 and with the default 128-byte inodes
>>> (Red Hat 5.6), and I only see these errors with mkfs -I 256. That
>>> looks like the cause of the failure, because everything else is the
>>> same: same number of bricks, servers, user (root), etc.
>>>
>>> When I run the stress test, the client mount logs fill up with these
>>> errors for every subvolume. It looks like it's happening for every
>>> file that's being written.
>>>
>>> On Thu, Apr 28, 2011 at 9:20 AM, Amar Tumballi <amar at gluster.com> wrote:
>>>> I see that the directory sizes differ here. Let me confirm whether we
>>>> are also checking that the sizes are the same (for directories that
>>>> is not needed). In that case this log makes sense, but it is surely a
>>>> false positive.
>>>> -Amar
>>>>
>>>> On Thu, Apr 28, 2011 at 9:44 PM, Mohit Anchlia <mohitanchlia at gmail.com>
>>>> wrote:
>>>>>
>>>>> Yes, they are the same. It looks like this problem appears only when I
>>>>> use -I 256 with mkfs. Why would that be?
>>>>>
>>>>> [root at dsdb1 ~]# ls -ltr /data/
>>>>> total 5128
>>>>> drwx------     2 root root   16384 Apr 27 16:57 lost+found
>>>>> drwxr-xr-x 30003 root root 4562944 Apr 27 17:15 mnt-stress
>>>>> drwxr-xr-x 30003 root root  598016 Apr 27 17:15 gluster
>>>>> [root at dsdb1 ~]# ls -ltr /data1/
>>>>> total 572
>>>>> drwx------     2 root root  16384 Apr 27 16:59 lost+found
>>>>> drwxr-xr-x 30003 root root 561152 Apr 27 17:15 gluster
>>>>>
>>>>> [root at dsdb2 ~]# ls -ltr /data
>>>>> total 588
>>>>> drwx------     2 root root  16384 Apr 27 16:52 lost+found
>>>>> drwxr-xr-x     2 root root   4096 Apr 27 17:09 mnt-stress
>>>>> drwxr-xr-x 30003 root root 573440 Apr 27 17:15 gluster
>>>>> [root at dsdb2 ~]# ls -ltr /data1
>>>>> total 592
>>>>> drwx------     2 root root  16384 Apr 27 16:54 lost+found
>>>>> drwxr-xr-x 30003 root root 581632 Apr 27 17:15 gluster
>>>>>
>>>>> On Wed, Apr 27, 2011 at 11:18 PM, Amar Tumballi <amar at gluster.com> wrote:
>>>>> >>
>>>>> >> [2011-04-27 17:11:29.13142] E
>>>>> >> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix]
>>>>> >> 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership
>>>>> >> of '/' (possible split-brain). Please fix the file on all backend
>>>>> >> volumes
>>>>> >>
>>>>> >> Can someone please help me find the reason for this problem?
>>>>> >>
>>>>> >> gluster volume info all
>>>>> >>
>>>>> >> Volume Name: stress-volume
>>>>> >> Type: Distributed-Replicate
>>>>> >> Status: Started
>>>>> >> Number of Bricks: 8 x 2 = 16
>>>>> >> Transport-type: tcp
>>>>> >> Bricks:
>>>>> >> Brick1: dsdb1:/data/gluster
>>>>> >> Brick2: dsdb2:/data/gluster
>>>>> >
>>>>> > Did you check the permission/ownership of these exports? Please make
>>>>> > sure that they are the same.
>>>>> >
>>>>> > Regards,
>>>>> > Amar
>>>>> >
>>>>> >>
>>>>> >> Brick3: dsdb3:/data/gluster
>>>>> >> Brick4: dsdb4:/data/gluster
>>>>> >> Brick5: dsdb5:/data/gluster
>>>>> >> Brick6: dsdb6:/data/gluster
>>>>> >> Brick7: dslg1:/data/gluster
>>>>> >> Brick8: dslg2:/data/gluster
>>>>> >> Brick9: dsdb1:/data1/gluster
>>>>> >> Brick10: dsdb2:/data1/gluster
>>>>> >> Brick11: dsdb3:/data1/gluster
>>>>> >> Brick12: dsdb4:/data1/gluster
>>>>> >> Brick13: dsdb5:/data1/gluster
>>>>> >> Brick14: dsdb6:/data1/gluster
>>>>> >> Brick15: dslg1:/data1/gluster
>>>>> >> Brick16: dslg2:/data1/gluster
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
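The trusted.afr.* values in the getfattr dump quoted in this thread are AFR's changelog ("pending") counters. They are what the "I" to "A" setfattr fix edited. The following is a minimal decoding sketch that assumes the GlusterFS 3.x AFR layout: 12 bytes holding three big-endian unsigned 32-bit counters, in the order data, metadata, entry. That layout is an assumption, not something stated in the thread. The "0s" prefix is getfattr's marker for a base64-encoded value ("0x" marks hex, as printed with -e hex). The names AfrPending, decode_afr and encode_afr are hypothetical helpers.

```python
import base64
import struct
from typing import NamedTuple


class AfrPending(NamedTuple):
    """Pending-operation counters from one trusted.afr.* xattr (assumed layout)."""
    data: int
    metadata: int
    entry: int


def decode_afr(value: str) -> AfrPending:
    """Decode a trusted.afr.* value as printed by getfattr.

    Assumes 12 bytes = three big-endian unsigned 32-bit counters in the
    order data, metadata, entry.
    """
    if value.startswith("0s"):      # getfattr prints base64 values with "0s"
        raw = base64.b64decode(value[2:])
    elif value.startswith("0x"):    # ...and hex values (-e hex) with "0x"
        raw = bytes.fromhex(value[2:])
    else:
        raise ValueError(f"unsupported getfattr encoding: {value!r}")
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    return AfrPending(*struct.unpack(">III", raw))


def encode_afr(counters: AfrPending) -> str:
    """Encode counters back into the "0s" base64 form getfattr/setfattr use."""
    packed = struct.pack(">III", *counters)
    return "0s" + base64.b64encode(packed).decode("ascii")


if __name__ == "__main__":
    # Values copied from the dsdb1:/data1/gluster dump in this thread.
    for name, value in [
        ("trusted.afr.stress-volume-client-8", "0sAAAAAAIAAAAAAAAA"),
        ("trusted.afr.stress-volume-client-9", "0sAAAAAAAAAAAAAAAA"),
    ]:
        print(name, decode_afr(value))
```

Under that assumed layout, the client-8 value decodes to zero data and entry counters and a non-zero metadata counter (0x02000000). That would be consistent with the "Unable to self-heal permissions/ownership" error from afr_sh_metadata_fix. Changing its "I" to "A" gives 0sAAAAAAAAAAAAAAAA, which encode_afr produces for all-zero counters. setfattr accepts the same "0s" form via setfattr -n <name> -v <value> <path>, but a counter should only be reset after deciding which brick holds the good copy.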