As in do you see all the files in those dirs unlike others?

On Thu, May 19, 2011 at 12:42 PM, Burnash, James <jburnash at knight.com> wrote:
> "Good ones" in what way?
>
> Permissions on the backend storage are here:
>
> http://pastebin.com/EiMvbgdh
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Thursday, May 19, 2011 3:09 PM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: Files present on the backend but have become invisible from clients
>
> It looks like a bug. You are missing xattrs. Can you confirm if all dirs that have "0sAAAAAAAAAAAAAAAA" in your pastebin are good ones?
>
> On Thu, May 19, 2011 at 11:51 AM, Burnash, James <jburnash at knight.com> wrote:
>> Hi Mohit.
>>
>> Answers inline below:
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
>> Sent: Thursday, May 19, 2011 1:17 PM
>> To: Burnash, James
>> Cc: gluster-users at gluster.org
>> Subject: Re: Files present on the backend but have become invisible from clients
>>
>> Can you post the output of getfattr -dm - <file|dir> for all parent dirs.
>>
>>         http://pastebin.com/EVfRsSrD
>>
>> and for one of the files from the server?
>>
>> # getfattr -dm - /export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
>> getfattr: Removing leading '/' from absolute path names
>> # file: export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
>> trusted.afr.pfs-ro1-client-0=0sAAAAAAAAAAAAAAAA
>> trusted.afr.pfs-ro1-client-1=0sAAAAAAAAAAAAAAAA
>> trusted.gfid=0sjyq/BEwuRhaVbF7qdo0lqA==
>>
>> Thank you sir!
>>
>> James
>>
>>
>> On Thu, May 19, 2011 at 8:15 AM, Burnash, James <jburnash at knight.com> wrote:
>>> Hello folks. A new conundrum to make sure that my life with GlusterFS doesn't become boring :-)
>>>
>>> Configuration at end of this message.
>>>
>>> On client - directory appears to be empty:
>>> # ls -l /pfs2/online_archive/2011/01
>>> total 0
>>>
>>> fgrep -C 2 inode /var/log/glusterfs/pfs2.log | tail -10
>>> [2011-05-18 14:40:11.665045] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-18 14:43:47.810045] E [rpc-clnt.c:199:call_bail] 0-pfs-ro1-client-1: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x130824x sent = 2011-05-18 14:13:45.978987. timeout = 1800
>>> [2011-05-18 14:53:12.311323] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>>> [2011-05-18 15:00:32.240373] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-18 15:10:12.282848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> --
>>> [2011-05-19 10:10:25.967246] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:20:18.551953] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:29:34.834256] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>>> [2011-05-19 10:30:06.898152] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:32:05.258799] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>>
>>>
>>> On server - directory is populated:
>>> loop_check 'ls -l /export/read-only/g*/online_archive/2011/01' jc1letgfs{14,15,17,18} | less
>>> jc1letgfs14
>>> /export/read-only/g01/online_archive/2011/01:
>>> total 80
>>> drwxrwxrwt 3    403 1009 4096 May  4 10:35 03
>>> drwxrwxrwt 3 107421 1009 4096 May  7 12:18 04
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:35 05
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 06
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 07
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:41 10
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:37 11
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 12
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 13
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:44 14
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:46 18
>>> drwxrwxrwt 3 107421 1009 4096 Apr 14 14:11 19
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 20
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 21
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:45 24
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:47 25
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:52 26
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 27
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:50 28
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:56 31
>>>
>>> (and shows the same on every brick)
>>>
>>> And from the server logs:
>>> root at jc1letgfs17:/var/log/glusterfs# fgrep '2011-05-19 10:39:30' bricks/export-read-only-g*.log
>>> [2011-05-19 10:39:30.306661] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.307754] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.308230] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.322342] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.421298] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>>
>>> The only two things that jump out so far are:
>>> - the permissions on the directories under /export/read-only/g01/online_archive/2011/01 are 1777, whereas the directories under /export/read-only/g01/online_archive/2010/01 are just 755;
>>> - the lstat "No data available" errors only seem to appear on the problem directories.
>>>
>>> Any hints or suggestions would be greatly appreciated. Thanks, James
>>>
>>>
>>> Config:
>>> All on Gluster 3.1.3
>>>
>>> Servers:
>>> 4 CentOS 5.5 (ProLiant DL370 G6 servers, Intel Xeon 3200 MHz), each with:
>>> Single P812 Smart Array Controller,
>>> Single MDS600 with 70 2TB SATA drives configured as RAID 50
>>> 48 GB RAM
>>>
>>> Clients:
>>> 185 CentOS 5.2 (mostly DL360 G6).
>>> /pfs2 is the mount point for a Distributed-Replicate volume of 4 servers.
>>>
>>> Volume Name: pfs-ro1
>>> Type: Distributed-Replicate
>>> Status: Started
>>> Number of Bricks: 20 x 2 = 40
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: jc1letgfs17-pfs1:/export/read-only/g01
>>> Brick2: jc1letgfs18-pfs1:/export/read-only/g01
>>> Brick3: jc1letgfs17-pfs1:/export/read-only/g02
>>> Brick4: jc1letgfs18-pfs1:/export/read-only/g02
>>> Brick5: jc1letgfs17-pfs1:/export/read-only/g03
>>> Brick6: jc1letgfs18-pfs1:/export/read-only/g03
>>> Brick7: jc1letgfs17-pfs1:/export/read-only/g04
>>> Brick8: jc1letgfs18-pfs1:/export/read-only/g04
>>> Brick9: jc1letgfs17-pfs1:/export/read-only/g05
>>> Brick10: jc1letgfs18-pfs1:/export/read-only/g05
>>> Brick11: jc1letgfs17-pfs1:/export/read-only/g06
>>> Brick12: jc1letgfs18-pfs1:/export/read-only/g06
>>> Brick13: jc1letgfs17-pfs1:/export/read-only/g07
>>> Brick14: jc1letgfs18-pfs1:/export/read-only/g07
>>> Brick15: jc1letgfs17-pfs1:/export/read-only/g08
>>> Brick16: jc1letgfs18-pfs1:/export/read-only/g08
>>> Brick17: jc1letgfs17-pfs1:/export/read-only/g09
>>> Brick18: jc1letgfs18-pfs1:/export/read-only/g09
>>> Brick19: jc1letgfs17-pfs1:/export/read-only/g10
>>> Brick20: jc1letgfs18-pfs1:/export/read-only/g10
>>> Brick21: jc1letgfs14-pfs1:/export/read-only/g01
>>> Brick22: jc1letgfs15-pfs1:/export/read-only/g01
>>> Brick23: jc1letgfs14-pfs1:/export/read-only/g02
>>> Brick24: jc1letgfs15-pfs1:/export/read-only/g02
>>> Brick25: jc1letgfs14-pfs1:/export/read-only/g03
>>> Brick26: jc1letgfs15-pfs1:/export/read-only/g03
>>> Brick27: jc1letgfs14-pfs1:/export/read-only/g04
>>> Brick28: jc1letgfs15-pfs1:/export/read-only/g04
>>> Brick29: jc1letgfs14-pfs1:/export/read-only/g05
>>> Brick30: jc1letgfs15-pfs1:/export/read-only/g05
>>> Brick31: jc1letgfs14-pfs1:/export/read-only/g06
>>> Brick32: jc1letgfs15-pfs1:/export/read-only/g06
>>> Brick33: jc1letgfs14-pfs1:/export/read-only/g07
>>> Brick34: jc1letgfs15-pfs1:/export/read-only/g07
>>> Brick35: jc1letgfs14-pfs1:/export/read-only/g08
>>> Brick36: jc1letgfs15-pfs1:/export/read-only/g08
>>> Brick37: jc1letgfs14-pfs1:/export/read-only/g09
>>> Brick38: jc1letgfs15-pfs1:/export/read-only/g09
>>> Brick39: jc1letgfs14-pfs1:/export/read-only/g10
>>> Brick40: jc1letgfs15-pfs1:/export/read-only/g10
>>> Options Reconfigured:
>>> diagnostics.brick-log-level: ERROR
>>> cluster.metadata-change-log: on
>>> diagnostics.client-log-level: ERROR
>>> performance.stat-prefetch: on
>>> performance.cache-size: 2GB
>>> network.ping-timeout: 10
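One way to test Mohit's "missing xattrs" diagnosis against the "lstat ... failed: No data available" errors in the brick logs: lstat(2) itself cannot return ENODATA, so the error most likely comes from posix looking up the trusted.gfid xattr on the backend directory. A minimal sketch (not from the thread; the path glob is assumed from the brick layout in the volume info above, and getfattr is used as elsewhere in the thread), to be run on each of jc1letgfs{14,15,17,18}:

    #!/bin/bash
    # Flag backend directories that carry no trusted.gfid xattr --
    # the condition that would make posix_lookup fail with ENODATA.
    for d in /export/read-only/g*/online_archive/2011/01/*; do
        if ! getfattr -n trusted.gfid -e hex "$d" >/dev/null 2>&1; then
            echo "missing trusted.gfid: $d"
        fi
    done

If this prints the same directories that are invisible from the client (e.g. .../2011/01/21), that would confirm the missing-xattr theory.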
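As for the repeated "Unable to self-heal permissions/ownership of '/' (possible split-brain)" messages: in 3.1.x these generally mean the mode/owner of a brick root differs between the two halves of a replica pair, so afr refuses to pick a winner. A hedged sketch of the usual manual repair, assuming (only James can confirm this) that one side's metadata is the intended state; the chmod/chown values below are placeholders, not a recommendation:

    #!/bin/bash
    # 1. Compare root-directory metadata across the bricks of each
    #    replica pair (run on every server):
    stat -c '%a %u:%g %n' /export/read-only/g*

    # 2. If a brick disagrees with its partner, reset it by hand on
    #    the backend, for example:
    #    chmod 1777 /export/read-only/g01
    #    chown 107421:1009 /export/read-only/g01

    # 3. Re-trigger lookup and metadata self-heal from a client mount:
    ls -lR /pfs2/online_archive/2011/01 >/dev/null

Given that the 2011 directories are 1777 while the 2010 ones are 755, it would be worth confirming which mode the archiving job is supposed to create before healing in either direction.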