I was really hoping to get some suggestions on this - but I know everyone is equally busy.

It's looking kind of grim for my GlusterFS project - the users all moved off after the last outage - and I'm open to ideas on how to bring them back.

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Tuesday, June 21, 2011 8:53 AM
To: gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Just following up on this problem.

Here are the trusted.afr.* attributes for the problem directory:

jc1ladmin1:~/myscripts$ ./check_brick_attrs_ro -e -s -a trusted.afr -b 'g0{1,2}/online_archive/2011' jc1letgfs{14,15,17,18}
g01/online_archive/2011 trusted.afr.pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs17
g01/online_archive/2011 trusted.afr.pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs18
g01/online_archive/2011 trusted.afr.pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs14
g01/online_archive/2011 trusted.afr.pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs15
g02/online_archive/2011 trusted.afr.pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs17
g02/online_archive/2011 trusted.afr.pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs18
g02/online_archive/2011 trusted.afr.pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs14
g02/online_archive/2011 trusted.afr.pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs15

Here are the same attributes for the g01 and g02 bricks themselves:

jc1ladmin1:~/myscripts$ ./check_brick_attrs_ro -e -s -a trusted.afr -b 'g0{1,2}' jc1letgfs{14,15,17,18}
g01 trusted.afr.pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs17
g01 trusted.afr.pfs-ro1-client-0=0x000000000600000800000000 jc1letgfs18
g01 trusted.afr.pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs14
g01 trusted.afr.pfs-ro1-client-20=0x000000000200000000000000 jc1letgfs15
g02 trusted.afr.pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs17
g02 trusted.afr.pfs-ro1-client-2=0x000000004500000400000000 jc1letgfs18
g02 trusted.afr.pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs14
g02 trusted.afr.pfs-ro1-client-22=0x000000000200000000000000 jc1letgfs15

Would anybody have any insights as to what is going on here? I'm seeing attributes in my sleep these days ... that cannot be good!

Thanks,

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Friday, June 17, 2011 12:01 PM
To: 'Pranith Kumar. Karampuri'; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Hi Pranith.

It's been extraordinarily busy here - sorry to take so long to respond.

Here are the dir permissions:

jc1ladmin1:~/projects/gluster$ loop_check ' ls -ld /export/read-only/g0[12] 2>/dev/null' jc1letgfs{14,15}
jc1letgfs14
drwxr-xr-x 7 root root 95 Apr 18 14:52 /export/read-only/g01
drwxr-xr-x 7 root root 95 May 2 15:26 /export/read-only/g02
jc1letgfs15
drwxr-xr-x 7 root root 95 Apr 18 14:52 /export/read-only/g01
drwxr-xr-x 7 root root 95 May 2 15:26 /export/read-only/g02

As for your second request, are you asking for the extended attributes for every file found under those bricks? There are some 35k files on each pair of mirrored bricks ... perhaps you can tell me what type of attribute you are looking for and I can filter those out?

Thanks,

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: Pranith Kumar.
Karampuri [mailto:pranithk at gluster.com]
Sent: Wednesday, June 15, 2011 1:54 AM
To: Burnash, James; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi James,

Could you please check whether the permissions of any files in the directory are mismatched? I also need the output of "getfattr -d -m . <filename>" for all the files in the following bricks, in this order:

jc1letgfs14:export/read-only/g01
jc1letgfs15:export/read-only/g01
jc1letgfs14:export/read-only/g02
jc1letgfs15:export/read-only/g02

Please also give the ls command output on the mount point so that we can check which files are missing.

Thanks
Pranith
________________________________________
From: Burnash, James [jburnash at knight.com]
Sent: Tuesday, June 14, 2011 5:37 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi Pranith.

Yes, I do see those messages in my mount logs on the client:

root@jc1lnxsamm100:~# fgrep afr-self-heal /var/log/glusterfs/pfs2.log | tail
[2011-06-14 07:30:56.152066] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:35:16.869848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:39:48.500117] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:40:19.312364] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:44:27.714292] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:50:04.691154] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:54:17.853591] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:55:26.876415] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:59:51.702585] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 08:00:08.346056] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: Pranith Kumar.
Karampuri [mailto:pranithk at gluster.com]
Sent: Tuesday, June 14, 2011 1:28 AM
To: Burnash, James; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi James,

Bricks 3-10 don't have problems. I think bricks 01 and 02 went into a split-brain situation. Could you confirm whether you see the following logs in your mount's log file?

[afr-self-heal-metadata.c:524:afr_sh_metadata_fix]0-stress-volume-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes.

Pranith.
________________________________________
From: Burnash, James [jburnash at knight.com]
Sent: Monday, June 13, 2011 11:56 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi Pranith.

Here is the revised listing - please notice that bricks g01 and g02 on the two servers (jc1letgfs14 and 15) have what appear to be "normal" trusted.afr attributes, but the balance of the bricks (3-10) all have =0x000000000000000000000000.

http://pastebin.com/j0hVFTzd

Is this right, or am I looking at this backwards / sideways?

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Monday, June 13, 2011 8:28 AM
To: 'Pranith Kumar. Karampuri'; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Hi Pranith.

Sorry - last week was a rough one. Disregard that pastebin - I will put up a new one that makes more sense and repost to the list.

James

-----Original Message-----
From: Pranith Kumar.
Karampuri [mailto:pranithk at gluster.com]
Sent: Monday, June 13, 2011 1:12 AM
To: Burnash, James; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi James,

I looked at the pastebin sample and I see that all of the attrs are complete zeros. Could you let me know what it is that I am missing?

Pranith
________________________________________
From: gluster-users-bounces at gluster.org [gluster-users-bounces at gluster.org] on behalf of Burnash, James [jburnash at knight.com]
Sent: Saturday, June 11, 2011 12:57 AM
To: Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Hi Jeff and Gluster users.

Question about inconsistent-looking attributes on brick directories on my Gluster backend servers.

http://pastebin.com/b964zMu8

What stands out here is that the two original servers (jc1letgfs17 and 18) only show attributes of 0x000000000000000000000000 for every other pfs-ro1-client-X, while the two servers that were added some time ago show a different pattern (as can be seen in the pastebin sample).

Configuration is at the end of this message.

Any hints or suggestions would be greatly appreciated.

Thanks,
James

Config:
All on Gluster 3.1.3
Servers: 4 CentOS 5.5 (ProLiant DL370 G6 servers, Intel Xeon 3200 MHz), each with:
Single P812 Smart Array Controller,
Single MDS600 with 70 2TB SATA drives configured as RAID 50
48 MB RAM
Clients: 185 CentOS 5.2 (mostly DL360 G6).
/pfs2 is the mount point for a Distributed-Replicate volume of 4 servers.

Volume Name: pfs-ro1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 20 x 2 = 40
Transport-type: tcp
Bricks:
Brick1: jc1letgfs17-pfs1:/export/read-only/g01
Brick2: jc1letgfs18-pfs1:/export/read-only/g01
Brick3: jc1letgfs17-pfs1:/export/read-only/g02
Brick4: jc1letgfs18-pfs1:/export/read-only/g02
Brick5: jc1letgfs17-pfs1:/export/read-only/g03
Brick6: jc1letgfs18-pfs1:/export/read-only/g03
Brick7: jc1letgfs17-pfs1:/export/read-only/g04
Brick8: jc1letgfs18-pfs1:/export/read-only/g04
Brick9: jc1letgfs17-pfs1:/export/read-only/g05
Brick10: jc1letgfs18-pfs1:/export/read-only/g05
Brick11: jc1letgfs17-pfs1:/export/read-only/g06
Brick12: jc1letgfs18-pfs1:/export/read-only/g06
Brick13: jc1letgfs17-pfs1:/export/read-only/g07
Brick14: jc1letgfs18-pfs1:/export/read-only/g07
Brick15: jc1letgfs17-pfs1:/export/read-only/g08
Brick16: jc1letgfs18-pfs1:/export/read-only/g08
Brick17: jc1letgfs17-pfs1:/export/read-only/g09
Brick18: jc1letgfs18-pfs1:/export/read-only/g09
Brick19: jc1letgfs17-pfs1:/export/read-only/g10
Brick20: jc1letgfs18-pfs1:/export/read-only/g10
Brick21: jc1letgfs14-pfs1:/export/read-only/g01
Brick22: jc1letgfs15-pfs1:/export/read-only/g01
Brick23: jc1letgfs14-pfs1:/export/read-only/g02
Brick24: jc1letgfs15-pfs1:/export/read-only/g02
Brick25: jc1letgfs14-pfs1:/export/read-only/g03
Brick26: jc1letgfs15-pfs1:/export/read-only/g03
Brick27: jc1letgfs14-pfs1:/export/read-only/g04
Brick28: jc1letgfs15-pfs1:/export/read-only/g04
Brick29: jc1letgfs14-pfs1:/export/read-only/g05
Brick30: jc1letgfs15-pfs1:/export/read-only/g05
Brick31: jc1letgfs14-pfs1:/export/read-only/g06
Brick32: jc1letgfs15-pfs1:/export/read-only/g06
Brick33: jc1letgfs14-pfs1:/export/read-only/g07
Brick34: jc1letgfs15-pfs1:/export/read-only/g07
Brick35: jc1letgfs14-pfs1:/export/read-only/g08
Brick36: jc1letgfs15-pfs1:/export/read-only/g08
Brick37: jc1letgfs14-pfs1:/export/read-only/g09
Brick38: jc1letgfs15-pfs1:/export/read-only/g09
Brick39: jc1letgfs14-pfs1:/export/read-only/g10
Brick40: jc1letgfs15-pfs1:/export/read-only/g10
Options Reconfigured:
diagnostics.brick-log-level: ERROR
cluster.metadata-change-log: on
diagnostics.client-log-level: ERROR
performance.stat-prefetch: on
performance.cache-size: 2GB
network.ping-timeout: 10

DISCLAIMER: This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this in error, please immediately notify me and permanently delete the original and any copy of any e-mail and any printout thereof. E-mail transmission cannot be guaranteed to be secure or error-free. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission.
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its discretion, monitor and review the content of all e-mail communications. http://www.knight.com
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
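
Editor's note on reading the trusted.afr.* values quoted in this thread: the sketch below is one way to decode them, not something taken from the thread itself. It assumes the layout described in the GlusterFS split-brain documentation, where the 12-byte value is three unsigned 32-bit big-endian counters for pending data, metadata and entry operations, in that order, and it assumes Gluster 3.1.3 uses that same layout. The function names are hypothetical. Read that way, the non-zero values reported for g01 and g02 on jc1letgfs18 and jc1letgfs15 all fall in the metadata field, which would at least be consistent with the metadata self-heal errors in the client log.

```python
import struct

# Assumed layout (not confirmed in the thread): three unsigned 32-bit
# big-endian counters, in the order data, metadata, entry.
FIELDS = ("data", "metadata", "entry")


def decode_afr_changelog(value):
    """Split a 12-byte trusted.afr.* hex value into its three counters."""
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return dict(zip(FIELDS, struct.unpack(">III", raw)))


def pending_kinds(value):
    """Return the names of the counters that are non-zero."""
    decoded = decode_afr_changelog(value)
    return [name for name in FIELDS if decoded[name]]


if __name__ == "__main__":
    # Values copied from the g01 brick listing earlier in the thread.
    samples = {
        "jc1letgfs17 g01 client-0": "0x000000000000000000000000",
        "jc1letgfs18 g01 client-0": "0x000000000600000800000000",
        "jc1letgfs15 g01 client-20": "0x000000000200000000000000",
    }
    for label, value in samples.items():
        print(label, decode_afr_changelog(value), pending_kinds(value))
```

An all-zero value decodes to no pending operations of any kind, so only the bricks with non-zero counters need a closer look.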