Files present on the backend but have become invisible from clients

I was really hoping to get some suggestions on this - but I know everyone is equally busy.

It's looking kind of grim for my GlusterFS project - the users all moved off after the last outage - and I'm open to ideas on how to bring them back.

James Burnash
Unix Engineer
Knight Capital Group


-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Tuesday, June 21, 2011 8:53 AM
To: gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Just following up on this problem.

Here are the trusted.afr.* attributes for the problem directory:

jc1ladmin1:~/myscripts$ ./check_brick_attrs_ro -e -s -a trusted.afr -b 'g0{1,2}/online_archive/2011' jc1letgfs{14,15,17,18}
g01/online_archive/2011 trusted.afr.pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs17
g01/online_archive/2011 trusted.afr.pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs18
g01/online_archive/2011 trusted.afr.pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs14
g01/online_archive/2011 trusted.afr.pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs15
g02/online_archive/2011 trusted.afr.pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs17
g02/online_archive/2011 trusted.afr.pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs18
g02/online_archive/2011 trusted.afr.pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs14
g02/online_archive/2011 trusted.afr.pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs15

here are the same attributes for the g01 and g02 bricks themselves:

jc1ladmin1:~/myscripts$ ./check_brick_attrs_ro -e -s -a trusted.afr -b 'g0{1,2}' jc1letgfs{14,15,17,18}
g01 trusted.afr.pfs-ro1-client-0=0x000000000000000000000000 jc1letgfs17
g01 trusted.afr.pfs-ro1-client-0=0x000000000600000800000000 jc1letgfs18
g01 trusted.afr.pfs-ro1-client-20=0x000000000000000000000000 jc1letgfs14
g01 trusted.afr.pfs-ro1-client-20=0x000000000200000000000000 jc1letgfs15
g02 trusted.afr.pfs-ro1-client-2=0x000000000000000000000000 jc1letgfs17
g02 trusted.afr.pfs-ro1-client-2=0x000000004500000400000000 jc1letgfs18
g02 trusted.afr.pfs-ro1-client-22=0x000000000000000000000000 jc1letgfs14
g02 trusted.afr.pfs-ro1-client-22=0x000000000200000000000000 jc1letgfs15
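For readers trying to interpret these values, here is a minimal sketch of how a trusted.afr.* value can be decoded. It assumes the commonly documented AFR changelog layout of three 32-bit big-endian counters (pending data, metadata, and entry operations); that layout is an assumption here and has not been checked against Gluster 3.1.3 specifically. The function name decode_afr_changelog is hypothetical.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into pending-operation counters.

    Assumes the value is 12 bytes: three unsigned 32-bit big-endian
    integers for data, metadata, and entry operations, in that order.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Non-zero values copied from the brick-root listing above.
    samples = {
        "g01 on jc1letgfs18": "0x000000000600000800000000",
        "g01 on jc1letgfs15": "0x000000000200000000000000",
        "g02 on jc1letgfs18": "0x000000004500000400000000",
    }
    for label, value in samples.items():
        counters = decode_afr_changelog(value)
        print(label, {name: hex(count) for name, count in counters.items()})
```

Under that assumed layout, every non-zero value in the listing falls in the metadata counter, while the data and entry counters are zero. That would be consistent with the metadata self-heal errors on '/' quoted elsewhere in this thread.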

Would anybody have any insights as to what is going on here? I'm seeing attributes in my sleep these days ... that cannot be good!

Thanks,

James Burnash
Unix Engineer
Knight Capital Group


-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Friday, June 17, 2011 12:01 PM
To: 'Pranith Kumar. Karampuri'; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Hi Pranith.

It's been extraordinarily busy here - sorry to take so long to respond.

Here are the dir permissions:

jc1ladmin1:~/projects/gluster$ loop_check ' ls -ld /export/read-only/g0[12] 2>/dev/null' jc1letgfs{14,15}
jc1letgfs14
drwxr-xr-x 7 root root 95 Apr 18 14:52 /export/read-only/g01
drwxr-xr-x 7 root root 95 May  2 15:26 /export/read-only/g02

jc1letgfs15
drwxr-xr-x 7 root root 95 Apr 18 14:52 /export/read-only/g01
drwxr-xr-x 7 root root 95 May  2 15:26 /export/read-only/g02

As for your second request, are you asking for the extended attributes for every file found under those bricks? There are some 35k files on each pair of mirrored bricks ... perhaps you can tell me what type of attribute you are looking for and I can filter those out?

Thanks,

James Burnash
Unix Engineer
Knight Capital Group


-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:pranithk at gluster.com]
Sent: Wednesday, June 15, 2011 1:54 AM
To: Burnash, James; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

hi James,
       Could you please check whether the permissions of any files in the directory are mismatched? I also need the output of "getfattr -d -m . <filename>" for all the files in the following bricks, in this order:

jc1letgfs14:export/read-only/g01
jc1letgfs15:export/read-only/g01

jc1letgfs14:export/read-only/g02
jc1letgfs15:export/read-only/g02

Please give the ls command output on the mount point so that we can check what files are missing.

Thanks
Pranith
________________________________________
From: Burnash, James [jburnash at knight.com]
Sent: Tuesday, June 14, 2011 5:37 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi Pranith.

Yes, I do see those messages in my mount logs on the client:

root at jc1lnxsamm100:~# fgrep afr-self-heal /var/log/glusterfs/pfs2.log | tail
[2011-06-14 07:30:56.152066] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:35:16.869848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:39:48.500117] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:40:19.312364] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:44:27.714292] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:50:04.691154] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:54:17.853591] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:55:26.876415] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:59:51.702585] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 08:00:08.346056] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
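When a client log contains many such lines, it can help to tally them per replicate subvolume. The following is a rough sketch that assumes the log line layout shown above (timestamp, level, source location, subvolume, message); the regular expression and the function name count_split_brain are illustrative, not part of any Gluster tool.

```python
import re
from collections import Counter

# Layout assumed from the sample lines above:
# [timestamp] LEVEL [file:line:function] subvolume: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s*(?P<subvol>\S+):\s+(?P<msg>.*)$"
)


def count_split_brain(lines):
    """Count 'possible split-brain' messages per subvolume name."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match and "possible split-brain" in match.group("msg"):
            counts[match.group("subvol")] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the fgrep command above; adjust for other mounts.
    with open("/var/log/glusterfs/pfs2.log") as handle:
        for subvol, total in count_split_brain(handle).most_common():
            print(total, subvol)
```

If more than one subvolume name shows up in the tally, that would suggest the problem is not limited to a single replica pair.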

James Burnash
Unix Engineer
Knight Capital Group


-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:pranithk at gluster.com]
Sent: Tuesday, June 14, 2011 1:28 AM
To: Burnash, James; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

hi James,
    Bricks 3-10 don't have problems. I think bricks 01 and 02 went into a split-brain situation. Could you confirm whether you see the following log message in your mount's log file?
[afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes.

Pranith.
________________________________________
From: Burnash, James [jburnash at knight.com]
Sent: Monday, June 13, 2011 11:56 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

Hi Pranith.

Here is the revised listing - please notice that bricks g01 and g02 on the two servers (jc1letgfs14 and 15) have what appear to be "normal" trusted.afr attributes, but the balance of the bricks (3-10) all have =0x000000000000000000000000.

http://pastebin.com/j0hVFTzd

Is this right, or am I looking at this backwards / sideways?

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Monday, June 13, 2011 8:28 AM
To: 'Pranith Kumar. Karampuri'; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Hi Pranith.

Sorry - last week was a rough one. Disregard that pastebin - I will put up a new one that makes more sense and repost to the list.

James
-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:pranithk at gluster.com]
Sent: Monday, June 13, 2011 1:12 AM
To: Burnash, James; Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: RE: Files present on the backend but have become invisible from clients

hi James,
     I looked at the pastebin sample and I see that all of the attrs are complete zeros. Could you let me know what it is that I am missing?

Pranith
________________________________________
From: gluster-users-bounces at gluster.org [gluster-users-bounces at gluster.org] on behalf of Burnash, James [jburnash at knight.com]
Sent: Saturday, June 11, 2011 12:57 AM
To: Jeff Darcy (jdarcy at redhat.com); gluster-users at gluster.org
Subject: Re: Files present on the backend but have become invisible from clients

Hi Jeff and Gluster users.

I have a question about inconsistent-looking attributes on brick directories on my Gluster backend servers.

http://pastebin.com/b964zMu8

What stands out here is that the two original servers (jc1letgfs17 and 18) only show attributes of 0x000000000000000000000000 for every other pfs-ro1-client-X, while the two servers that were added some time ago show a different pattern (as can be seen in the pastebin sample).

Configuration at end of this message:

Any hints or suggestions would be greatly appreciated.

Thanks,
James


Config:
All on Gluster 3.1.3
Servers:
4 CentOS 5.5 (ProLiant DL370 G6 servers, Intel Xeon 3200 MHz), Each with:
Single P812 Smart Array Controller,
Single MDS600 with 70 2TB SATA drives configured as RAID 50
48 MB RAM

Clients:
185 CentOS 5.2 (mostly DL360 G6).
/pfs2 is the mount point for a Distributed-Replicate volume of 4 servers.

Volume Name: pfs-ro1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 20 x 2 = 40
Transport-type: tcp
Bricks:
Brick1: jc1letgfs17-pfs1:/export/read-only/g01
Brick2: jc1letgfs18-pfs1:/export/read-only/g01
Brick3: jc1letgfs17-pfs1:/export/read-only/g02
Brick4: jc1letgfs18-pfs1:/export/read-only/g02
Brick5: jc1letgfs17-pfs1:/export/read-only/g03
Brick6: jc1letgfs18-pfs1:/export/read-only/g03
Brick7: jc1letgfs17-pfs1:/export/read-only/g04
Brick8: jc1letgfs18-pfs1:/export/read-only/g04
Brick9: jc1letgfs17-pfs1:/export/read-only/g05
Brick10: jc1letgfs18-pfs1:/export/read-only/g05
Brick11: jc1letgfs17-pfs1:/export/read-only/g06
Brick12: jc1letgfs18-pfs1:/export/read-only/g06
Brick13: jc1letgfs17-pfs1:/export/read-only/g07
Brick14: jc1letgfs18-pfs1:/export/read-only/g07
Brick15: jc1letgfs17-pfs1:/export/read-only/g08
Brick16: jc1letgfs18-pfs1:/export/read-only/g08
Brick17: jc1letgfs17-pfs1:/export/read-only/g09
Brick18: jc1letgfs18-pfs1:/export/read-only/g09
Brick19: jc1letgfs17-pfs1:/export/read-only/g10
Brick20: jc1letgfs18-pfs1:/export/read-only/g10
Brick21: jc1letgfs14-pfs1:/export/read-only/g01
Brick22: jc1letgfs15-pfs1:/export/read-only/g01
Brick23: jc1letgfs14-pfs1:/export/read-only/g02
Brick24: jc1letgfs15-pfs1:/export/read-only/g02
Brick25: jc1letgfs14-pfs1:/export/read-only/g03
Brick26: jc1letgfs15-pfs1:/export/read-only/g03
Brick27: jc1letgfs14-pfs1:/export/read-only/g04
Brick28: jc1letgfs15-pfs1:/export/read-only/g04
Brick29: jc1letgfs14-pfs1:/export/read-only/g05
Brick30: jc1letgfs15-pfs1:/export/read-only/g05
Brick31: jc1letgfs14-pfs1:/export/read-only/g06
Brick32: jc1letgfs15-pfs1:/export/read-only/g06
Brick33: jc1letgfs14-pfs1:/export/read-only/g07
Brick34: jc1letgfs15-pfs1:/export/read-only/g07
Brick35: jc1letgfs14-pfs1:/export/read-only/g08
Brick36: jc1letgfs15-pfs1:/export/read-only/g08
Brick37: jc1letgfs14-pfs1:/export/read-only/g09
Brick38: jc1letgfs15-pfs1:/export/read-only/g09
Brick39: jc1letgfs14-pfs1:/export/read-only/g10
Brick40: jc1letgfs15-pfs1:/export/read-only/g10
Options Reconfigured:
diagnostics.brick-log-level: ERROR
cluster.metadata-change-log: on
diagnostics.client-log-level: ERROR
performance.stat-prefetch: on
performance.cache-size: 2GB
network.ping-timeout: 10
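
To relate names such as pfs-ro1-client-20 or pfs-ro1-replicate-6 back to the brick list, here is a small sketch. It assumes that clients are numbered from 0 in brick order and that each replicate subvolume groups two consecutive bricks; the client numbering agrees with the trusted.afr listings elsewhere in this thread (client-0 on g01 of jc1letgfs17/18, client-20 on g01 of jc1letgfs14/15), while the replicate numbering is an assumption. The helper names are hypothetical.

```python
def volume_bricks():
    """Rebuild the 40-brick list in the order shown by volume info above."""
    bricks = []
    for pair in (("jc1letgfs17", "jc1letgfs18"), ("jc1letgfs14", "jc1letgfs15")):
        for number in range(1, 11):
            for host in pair:
                bricks.append("%s-pfs1:/export/read-only/g%02d" % (host, number))
    return bricks


def replica_sets(bricks, replica=2, volume="pfs-ro1"):
    """Group bricks into replicate subvolumes (assumed naming scheme)."""
    sets = {}
    for index, brick in enumerate(bricks):
        subvol = "%s-replicate-%d" % (volume, index // replica)
        client = "%s-client-%d" % (volume, index)
        sets.setdefault(subvol, []).append((client, brick))
    return sets


if __name__ == "__main__":
    for subvol, members in replica_sets(volume_bricks()).items():
        print(subvol)
        for client, brick in members:
            print("   ", client, brick)
```

Under these assumptions, pfs-ro1-replicate-6 would map to g07 on jc1letgfs17 and jc1letgfs18, so it may be worth checking that pair as well as g01 and g02.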


DISCLAIMER:
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this in error, please immediately notify me and permanently delete the original and any copy of any e-mail and any printout thereof. E-mail transmission cannot be guaranteed to be secure or error-free. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission.
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its discretion, monitor and review the content of all e-mail communications. http://www.knight.com
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

