Hi,
In fact, going up one directory level (to the root of the gluster volume), I get similar results:

1. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x000000000000000000000000
trusted.afr.md1-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

2. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x000000000000000000000000
trusted.afr.md1-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

3. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-2=0x000000000000000000000000
trusted.afr.md1-client-3=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

4. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-2=0x000000000000000000000000
trusted.afr.md1-client-3=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b
These four bricks seem consistent, while the remaining two differ:
5. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x000000000000000000000000
trusted.afr.md1-client-1=0x000000000000000000000000
trusted.afr.md1-client-4=0x000000000000000000000000
trusted.afr.md1-client-5=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

6. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x000000000000000000000000
trusted.afr.md1-client-1=0x000000000000000000000000
trusted.afr.md1-client-4=0x000000000000000100000000
trusted.afr.md1-client-5=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b
Bricks 5 and 6 show two extra entries, trusted.afr.md1-client-0 and trusted.afr.md1-client-1, as well as an inconsistency between trusted.afr.md1-client-4 and trusted.afr.md1-client-5.
Could it be that this issue propagates to all subdirectories in the volume and thus results in the error messages in the client log file?
Should I remove trusted.afr.md1-client-0 and trusted.afr.md1-client-1 from brick 5 and brick 6?
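If that turns out to be the right fix, I suppose the commands on bricks 5 and 6 would be something like the following (untested on my side, and assuming the brick root really is /data/glusterfs/md1/brick1 on those two servers):

setfattr -x trusted.afr.md1-client-0 /data/glusterfs/md1/brick1/
setfattr -x trusted.afr.md1-client-1 /data/glusterfs/md1/brick1/

I would of course only run this once someone confirms it is safe.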
Meanwhile, on the client I am running find /home/.md1 -type f -exec cat {} > /dev/null \; to check whether I can access the content of all the files on the volume. So far, only 4 files have given errors.
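For reference, a variant that also records which files fail would be roughly the following (the log file path is just an arbitrary choice on my side):

find /home/.md1 -type f -exec sh -c 'cat "$1" > /dev/null || echo "$1" >> /tmp/md1_unreadable.log' _ {} \;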
It is quite frustrating, because I believe that all my data is still intact on the bricks and that it is only the metadata which got screwed up... I am reluctant to attempt any healing by myself, because I have the feeling that it could do more harm than good.
My colleagues have not been able to access the data for more than 2 days now, and I cannot make them wait much longer...
A.
On Thursday 12 March 2015 12:59:00 Alessandro Ipe wrote:

Hi,
Sorry about that, I thought I was using the -e hex... I must have removed it at some point accidentally.
Here they are:

1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x000000000000000000000000
trusted.afr.md1-client-1=0x000000000000000000000000
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x000000000000000000000000
trusted.afr.md1-client-1=0x000000000000000000000000
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x000000000000000000000000
trusted.afr.md1-client-3=0x000000000000000100000000
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00000001000000000000000055555554

4. getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x000000000000000100000000
trusted.afr.md1-client-3=0x000000000000000000000000
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00000001000000000000000055555554

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x000000000000000000000000
trusted.afr.md1-client-5=0x000000000000000100000000
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

6. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x000000000000000100000000
trusted.afr.md1-client-5=0x000000000000000000000000
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
Thanks for your help,
A.
On Thursday 12 March 2015 07:51:40 Krutika Dhananjay wrote:

Hi,

Could you provide the xattrs in hex format? You can execute `getfattr -d -m . -e hex <path-to-the-directory/file-on-the-brick(s)>`

-Krutika

From: "Alessandro Ipe" <Alessandro.Ipe@xxxxxxxx>

Hi,
Actually, my gluster volume is distribute-replicate so I should provide the attributes on all the bricks. Here they are:

1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w==

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w==

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-3=0sAAAAAAAAAAEAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAVVVVVA==

4. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAAAAAAAAAEAAAAA
trusted.afr.md1-client-3=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAAAAAAAAVVVVVA==

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-5=0sAAAAAAAAAAEAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAABVVVVVqqqqqQ==

6. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0sAAAAAAAAAAEAAAAA
trusted.afr.md1-client-5=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAABVVVVVqqqqqQ==
So it seems that there are indeed discrepancies between bricks 3-4 and 5-6 (the replicate pairs).
A.
On Thursday 12 March 2015 11:33:00 Alessandro Ipe wrote:

Hi,
"gluster volume heal md1 info split-brain" returns approximatively 2000 files (already divided by 2 due to replicate volume). So manually repairing each split-brain is unfeasable. Before scripting some procedure, I need to be sure that I will not harm further the gluster system.
Moreover, I noticed that the messages printed in the logs are all about directories, e.g.

[2015-03-12 10:06:53.423856] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-1: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424005] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424110] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-1: metadata self heal failed, on /root
[2015-03-12 10:06:53.424290] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /root
Getting the attributes of that directory on each brick gives me, for the first:

# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w==

and for the second:

# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0sAAAAAAAAAAAAAAAA
trusted.afr.md1-client-1=0sAAAAAAAAAAAAAAAA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAAAAAQAAAACqqqqq/////w==
So it seems that both are rigorously identical. However, according to your split-brain tutorial, neither of them has 0x000000000000000000000000. What does 0sAAAAAAAAAAAAAAAA actually mean?
Should I change both attributes on each directory to 0x000000000000000000000000?
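If the 0s prefix simply means the value is base64-encoded, as I suspect, then decoding one should show what it corresponds to in hex; I guess something like

echo AAAAAAAAAAAAAAAA | base64 -d | xxd -p

would tell me (but please correct me if that is not how these values should be read).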
Many thanks,
A.
On Wednesday 11 March 2015 08:02:56 Krutika Dhananjay wrote:

Hi,

Have you gone through https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md ? If not, could you go through it once and try the steps given there? Do let us know if something is not clear in the doc.

-Krutika

From: "Alessandro Ipe" <Alessandro.Ipe@xxxxxxxx>

Well, it is even worse. Now doing an "ls -R" on the volume results in a lot of

[2015-03-11 11:18:31.957505] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/library' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 1 0 ] ]

I am desperate...
--
Dr. Ir. Alessandro Ipe
Department of Observations, Remote Sensing from Space
Royal Meteorological Institute
Avenue Circulaire 3, B-1180 Brussels, Belgium
Tel. +32 2 373 06 31  Fax. +32 2 374 67 88
Email: Alessandro.Ipe@xxxxxxxx  Web: http://gerb.oma.be
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users