You have 14,734 GFIDs in .glusterfs which exist only on the brick that was up during the failure, and none of them has a referenced file on that brick.
I have 14,734 GFIDs that are different. All the different ones are only on the brick that was live during the outage and concurrent file copy-in. The brick that was down at that time has no GFIDs that are not also on the up brick.

As the bricks are 10TB, the find is going to be a long-running process. I'm running several finds at once with GNU parallel, but it will still take some time. I can't bring the up machine offline as it's in use. At least I have 24 cores to work with.

I've only tested with one GFID, but the file it referenced _IS_ on the down machine even though it has no GFID in the .glusterfs structure.

On Tue, 2017-10-24 at 12:35 +0530, Karthik Subrahmanya wrote:
Hi Jim,
If the link count is 2, then "find <brickpath> -samefile <brickpath>/.glusterfs/<first two bits of gfid>/<next 2 bits of gfid>/<full gfid>" should give you the file path.
Regards,
Karthik
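For what it's worth, the quoted one-liner can be batched over a GFID list and fanned out across cores. A rough sketch, not a tested procedure: the brick path /exp/brick1 and the gfids.txt input file are hypothetical placeholders, and it assumes bash plus GNU parallel as mentioned above.

```shell
# Resolve each orphaned GFID to its real path by matching hardlink inodes.
# /exp/brick1 and gfids.txt are placeholders for the brick root and a file
# listing one GFID per line.
resolve_gfid() {
    brick="$1"; gfid="$2"
    # .glusterfs stores a hardlink at <brick>/.glusterfs/<aa>/<bb>/<full gfid>
    link="$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
    [ -e "$link" ] || return 0
    # -samefile prints every path sharing the hardlink's inode;
    # prune .glusterfs itself so only the real file path is printed
    find "$brick" -path "$brick/.glusterfs" -prune -o -samefile "$link" -print
}
export -f resolve_gfid   # needed so GNU parallel can call the function

# fan out across 24 cores, e.g.:
#   parallel -j 24 resolve_gfid /exp/brick1 :::: gfids.txt
```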
Can you check whether the same hardlinks are present on both the bricks, and whether both of them have the link count 2?

On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney <jim.kinney@xxxxxxxxx> wrote:
I'm not so lucky. ALL of mine show 2 links and none have the attr data that supplies the path to the original. I have the inode from stat. Looking now to dig out the path/filename from xfs_db on the specific inodes individually. Is the hash of the filename or <path>/filename, and if so, relative to where? /, <path from top of brick>, ?

On Mon, 2017-10-23 at 18:54 +0000, Matt Waymack wrote:
In my case I was able to delete the hard links in the .glusterfs folders of the bricks and it seems to have done the trick, thanks!
From: Karthik Subrahmanya [mailto:ksubrahm@xxxxxxxxxx]
Sent: Monday, October 23, 2017 1:52 AM
To: Jim Kinney <jim.kinney@xxxxxxxxx>; Matt Waymack <mwaymack@xxxxxxxxx>
Cc: gluster-users <Gluster-users@xxxxxxxxxxx>
Subject: Re: gfid entries in volume heal info that do not heal
Hi Jim & Matt,
Can you also check the link count in the stat output of those hardlink entries in the .glusterfs folder on the bricks?
If the link count is 1 on all the bricks for those entries, then they are orphaned entries and you can delete those hardlinks. To be on the safe side, take a backup before deleting any of the entries.
Regards,
Karthik
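The check-then-delete step described above could be sketched roughly like this. This is only an illustration, not a vetted procedure: the brick and backup paths are hypothetical, and it keeps the advised backup copy before removing anything.

```shell
# Delete a .glusterfs GFID entry only if its link count is 1 (orphaned),
# backing each one up first. The brick and backup paths passed in are
# hypothetical examples - adjust for the real layout.
prune_orphans() {
    brick="$1"; backup="$2"
    mkdir -p "$backup"
    find "$brick/.glusterfs" -type f | while read -r entry; do
        links=$(stat -c '%h' "$entry")     # hardlink count from stat
        if [ "$links" -eq 1 ]; then
            cp --preserve=all "$entry" "$backup/"   # safety copy first
            rm "$entry"
        fi
    done
}
# e.g.: prune_orphans /exp/brick1 /root/gfid-backup
```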
On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney <jim.kinney@xxxxxxxxx> wrote:
I've been following this particular thread as I have a similar issue (RAID6 array failed out with 3 dead drives at once while a 12 TB load was being copied into one mounted space - what a mess)
I have >700K GFID entries that have no path data:
Example:
getfattr -d -e hex -m . .glusterfs/00/00/0000a5ef-5af7-401b-84b5-ff2a51c10421
# file: .glusterfs/00/00/0000a5ef-5af7-401b-84b5-ff2a51c10421
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x020000000000000059b1b316000270e7
trusted.gfid=0x0000a5ef5af7401b84b5ff2a51c10421

[root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex -m . .glusterfs/00/00/0000a5ef-5af7-401b-84b5-ff2a51c10421
.glusterfs/00/00/0000a5ef-5af7-401b-84b5-ff2a51c10421: trusted.glusterfs.pathinfo: No such attribute
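As an aside, the hex values that getfattr -e hex prints can be decoded back to text, which is handy when an entry does carry a path-bearing attr such as trusted.gfid2path. A small sketch, assuming xxd is available (it ships with the vim packages on most distributions):

```shell
# Decode a "getfattr -e hex" value back to its raw bytes/text.
decode_xattr() {
    # strip the leading 0x, then reverse the hex dump
    printf '%s' "${1#0x}" | xxd -r -p
}

# Example: the selinux label from the output above; the trailing 00 is a NUL.
decode_xattr 0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
```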
I had to totally rebuild the dead RAID array and did a copy from the live one before activating gluster on the rebuilt system. I accidentally copied over the .glusterfs folder from the working side
(replica 2 only for now - adding arbiter node as soon as I can get this one cleaned up).
I've run the methods from "http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" with no results using random GFIDs. A full systemic run using the script from method 3 crashes with a "too many nested links" error (or something similar).
When I run "gluster volume heal volname info", I get 700K+ GFIDs. Oh, and this is gluster 3.8.4 on CentOS 7.3.
Should I just remove the contents of the .glusterfs folder on both and restart gluster and run a ls/stat on every file?
When I run a heal, it no longer has a decreasing number of files to heal so that's an improvement over the last 2-3 weeks :-)
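Rather than removing .glusterfs outright (which discards the very GFID hardlinks heal relies on), the usual way to touch every file is a full stat traversal from a client mount, since client-side lookups prompt heal checks. A rough sketch only, assuming a hypothetical FUSE mount at /mnt/gv0 and GNU find/xargs:

```shell
# Walk a client mount and stat every file and directory; each lookup from
# the mount side gives gluster a chance to heal that entry. The mount path
# passed in (/mnt/gv0 in the example) is a placeholder.
heal_walk() {
    find "$1" \( -type f -o -type d \) -print0 |
        xargs -0 -r -n 100 -P 24 stat --format='%n'   # 24-way, matching the cores above
}
# e.g.: heal_walk /mnt/gv0 > /dev/null
```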
On Tue, 2017-10-17 at 14:34 +0000, Matt Waymack wrote:
Attached is the heal log for the volume as well as the shd log.

"Run these commands on all the bricks of the replica pair to get the attrs set on the backend."

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No such file or directory

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-11=0x000000000000000100000000
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-11=0x000000000000000100000000
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3: No such file or directory

And the output of "gluster volume heal <volname> info split-brain":

[root@tpc-cent-glus1-081017 ~]# gluster volume heal gv0 info split-brain
Brick tpc-cent-glus1-081017:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

-Matt

From: Karthik Subrahmanya [mailto:ksubrahm@xxxxxxxxxx]
Sent: Tuesday, October 17, 2017 1:26 AM
To: Matt Waymack <mwaymack@xxxxxxxxx>
Cc: gluster-users <Gluster-users@xxxxxxxxxxx>
Subject: Re: gfid entries in volume heal info that do not heal

Hi Matt,
Run these commands on all the bricks of the replica pair to get the attrs set on the backend.
On the bricks of the first replica set:
getfattr -d -e hex -m . <brick path>/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
On the fourth replica set:
getfattr -d -e hex -m . <brick path>/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
Also run "gluster volume heal <volname>" once and send the shd log.
And the output of "gluster volume heal <volname> info split-brain".
Regards,
Karthik

On Mon, Oct 16, 2017 at 9:51 PM, Matt Waymack <mwaymack@xxxxxxxxx> wrote:
OK, so here's my output of the volume info and the heal info. I have not yet tracked down the physical location of these files; any tips on finding them would be appreciated, but I'm definitely just wanting them gone. I forgot to mention earlier that the cluster is running 3.12 and was upgraded from 3.10; these files were likely stuck like this when it was on 3.10.

[root@tpc-cent-glus1-081017 ~]# gluster volume info gv0

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 8f07894d-e3ab-4a65-bda1-9d9dd46db007
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: tpc-cent-glus1-081017:/exp/b1/gv0
Brick2: tpc-cent-glus2-081017:/exp/b1/gv0
Brick3: tpc-arbiter1-100617:/exp/b1/gv0 (arbiter)
Brick4: tpc-cent-glus1-081017:/exp/b2/gv0
Brick5: tpc-cent-glus2-081017:/exp/b2/gv0
Brick6: tpc-arbiter1-100617:/exp/b2/gv0 (arbiter)
Brick7: tpc-cent-glus1-081017:/exp/b3/gv0
Brick8: tpc-cent-glus2-081017:/exp/b3/gv0
Brick9: tpc-arbiter1-100617:/exp/b3/gv0 (arbiter)
Brick10: tpc-cent-glus1-081017:/exp/b4/gv0
Brick11: tpc-cent-glus2-081017:/exp/b4/gv0
Brick12: tpc-arbiter1-100617:/exp/b4/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

[root@tpc-cent-glus1-081017 ~]# gluster volume heal gv0 info
Brick tpc-cent-glus1-081017:/exp/b1/gv0
<gfid:108694db-c039-4b7c-bd3d-ad6a15d811a2>
<gfid:6d5ade20-8996-4de2-95d5-20ef98004742>
<gfid:bc6cdc3d-5c46-4597-a7eb-282b21e9bdd5>
<gfid:3c2ff4d1-3662-4214-8f21-f8f47dbdbf06>
<gfid:053e2fb1-bc89-476e-a529-90dffa39963c>
<removed to save scrolling>
Status: Connected
Number of entries: 118

Brick tpc-cent-glus2-081017:/exp/b1/gv0
<gfid:108694db-c039-4b7c-bd3d-ad6a15d811a2>
<gfid:6d5ade20-8996-4de2-95d5-20ef98004742>
<gfid:bc6cdc3d-5c46-4597-a7eb-282b21e9bdd5>
<gfid:3c2ff4d1-3662-4214-8f21-f8f47dbdbf06>
<gfid:053e2fb1-bc89-476e-a529-90dffa39963c>
<removed to save scrolling>
Status: Connected
Number of entries: 118

Brick tpc-arbiter1-100617:/exp/b1/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus2-081017:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-arbiter1-100617:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus2-081017:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-arbiter1-100617:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b4/gv0
<gfid:e0c56bf7-8bfe-46ca-bde1-e46b92d33df3>
<gfid:6f0a0549-8669-46de-8823-d6677fdca8e3>
<gfid:d0e2fb2a-21b5-4ea8-a578-0801280b2530>
<gfid:48bff79c-7bc2-4dc5-8b7f-4401b27fdf5a>
<gfid:5902593d-a059-4ec7-b18b-7a2ab5c49a50>
<gfid:cb821178-4621-4fcf-90f3-5b5c2ad7f756>
<gfid:6aea0805-8dd1-437c-b922-52c9d11e488a>
<gfid:f4076a37-2e2f-4d7a-90dd-0a3560a4bdff>
<gfid:51ff7386-a550-4971-957c-b42c4d915e9f>
<gfid:4309f7b8-3a9d-4bc8-ba2b-799f8a02611b>
<gfid:b76746ec-6d7d-4ea3-a001-c96672a4d47e>
<gfid:f8de26e7-d17d-41e0-adcd-e7d24ed74ac8>
<gfid:8e2c4540-e0b4-4006-bb5d-aacd57f8f21b>
<gfid:183ebefb-b827-4cbc-b42b-bfd136d5cabb>
<gfid:88d492fe-bfbd-4463-ba55-0582d0ad671b>
<gfid:e3a6c068-d48b-44b5-9480-245a69648a9b>
<gfid:4aab9c6a-22d2-469a-a688-7b0a8784f4b1>
<gfid:c6d182f2-7e46-4502-a0d2-b92824caa4de>
<gfid:eb546f93-e9d6-4a59-ac35-6139b5c40919>
<gfid:6043e381-7edf-4569-bc37-e27dd13549d2>
<gfid:52090dc7-7a3c-40f9-9c54-3395f5158eab>
<gfid:ecceee46-4310-421e-b56e-5fe46bd5263c>
<gfid:354aea57-4b40-47fc-8ede-1d7e3b7501b4>
<gfid:d43284d4-86aa-42ff-98b8-f6340b407d9d>
Status: Connected
Number of entries: 24

Brick tpc-cent-glus2-081017:/exp/b4/gv0
<gfid:e0c56bf7-8bfe-46ca-bde1-e46b92d33df3>
<gfid:6f0a0549-8669-46de-8823-d6677fdca8e3>
<gfid:d0e2fb2a-21b5-4ea8-a578-0801280b2530>
<gfid:48bff79c-7bc2-4dc5-8b7f-4401b27fdf5a>
<gfid:5902593d-a059-4ec7-b18b-7a2ab5c49a50>
<gfid:cb821178-4621-4fcf-90f3-5b5c2ad7f756>
<gfid:6aea0805-8dd1-437c-b922-52c9d11e488a>
<gfid:f4076a37-2e2f-4d7a-90dd-0a3560a4bdff>
<gfid:51ff7386-a550-4971-957c-b42c4d915e9f>
<gfid:4309f7b8-3a9d-4bc8-ba2b-799f8a02611b>
<gfid:b76746ec-6d7d-4ea3-a001-c96672a4d47e>
<gfid:f8de26e7-d17d-41e0-adcd-e7d24ed74ac8>
<gfid:8e2c4540-e0b4-4006-bb5d-aacd57f8f21b>
<gfid:183ebefb-b827-4cbc-b42b-bfd136d5cabb>
<gfid:88d492fe-bfbd-4463-ba55-0582d0ad671b>
<gfid:e3a6c068-d48b-44b5-9480-245a69648a9b>
<gfid:4aab9c6a-22d2-469a-a688-7b0a8784f4b1>
<gfid:c6d182f2-7e46-4502-a0d2-b92824caa4de>
<gfid:eb546f93-e9d6-4a59-ac35-6139b5c40919>
<gfid:6043e381-7edf-4569-bc37-e27dd13549d2>
<gfid:52090dc7-7a3c-40f9-9c54-3395f5158eab>
<gfid:ecceee46-4310-421e-b56e-5fe46bd5263c>
<gfid:354aea57-4b40-47fc-8ede-1d7e3b7501b4>
<gfid:d43284d4-86aa-42ff-98b8-f6340b407d9d>
Status: Connected
Number of entries: 24

Brick tpc-arbiter1-100617:/exp/b4/gv0
Status: Connected
Number of entries: 0

Thank you for your help!

From: Karthik Subrahmanya [mailto:ksubrahm@redhat.com]
Sent: Monday, October 16, 2017 10:27 AM
To: Matt Waymack <mwaymack@xxxxxxxxx>
Cc: gluster-users <Gluster-users@gluster.org>
Subject: Re: gfid entries in volume heal info that do not heal

Hi Matt,
The files might be in split brain. Could you please send the outputs of these?
gluster volume info <volname>
gluster volume heal <volname> info
And also the getfattr output of the files which are in the heal info output, from all the bricks of that replica pair:
getfattr -d -e hex -m . <file path on brick>
Thanks & Regards,
Karthik

On 16-Oct-2017 8:16 PM, "Matt Waymack" <mwaymack@xxxxxxxxx> wrote:
Hi all,
I have a volume where the output of volume heal info shows several gfid entries to be healed, but they've been there for weeks and have not healed. Any normal file that shows up on the heal info does get healed as expected, but these gfid entries do not. Is there any way to remove these orphaned entries from the volume so they are no longer stuck in the heal process?
Thank you!

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users