Thanks Ravi, that seems to have done the trick and now there are no more files to be healed.
Just for your and other users' information: OC_DEFAULT_MODULE was in fact a directory containing 2 files. So I did a stat on that directory, and after that "heal info" showed the following 2 files (and not the GFID anymore):
Brick node2:/data/myvolume/brick
/user/files_encryption/keys/files_trashbin/files/Library.db-journal.bc.d1501276401/OC_DEFAULT_MODULE/user.shareKey
/user/files_encryption/keys/files_trashbin/files/Library.db-journal.bc.d1501276401/OC_DEFAULT_MODULE/fileKey
Status: Connected
Number of entries: 2
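
For anyone else running into this: the stat Ravi suggested is done through a fresh (temporary) fuse mount so that no client-side caching gets in the way. Roughly like this (the mount server "node1" and the mount point are only examples from my side; the directory path is the one from the heal info output above):

mkdir -p /mnt/tmpmount
mount -t glusterfs node1:/myvolume /mnt/tmpmount
# stat the affected directory through the mount to trigger the entry lookup
stat /mnt/tmpmount/user/files_encryption/keys/files_trashbin/files/Library.db-journal.bc.d1501276401/OC_DEFAULT_MODULE
umount /mnt/tmpmount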
After that I just waited for the self-heal to do its job on node2, and it did, as you can see below from the output of the glustershd.log file:
[2017-07-31 09:40:05.045437] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-myvolume-replicate-0: Completed data selfheal on f1f0e091-2c4c-4a31-bc40-97949462dc4a. sources=0 [1] sinks=2
[2017-07-31 09:40:05.047194] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-myvolume-replicate-0: performing metadata selfheal on f1f0e091-2c4c-4a31-bc40-97949462dc4a
[2017-07-31 09:40:05.050996] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-myvolume-replicate-0: Completed metadata selfheal on f1f0e091-2c4c-4a31-bc40-97949462dc4a. sources=0 [1] sinks=2
[2017-07-31 09:40:05.055781] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-myvolume-replicate-0: Completed data selfheal on be2b2097-2b1a-45e1-ad9e-3cf6bf5b4caa. sources=0 [1] sinks=2
[2017-07-31 09:40:05.057026] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-myvolume-replicate-0: performing metadata selfheal on be2b2097-2b1a-45e1-ad9e-3cf6bf5b4caa
[2017-07-31 09:40:05.060716] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-myvolume-replicate-0: Completed metadata selfheal on be2b2097-2b1a-45e1-ad9e-3cf6bf5b4caa. sources=0 [1] sinks=2
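
As a side note, instead of only waiting for the self-heal daemon one can also trigger the heal manually and watch its log; something along these lines (volume name as above, and the log path is the usual default which may differ on other setups):

gluster volume heal myvolume
gluster volume heal myvolume info
tail -f /var/log/glusterfs/glustershd.log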
Thanks again Ravi for your help with this procedure. Now I will continue and try to fix my geo-replication issue (which I documented on the mailing list a few days ago).
Best,
M.
-------- Original Message --------
Subject: Re: [Gluster-users] Possible stale .glusterfs/indices/xattrop file?
Local Time: July 31, 2017 11:24 AM
UTC Time: July 31, 2017 9:24 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
On 07/31/2017 02:33 PM, mabi wrote:

Now I understand what you mean by the "-samefile" parameter of "find". As requested I have now run the following command on all 3 nodes, with the output of all 3 nodes below:

sudo find /data/myvolume/brick -samefile /data/myvolume/brick/.glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397 -ls

node1:
8404683 0 lrwxrwxrwx 1 root root 66 Jul 27 15:43 /data/myvolume/brick/.glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397 -> ../../fe/c0/fec0e4f4-38d2-4e2e-b5db-fdc0b9b54810/OC_DEFAULT_MODULE

node2:
8394638 0 lrwxrwxrwx 1 root root 66 Jul 27 15:43 /data/myvolume/brick/.glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397 -> ../../fe/c0/fec0e4f4-38d2-4e2e-b5db-fdc0b9b54810/OC_DEFAULT_MODULE

arbiternode:
find: '/data/myvolume/brick/.glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397': No such file or directory

Right, so the file OC_DEFAULT_MODULE is missing in this brick. Its parent directory has gfid fec0e4f4-38d2-4e2e-b5db-fdc0b9b54810. The goal is to do a stat of this file from the fuse mount. If you know the complete path to this file, good. Otherwise you can use this script [1] to find the path to the parent dir corresponding to the gfid fec0e4f4-38d2-4e2e-b5db-fdc0b9b54810, like so:

`./gfid-to-dirname.sh /data/myvolume/brick fec0e4f4-38d2-4e2e-b5db-fdc0b9b54810`

Try to stat the file from a new (temporary) fuse mount to avoid any caching effects.
-Ravi

Hope that helps.
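
The gist of such a script is to follow the symlinks which GlusterFS keeps for directory gfids under .glusterfs until the volume root is reached. A rough bash sketch of the idea (an illustration only, not the actual gfid-to-dirname.sh; brick path and gfid taken from the output above, and it assumes the gfid belongs to a directory):

#!/bin/bash
# Illustration: resolve a directory gfid to its on-brick path by walking the
# .glusterfs symlink chain (each link target ends in <parent-gfid>/<dir-name>).
BRICK=/data/myvolume/brick
GFID=fec0e4f4-38d2-4e2e-b5db-fdc0b9b54810
ROOT_GFID=00000000-0000-0000-0000-000000000001
path=""
while [ "$GFID" != "$ROOT_GFID" ]; do
    # e.g. ../../fe/c0/<parent-gfid>/OC_DEFAULT_MODULE
    link=$(readlink "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID") || exit 1
    path="/$(basename "$link")$path"
    GFID=$(basename "$(dirname "$link")")
done
echo "$BRICK$path"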
-------- Original Message --------
Subject: Re: [Gluster-users] Possible stale .glusterfs/indices/xattrop file?
Local Time: July 31, 2017 10:55 AM
UTC Time: July 31, 2017 8:55 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>

On 07/31/2017 02:00 PM, mabi wrote:

To quickly summarize my current situation: on node2 I have found the following indices/xattrop file which matches the GFID of the "heal info" command (below is the output of "ls -lai"):

2798404 ---------- 2 root root 0 Apr 28 22:51 /data/myvolume/brick/.glusterfs/indices/xattrop/29e0d13e-1217-41cc-9bda-1fbbf781c397

As you can see this file has inode number 2798404, so I ran the following command on all my nodes (node1, node2 and arbiternode):

...which is what I was saying is incorrect. 2798404 is an XFS inode number and is not common to the same file across nodes. So you will get different results. Use the -samefile flag I shared earlier.
-Ravi

sudo find /data/myvolume/brick -inum 2798404 -ls

Here below are the results for all 3 nodes:

node1:
2798404 19 -rw-r--r-- 2 www-data www-data 32 Jun 19 17:42 /data/myvolume/brick/.glusterfs/e6/5b/e65b77e2-a4c4-4824-a7bb-58df969ce4b0
2798404 19 -rw-r--r-- 2 www-data www-data 32 Jun 19 17:42 /data/myvolume/brick/<REMOVED_DIRECTORIES_IN_BETWEEN>/fileKey

node2:
2798404 1 ---------- 2 root root 0 Apr 28 22:51 /data/myvolume/brick/.glusterfs/indices/xattrop/29e0d13e-1217-41cc-9bda-1fbbf781c397
2798404 1 ---------- 2 root root 0 Apr 28 22:51 /data/myvolume/brick/.glusterfs/indices/xattrop/xattrop-6fa49ad5-71dd-4ec2-9246-7b302ab92d38

arbiternode:
NOTHING

As you requested, I have tried to run a getfattr on the fileKey file on node1 by using the following command:

getfattr -m . -d -e hex fileKey

but there is no output. I am not familiar with the getfattr command so maybe I am using the wrong parameters, could you help me with that?

-------- Original Message --------
Subject: Re: [Gluster-users] Possible stale .glusterfs/indices/xattrop file?
Local Time: July 31, 2017 9:25 AM
UTC Time: July 31, 2017 7:25 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>

On 07/31/2017 12:20 PM, mabi wrote:

I did a find on this inode number and I could find the file, but only on node1 (nothing on node2 and the new arbiternode). Here is an ls -lai of the file itself on node1:

Sorry, I don't understand, isn't that (XFS) inode number specific to node2's brick? If you want to use the same command, maybe you should try `find /data/myvolume/brick -samefile /data/myvolume/brick/.glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397` on all 3 bricks.

-rw-r--r-- 1 www-data www-data 32 Jun 19 17:42 fileKey

As you can see it is a 32 byte file and, as you suggested, I ran a "stat" on this very same file through a glusterfs mount (using fuse), but unfortunately nothing happened. The GFID is still being displayed as needing to be healed. Just in case, here is the output of the stat:

File: 'fileKey'
Size: 32 Blocks: 1 IO Block: 131072 regular file
Device: 1eh/30d Inode: 12086351742306673840 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
Access: 2017-06-19 17:42:35.339773495 +0200
Modify: 2017-06-19 17:42:35.343773437 +0200
Change: 2017-06-19 17:42:35.343773437 +0200
Birth: -

Does this 'fileKey' on node1 have the same gfid (see the getfattr output)? It looks like it is missing the hardlink inside the .glusterfs folder, since the link count is only 1.
Thanks,
Ravi

What else can I do or try in order to fix this situation?
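
Regarding the getfattr question above: the trusted.* attributes on the brick are only visible to root, so running getfattr as a regular user typically prints nothing for these files. For example, something like this should show them (assuming it is run as root on node1; the exact location of fileKey was removed from the output above, so the path below is only a placeholder):

sudo getfattr -d -m . -e hex /data/myvolume/brick/<path-to>/fileKey
# trusted.gfid, read as a uuid, maps to a hardlink under .glusterfs,
# e.g. for gfid e65b77e2-a4c4-4824-a7bb-58df969ce4b0:
sudo stat /data/myvolume/brick/.glusterfs/e6/5b/e65b77e2-a4c4-4824-a7bb-58df969ce4b0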
-------- Original Message --------
Subject: Re: [Gluster-users] Possible stale .glusterfs/indices/xattrop file?
Local Time: July 31, 2017 3:27 AM
UTC Time: July 31, 2017 1:27 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>

On 07/30/2017 02:24 PM, mabi wrote:

Hi Ravi,
Thanks for your hints. Below you will find the answers to your questions.
First I tried to start the healing process by running:

gluster volume heal myvolume

and then, as you suggested, watched the output of the glustershd.log file, but nothing appeared in that log file after running the above command. I checked the files which need healing using the "heal <volume> info" command and it still shows that very same GFID on node2 to be healed. So nothing changed here.
The file /data/myvolume/brick/.glusterfs/indices/xattrop/29e0d13e-1217-41cc-9bda-1fbbf781c397 is only on node2 and not on my node1 nor on my arbiternode. This file seems to be a regular file and not a symlink. Here is the output of the stat command on it from my node2:

File: '/data/myvolume/brick/.glusterfs/indices/xattrop/29e0d13e-1217-41cc-9bda-1fbbf781c397'
Size: 0 Blocks: 1 IO Block: 512 regular empty file
Device: 25h/37d Inode: 2798404 Links: 2

Okay, a link count of 2 means there is a hardlink somewhere on the brick. Try the find command again. I see that the inode number is 2798404, not the one you shared in your first mail. Once you find the path to the file, do a stat of the file from the mount. This should create the entry in the other 2 bricks and do the heal. But FWIW, this seems to be a zero byte file.
Regards,
Ravi

Access: (0000/----------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-04-28 22:51:15.215775269 +0200
Modify: 2017-04-28 22:51:15.215775269 +0200
Change: 2017-07-30 08:39:03.700872312 +0200
Birth: -

I hope this is enough info for a starter, else let me know if you need any more info. I would be glad to resolve this weird file which needs to be healed but cannot be.
Best regards,
Mabi
-------- Original Message --------
Subject: Re: [Gluster-users] Possible stale .glusterfs/indices/xattrop file?
Local Time: July 30, 2017 3:31 AM
UTC Time: July 30, 2017 1:31 AM
From: ravishankar@xxxxxxxxxx

On 07/29/2017 04:36 PM, mabi wrote:

Hi,
Sorry for mailing again, but as mentioned in my previous mail I have added an arbiter node to my replica 2 volume and it seems to have gone fine, except for the fact that there is one single file which needs healing and does not get healed, as you can see here from the output of a "heal info":

Brick node1.domain.tld:/data/myvolume/brick
Status: Connected
Number of entries: 0

Brick node2.domain.tld:/data/myvolume/brick
<gfid:29e0d13e-1217-41cc-9bda-1fbbf781c397>
Status: Connected
Number of entries: 1

Brick arbiternode.domain.tld:/srv/glusterfs/myvolume/brick
Status: Connected
Number of entries: 0

On my node2 the respective .glusterfs/indices/xattrop directory contains two files, as you can see below:

ls -lai /data/myvolume/brick/.glusterfs/indices/xattrop
total 7
618010 drw------- 2 root root 4 Jul 29 12:15 .
9 drw------- 5 root root 5 Apr 28 22:15 ..
2798404 ---------- 2 root root 0 Apr 28 22:51 29e0d13e-1217-41cc-9bda-1fbbf781c397
2798404 ---------- 2 root root 0 Apr 28 22:51 xattrop-6fa49ad5-71dd-4ec2-9246-7b302ab92d38

I tried to find the real file on my brick to which this xattrop file points, using its inode number (command: find /data/myvolume/brick/data -inum 8394642), but it does not find any associated file.
So my question here is: is it possible that this is a stale file which gluster just forgot to delete from the indices/xattrop directory for some unknown reason? If yes, is it safe for me to delete these two files, or what would be the correct process in that case?

The 'xattrop-6fa...' is the base entry. gfids of files that need heal are hard linked to this entry, so nothing needs to be done for it. But you need to find out why '29e0d13...' is not healing. Launch the heal and observe the glustershd logs for errors. I suppose 8394642 is the inode number of .glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397. Is .glusterfs/29/e0/29e0d13e-1217-41cc-9bda-1fbbf781c397 a regular file or a symlink? Does it exist in the other 2 bricks? What is the link count (as seen from stat <file>)?
-Ravi

Thank you for your input.
Mabi
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-users