Xavi,
Thanks for checking this. We have an external metadata server which keeps track of every file that gets written to the volume and has the capability to validate the file contents. We will use this capability to verify the data. Once the data is verified, will the following sequence of steps be sufficient to restore the volume?

1) Rebalance the volume.
2) After the rebalance is complete, stop ingesting more data to the volume.
3) Let the pending heals complete.
4) Stop the volume.
5) For any heals that fail because of mismatching version/dirty extended attributes on the directories, set these to a matching value on all the nodes.

Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
Sent: Tuesday, March 14, 2017 5:28 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); gluster-users@xxxxxxxxxxx
Subject: Re: [Gluster-users] Disperse mkdir fails

Hi Ram,

On 13/03/17 15:02, Ankireddypalle Reddy wrote:
> Xavi,
> CV_MAGNETIC directory on a single brick has 155683 entries. There are altogether 60 bricks in the volume. I could provide the output if you still need that.

The problem is that not all bricks have the same number of entries:

glusterfs1:disk1 155674    glusterfs2:disk1 155675    glusterfs3:disk1 155718
glusterfs1:disk2 155688    glusterfs2:disk2 155687    glusterfs3:disk2 155730
glusterfs1:disk3 155675    glusterfs2:disk3 155674    glusterfs3:disk3 155717
glusterfs1:disk4 155684    glusterfs2:disk4 155683    glusterfs3:disk4 155726
glusterfs1:disk5 155698    glusterfs2:disk5 155695    glusterfs3:disk5 155738
glusterfs1:disk6 155668    glusterfs2:disk6 155667    glusterfs3:disk6 155710
glusterfs1:disk7 155687    glusterfs2:disk7 155689    glusterfs3:disk7 155732
glusterfs1:disk8 155673    glusterfs2:disk8 155675    glusterfs3:disk8 155718
glusterfs4:disk1 149097    glusterfs5:disk1 149097    glusterfs6:disk1 149098
glusterfs4:disk2 149097    glusterfs5:disk2 149097    glusterfs6:disk2 149098
glusterfs4:disk3 149097    glusterfs5:disk3 149097    glusterfs6:disk3 149098
glusterfs4:disk4 149097    glusterfs5:disk4 149097    glusterfs6:disk4 149098
glusterfs4:disk5 149097    glusterfs5:disk5 149097    glusterfs6:disk5 149098
glusterfs4:disk6 149097    glusterfs5:disk6 149097    glusterfs6:disk6 149098
glusterfs4:disk7 149097    glusterfs5:disk7 149097    glusterfs6:disk7 149098
glusterfs4:disk8 149097    glusterfs5:disk8 149097    glusterfs6:disk8 149098

A small difference could be explained by concurrent operations while retrieving this data, but some bricks are way out of sync.
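
The comparison of entry counts above can be sketched in a few lines of Python. This is only an illustration: the grouping of bricks into disperse sets (same disk number on three consecutive nodes) and the `TOLERANCE` threshold are assumptions made for the example and are not taken from the volume configuration. Only two rows of the listing are included.

```python
from typing import Dict, List, Sequence

# Entry counts for CV_MAGNETIC, copied from two rows of the listing above.
ENTRY_COUNTS: Dict[str, int] = {
    "glusterfs1:disk1": 155674,
    "glusterfs2:disk1": 155675,
    "glusterfs3:disk1": 155718,
    "glusterfs4:disk1": 149097,
    "glusterfs5:disk1": 149097,
    "glusterfs6:disk1": 149098,
}

# ASSUMPTION: each disperse set is the same disk number on three nodes.
# The real layout should be checked with 'gluster volume info'.
BRICK_SETS: List[Sequence[str]] = [
    ("glusterfs1:disk1", "glusterfs2:disk1", "glusterfs3:disk1"),
    ("glusterfs4:disk1", "glusterfs5:disk1", "glusterfs6:disk1"),
]

# Hypothetical threshold: differences up to this size are treated as
# noise from operations running while the counts were collected.
TOLERANCE = 5


def spread(bricks: Sequence[str], counts: Dict[str, int]) -> int:
    """Return the difference between the largest and smallest count."""
    values = [counts[brick] for brick in bricks]
    return max(values) - min(values)


def out_of_sync(
    brick_sets: List[Sequence[str]], counts: Dict[str, int], tolerance: int
) -> List[Sequence[str]]:
    """Return the brick sets whose counts differ by more than tolerance."""
    return [b for b in brick_sets if spread(b, counts) > tolerance]


if __name__ == "__main__":
    for bricks in BRICK_SETS:
        print(", ".join(bricks), "spread:", spread(bricks, ENTRY_COUNTS))
```

With these two rows, the set on the old nodes exceeds the threshold and the set on the new nodes does not.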

trusted.ec.dirty and trusted.ec.version also show many discrepancies:

glusterfs1:disk1 trusted.ec.dirty=0x0000000000000ba40000000000000000
glusterfs2:disk1 trusted.ec.dirty=0x0000000000000bb80000000000000000
glusterfs3:disk1 trusted.ec.dirty=0x00000000000000160000000000000000
glusterfs1:disk1 trusted.ec.version=0x0000000000084db40000000000084e11
glusterfs2:disk1 trusted.ec.version=0x0000000000084e070000000000084e0c
glusterfs3:disk1 trusted.ec.version=0x000000000008426a0000000000084e11
glusterfs1:disk2 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk2 trusted.ec.dirty=0x0000000000000bb60000000000000000
glusterfs3:disk2 trusted.ec.dirty=0x00000000000000170000000000000000
glusterfs1:disk2 trusted.ec.version=0x000000000005ccb7000000000005cd0a
glusterfs2:disk2 trusted.ec.version=0x000000000005cd00000000000005cd05
glusterfs3:disk2 trusted.ec.version=0x000000000005c166000000000005cd0a
glusterfs1:disk3 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk3 trusted.ec.dirty=0x0000000000000bb50000000000000000
glusterfs3:disk3 trusted.ec.dirty=0x00000000000000160000000000000000
glusterfs1:disk3 trusted.ec.version=0x000000000005d0cb000000000005d123
glusterfs2:disk3 trusted.ec.version=0x000000000005d119000000000005d11e
glusterfs3:disk3 trusted.ec.version=0x000000000005c57f000000000005d123
glusterfs1:disk4 trusted.ec.dirty=0x0000000000000ba00000000000000000
glusterfs2:disk4 trusted.ec.dirty=0x0000000000000bb10000000000000000
glusterfs3:disk4 trusted.ec.dirty=0x00000000000000130000000000000000
glusterfs1:disk4 trusted.ec.version=0x0000000000084e2e0000000000084e78
glusterfs2:disk4 trusted.ec.version=0x0000000000084e6e0000000000084e73
glusterfs3:disk4 trusted.ec.version=0x00000000000842d50000000000084e78
glusterfs1:disk5 trusted.ec.dirty=0x0000000000000b9a0000000000000000
glusterfs2:disk5 trusted.ec.dirty=0x0000000000002e270000000000000000
glusterfs3:disk5 trusted.ec.dirty=0x00000000000022950000000000000000
glusterfs1:disk5 trusted.ec.version=0x000000000005aa1f000000000005cd18
glusterfs2:disk5 trusted.ec.version=0x000000000005cd0d000000000005cd13
glusterfs3:disk5 trusted.ec.version=0x000000000005c180000000000005cd18
glusterfs1:disk6 trusted.ec.dirty=0x0000000000000ba20000000000000000
glusterfs2:disk6 trusted.ec.dirty=0x0000000000000bad0000000000000000
glusterfs3:disk6 trusted.ec.dirty=0x000000000000000f0000000000000000
glusterfs1:disk6 trusted.ec.version=0x000000000005ccba000000000005cce7
glusterfs2:disk6 trusted.ec.version=0x000000000005ccde000000000005cce2
glusterfs3:disk6 trusted.ec.version=0x000000000005c145000000000005cce7
glusterfs1:disk7 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk7 trusted.ec.dirty=0x0000000000000bab0000000000000000
glusterfs3:disk7 trusted.ec.dirty=0x000000000000000a0000000000000000
glusterfs1:disk7 trusted.ec.version=0x000000000005cd03000000000005cd0d
glusterfs2:disk7 trusted.ec.version=0x000000000005cd04000000000005cd08
glusterfs3:disk7 trusted.ec.version=0x000000000005c138000000000005cd0d
glusterfs1:disk8 trusted.ec.dirty=0x0000000000000bbb0000000000000000
glusterfs2:disk8 trusted.ec.dirty=0x0000000000000bc00000000000000000
glusterfs3:disk8 trusted.ec.dirty=0x00000000000000090000000000000000
glusterfs1:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdcd
glusterfs2:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdc8
glusterfs3:disk8 trusted.ec.version=0x000000000005c158000000000005cdcd
glusterfs4:disk1 trusted.ec.version=0x000000000005901d0000000000059021
glusterfs5:disk1 trusted.ec.version=0x000000000005901d0000000000059021
glusterfs6:disk1 trusted.ec.version=0x000000000005901e0000000000059022
glusterfs4:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk2 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk3 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk4 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk5 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk6 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk7 trusted.ec.version=0x000000000002d2d8000000000002d2da
glusterfs4:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk8 trusted.ec.version=0x000000000002d2d8000000000002d2da

Newer bricks seem to be healthy, but old bricks have a lot of differences.

I also see that trusted.glusterfs.dht is not set for newer bricks, and the full range of hashes is assigned to the old bricks (at least for the CV_MAGNETIC directory). This probably means that a rebalance has not been executed on the volume after adding the new bricks (or it failed).

This will require much more investigation and knowledge about how you do things, from how many clients, ...

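
The version values listed above are easier to compare once decoded. The sketch below assumes that each 16-byte value holds two big-endian 64-bit counters. My understanding is that the first counter tracks data operations and the second metadata operations, but treat that labeling as an assumption rather than a statement about the disperse translator. The helper names are hypothetical, and the sample values are the three disk1 versions from the old nodes.

```python
from typing import Dict, Tuple


def split_ec_xattr(value: str) -> Tuple[int, int]:
    """Split a 16-byte hex xattr value into two 64-bit counters.

    ASSUMPTION: the value is two big-endian 64-bit integers back to back.
    """
    digits = value[2:] if value.lower().startswith("0x") else value
    if len(digits) != 32:
        raise ValueError("expected 32 hex digits, got %d" % len(digits))
    return int(digits[:16], 16), int(digits[16:], 16)


def counter_spread(values: Dict[str, str]) -> Tuple[int, int]:
    """Return (max - min) of each counter across the given bricks."""
    decoded = [split_ec_xattr(v) for v in values.values()]
    first = [d[0] for d in decoded]
    second = [d[1] for d in decoded]
    return max(first) - min(first), max(second) - min(second)


# trusted.ec.version of CV_MAGNETIC on disk1 of the three old nodes,
# copied from the listing above.
DISK1_VERSION: Dict[str, str] = {
    "glusterfs1:disk1": "0x0000000000084db40000000000084e11",
    "glusterfs2:disk1": "0x0000000000084e070000000000084e0c",
    "glusterfs3:disk1": "0x000000000008426a0000000000084e11",
}

if __name__ == "__main__":
    for brick, raw in DISK1_VERSION.items():
        first, second = split_ec_xattr(raw)
        print(brick, hex(first), hex(second))
    print("spread:", counter_spread(DISK1_VERSION))
```

A spread of zero in both counters would mean the bricks agree. A spread of one or two may only reflect operations that were in flight while the attributes were read.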
Xavi

>
> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
> Sent: Monday, March 13, 2017 9:56 AM
> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); gluster-users@xxxxxxxxxxx
> Subject: Re: [Gluster-users] Disperse mkdir fails
>
> Hi Ram,
>
> On 13/03/17 14:13, Ankireddypalle Reddy wrote:
>> Attachment (1): data.txt (17.63 KB)
>>
>> Xavier,
>> Please find attached the required info from all the six nodes of the cluster.
>
> I asked for the contents of the CV_MAGNETIC because this is the damaged directory, not the parent. But anyway we can see that the number of hard links of the directory differs for each brick, so this means that the number of subdirectories is different on each brick. A small difference could be explainable by the current activity of the volume while the data has been captured, but the differences are too big.
>
>> We need to find
>> 1) What is the solution through which this problem can be avoided.
>> 2) How do we fix the current state of the cluster.
>>
>> Thanks and Regards,
>> Ram
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
>> Sent: Friday, March 10, 2017 3:34 AM
>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); gluster-users@xxxxxxxxxxx
>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>
>> Hi Ram,
>>
>> On 09/03/17 20:15, Ankireddypalle Reddy wrote:
>>> Xavi,
>>> Thanks for checking this.
>>> 1) mkdir returns errnum 5 (EIO).
>>> 2) The specified directory is the parent directory under which all the data in the gluster volume will be stored. Currently around 160TB of 262 TB is consumed.
>>
>> I only need the first level entries of that directory, not the entire tree of entries. This should be in the order of thousands, right?
>>
>> We need to make sure that all bricks have the same entries in this directory. Otherwise we would need to check other things.
>>
>>> 3) It is extremely difficult to list the exact sequence of FOPS that would have been issued to the directory. The storage is heavily used and a lot of sub directories are present inside this directory.
>>>
>>> Are you looking for the extended attributes for this directory from all the bricks inside the volume? There are about 60 bricks.
>>
>> If possible, yes.
>>
>> However, if there are a lot of modifications on that directory while you are getting the xattr, it's possible that you get inconsistent values, but they are not really inconsistent.
>>
>> If possible, you should get that information pausing all activity to that directory.
>>
>> Xavi
>>
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> -----Original Message-----
>>> From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
>>> Sent: Thursday, March 09, 2017 11:15 AM
>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); gluster-users@xxxxxxxxxxx
>>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>>
>>> Hi Ram,
>>>
>>> On 09/03/17 16:52, Ankireddypalle Reddy wrote:
>>>> Attachment (1): info.txt (3.35 KB)
>>>>
>>>> Hi,
>>>>
>>>> I have a disperse gluster volume with 6 servers. 262TB of usable capacity. Gluster version is 3.7.19.
>>>>
>>>> glusterfs1, glusterfs2 and glusterfs3 nodes were initially used for creating the volume. Nodes glusterfs4, glusterfs5 and glusterfs6 were later added to the volume.
>>>>
>>>> Directory creation failed on a directory called /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC.
>>>>
>>>> # file: ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC
>>>> glusterfs.gfid.string="e8e51015-616f-4f04-b9d2-92f46eb5cfc7"
>>>>
>>>> gluster mount log contains a lot of the following errors:
>>>>
>>>> [2017-03-09 15:32:36.773937] W [MSGID: 122056] [ec-combine.c:875:ec_combine_check] 0-StoragePool-disperse-7: Mismatching xdata in answers of 'LOOKUP' for e8e51015-616f-4f04-b9d2-92f46eb5cfc7
>>>>
>>>> The directory seems to be out of sync between nodes glusterfs1, glusterfs2 and glusterfs3. Each has a different version.
>>>>
>>>> trusted.ec.version=0x00000000000839f00000000000083a4d
>>>> trusted.ec.version=0x0000000000082ea40000000000083a4b
>>>> trusted.ec.version=0x0000000000083a760000000000083a7b
>>>>
>>>> Self-heal does not seem to be healing this directory.
>>>>
>>>
>>> This is very similar to what happened the other time. Once more than 1 brick is damaged, self-heal cannot do anything to heal it on a 2+1 configuration.
>>>
>>> What error does the mkdir request return?
>>>
>>> Does the directory you are trying to create already exist on some brick?
>>>
>>> Can you show all the remaining extended attributes of the directory?
>>>
>>> It would also be useful to have the directory contents on each brick (an 'ls -l'). In this case, include the name of the directory you are trying to create.
>>>
>>> Can you explain a detailed sequence of operations done on that directory since the last time you successfully created a new subdirectory, including any metadata change?
>>>
>>> Xavi
>>>
>>>>
>>>> Thanks and Regards,
>>>> Ram
>>>>
>>>> ***************************Legal Disclaimer***************************
>>>> "This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited.
>>>> If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you."
>>>> **********************************************************************
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@xxxxxxxxxxx
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel