Re: Disperse mkdir fails

Xavi,
           Thanks for checking this. We have an external metadata server that keeps track of every file written to the volume and can validate file contents. We will use that capability to verify the data. Once the data is verified, would the following sequence of steps be sufficient to restore the volume?

1) Rebalance the volume.
2) After rebalance is complete, stop ingesting more data to the volume.
3) Let the pending heals complete.
4) Stop the volume.
5) For any heals that fail because of mismatching version/dirty extended attributes on the directories, set these attributes to a matching value on all the nodes.
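As a sketch only, the sequence above might look like the commands below. The volume name and brick paths are assumptions, not values confirmed in this thread, and every command is echoed (dry run) so the sequence can be reviewed before anything is executed:

```shell
#!/usr/bin/env bash
# Hedged sketch of the recovery sequence. VOL, GOOD_DIR and BAD_DIR are
# hypothetical placeholders. Each command is echoed rather than executed;
# drop the echo in run() to execute for real.
VOL="StoragePool"                       # assumed volume name (seen in the logs)
GOOD_DIR="/brick/good/path/to/dir"      # hypothetical dir on a healthy brick
BAD_DIR="/brick/bad/path/to/dir"        # hypothetical dir with stale xattrs

run() { echo "WOULD RUN: $*"; }         # dry-run wrapper

run gluster volume rebalance "$VOL" start     # step 1
run gluster volume rebalance "$VOL" status    # poll until 'completed'
                                              # step 2: stop ingest (external)
run gluster volume heal "$VOL" info           # step 3: wait until list drains
run gluster volume stop "$VOL"                # step 4
# step 5: dump the xattrs on a healthy brick, then copy them to the others;
# the value below is an example taken from earlier in this thread.
run getfattr -d -m trusted.ec -e hex "$GOOD_DIR"
run setfattr -n trusted.ec.version -v 0x0000000000083a760000000000083a7b "$BAD_DIR"
```

Whether step 5 is safe to do by hand is exactly the kind of thing Xavi should confirm first; it bypasses self-heal's own accounting.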

Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx] 
Sent: Tuesday, March 14, 2017 5:28 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); gluster-users@xxxxxxxxxxx
Subject: Re:  Disperse mkdir fails

Hi Ram,

On 13/03/17 15:02, Ankireddypalle Reddy wrote:
> Xavi,
>                The CV_MAGNETIC directory on a single brick has 155683 entries. There are altogether 60 bricks in the volume. I can provide the output if you still need it.

The problem is that not all bricks have the same number of entries:

glusterfs1:disk1 155674
glusterfs2:disk1 155675
glusterfs3:disk1 155718

glusterfs1:disk2 155688
glusterfs2:disk2 155687
glusterfs3:disk2 155730

glusterfs1:disk3 155675
glusterfs2:disk3 155674
glusterfs3:disk3 155717

glusterfs1:disk4 155684
glusterfs2:disk4 155683
glusterfs3:disk4 155726

glusterfs1:disk5 155698
glusterfs2:disk5 155695
glusterfs3:disk5 155738

glusterfs1:disk6 155668
glusterfs2:disk6 155667
glusterfs3:disk6 155710

glusterfs1:disk7 155687
glusterfs2:disk7 155689
glusterfs3:disk7 155732

glusterfs1:disk8 155673
glusterfs2:disk8 155675
glusterfs3:disk8 155718

glusterfs4:disk1 149097
glusterfs5:disk1 149097
glusterfs6:disk1 149098

glusterfs4:disk2 149097
glusterfs5:disk2 149097
glusterfs6:disk2 149098

glusterfs4:disk3 149097
glusterfs5:disk3 149097
glusterfs6:disk3 149098

glusterfs4:disk4 149097
glusterfs5:disk4 149097
glusterfs6:disk4 149098

glusterfs4:disk5 149097
glusterfs5:disk5 149097
glusterfs6:disk5 149098

glusterfs4:disk6 149097
glusterfs5:disk6 149097
glusterfs6:disk6 149098

glusterfs4:disk7 149097
glusterfs5:disk7 149097
glusterfs6:disk7 149098

glusterfs4:disk8 149097
glusterfs5:disk8 149097
glusterfs6:disk8 149098

A small difference could be explained by concurrent operations while retrieving this data, but some bricks are way out of sync.
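To compare the bricks yourself, a loop like the following (brick roots are placeholders; run it on each server against its own local bricks) counts the first-level entries per brick:

```shell
#!/usr/bin/env bash
# Hedged sketch: count first-level entries of the shared directory on each
# brick to spot out-of-sync bricks. The brick roots are hypothetical
# placeholders; only the directory name comes from this thread.
DIR="Folder_07.11.2016_23.02/CV_MAGNETIC"
for brick in /ws/disk1 /ws/disk2; do          # placeholder brick roots
    count=$(ls -A "$brick/$DIR" 2>/dev/null | wc -l)
    printf '%s %d\n' "$brick" "$count"
done
```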

trusted.ec.dirty and trusted.ec.version also show many discrepancies:

glusterfs1:disk1 trusted.ec.dirty=0x0000000000000ba40000000000000000
glusterfs2:disk1 trusted.ec.dirty=0x0000000000000bb80000000000000000
glusterfs3:disk1 trusted.ec.dirty=0x00000000000000160000000000000000
glusterfs1:disk1 trusted.ec.version=0x0000000000084db40000000000084e11
glusterfs2:disk1 trusted.ec.version=0x0000000000084e070000000000084e0c
glusterfs3:disk1 trusted.ec.version=0x000000000008426a0000000000084e11

glusterfs1:disk2 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk2 trusted.ec.dirty=0x0000000000000bb60000000000000000
glusterfs3:disk2 trusted.ec.dirty=0x00000000000000170000000000000000
glusterfs1:disk2 trusted.ec.version=0x000000000005ccb7000000000005cd0a
glusterfs2:disk2 trusted.ec.version=0x000000000005cd00000000000005cd05
glusterfs3:disk2 trusted.ec.version=0x000000000005c166000000000005cd0a

glusterfs1:disk3 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk3 trusted.ec.dirty=0x0000000000000bb50000000000000000
glusterfs3:disk3 trusted.ec.dirty=0x00000000000000160000000000000000
glusterfs1:disk3 trusted.ec.version=0x000000000005d0cb000000000005d123
glusterfs2:disk3 trusted.ec.version=0x000000000005d119000000000005d11e
glusterfs3:disk3 trusted.ec.version=0x000000000005c57f000000000005d123

glusterfs1:disk4 trusted.ec.dirty=0x0000000000000ba00000000000000000
glusterfs2:disk4 trusted.ec.dirty=0x0000000000000bb10000000000000000
glusterfs3:disk4 trusted.ec.dirty=0x00000000000000130000000000000000
glusterfs1:disk4 trusted.ec.version=0x0000000000084e2e0000000000084e78
glusterfs2:disk4 trusted.ec.version=0x0000000000084e6e0000000000084e73
glusterfs3:disk4 trusted.ec.version=0x00000000000842d50000000000084e78

glusterfs1:disk5 trusted.ec.dirty=0x0000000000000b9a0000000000000000
glusterfs2:disk5 trusted.ec.dirty=0x0000000000002e270000000000000000
glusterfs3:disk5 trusted.ec.dirty=0x00000000000022950000000000000000
glusterfs1:disk5 trusted.ec.version=0x000000000005aa1f000000000005cd18
glusterfs2:disk5 trusted.ec.version=0x000000000005cd0d000000000005cd13
glusterfs3:disk5 trusted.ec.version=0x000000000005c180000000000005cd18

glusterfs1:disk6 trusted.ec.dirty=0x0000000000000ba20000000000000000
glusterfs2:disk6 trusted.ec.dirty=0x0000000000000bad0000000000000000
glusterfs3:disk6 trusted.ec.dirty=0x000000000000000f0000000000000000
glusterfs1:disk6 trusted.ec.version=0x000000000005ccba000000000005cce7
glusterfs2:disk6 trusted.ec.version=0x000000000005ccde000000000005cce2
glusterfs3:disk6 trusted.ec.version=0x000000000005c145000000000005cce7


glusterfs1:disk7 trusted.ec.dirty=0x0000000000000ba50000000000000000
glusterfs2:disk7 trusted.ec.dirty=0x0000000000000bab0000000000000000
glusterfs3:disk7 trusted.ec.dirty=0x000000000000000a0000000000000000
glusterfs1:disk7 trusted.ec.version=0x000000000005cd03000000000005cd0d
glusterfs2:disk7 trusted.ec.version=0x000000000005cd04000000000005cd08
glusterfs3:disk7 trusted.ec.version=0x000000000005c138000000000005cd0d


glusterfs1:disk8 trusted.ec.dirty=0x0000000000000bbb0000000000000000
glusterfs2:disk8 trusted.ec.dirty=0x0000000000000bc00000000000000000
glusterfs3:disk8 trusted.ec.dirty=0x00000000000000090000000000000000
glusterfs1:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdcd
glusterfs2:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdc8
glusterfs3:disk8 trusted.ec.version=0x000000000005c158000000000005cdcd

glusterfs4:disk1 trusted.ec.version=0x000000000005901d0000000000059021
glusterfs5:disk1 trusted.ec.version=0x000000000005901d0000000000059021
glusterfs6:disk1 trusted.ec.version=0x000000000005901e0000000000059022

glusterfs4:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk2 trusted.ec.version=0x000000000002d2d8000000000002d2da

glusterfs4:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk3 trusted.ec.version=0x000000000002d2d8000000000002d2da

glusterfs4:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk4 trusted.ec.version=0x000000000002d2d8000000000002d2da

glusterfs4:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk5 trusted.ec.version=0x000000000002d2d8000000000002d2da

glusterfs4:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk6 trusted.ec.version=0x000000000002d2d8000000000002d2da

glusterfs4:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk7 trusted.ec.version=0x000000000002d2d8000000000002d2da

glusterfs4:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs5:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
glusterfs6:disk8 trusted.ec.version=0x000000000002d2d8000000000002d2da

Newer bricks seem to be healthy, but old bricks have a lot of differences.
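For reference (this is my reading of the disperse xattr layout, so treat it as an assumption): trusted.ec.version packs two big-endian 64-bit counters, the data version in the first 8 bytes and the metadata version in the last 8. A small helper makes the mismatches above easier to eyeball:

```shell
#!/usr/bin/env bash
# Decode trusted.ec.version into its two counters. Assumption: the value is
# two big-endian 64-bit integers, data version first, metadata version second.
decode_ec_version() {
    local v=${1#0x}                     # strip the 0x prefix
    printf 'data=%d meta=%d\n' "0x${v:0:16}" "0x${v:16:16}"
}
# glusterfs1:disk1 from the listing above:
decode_ec_version 0x0000000000084db40000000000084e11
# -> data=544180 meta=544273
```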

I also see that trusted.glusterfs.dht is not set on the newer bricks, and the full range of hashes is assigned to the old bricks (at least for the CV_MAGNETIC directory). This probably means that a rebalance has not been executed on the volume after adding the new bricks (or that it failed).
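A quick way to confirm this (the brick path is a placeholder, and the commands are echoed as a dry run) is to dump the layout xattr on a new brick and, if it is missing, rewrite the layouts with a fix-layout rebalance, which does not migrate data:

```shell
#!/usr/bin/env bash
# Hedged sketch: check whether a new brick carries a DHT layout for the
# directory, then fix layouts only. Commands are echoed for review;
# drop the echo in run() to execute for real.
run() { echo "WOULD RUN: $*"; }

run getfattr -n trusted.glusterfs.dht -e hex \
    /brick/path/Folder_07.11.2016_23.02/CV_MAGNETIC
run gluster volume rebalance StoragePool fix-layout start
```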

This will require much more investigation, and more knowledge about how you use the volume: which operations you run, from how many clients, and so on.

Xavi

>
> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
> Sent: Monday, March 13, 2017 9:56 AM
> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); 
> gluster-users@xxxxxxxxxxx
> Subject: Re:  Disperse mkdir fails
>
> Hi Ram,
>
> On 13/03/17 14:13, Ankireddypalle Reddy wrote:
>> Attachment: data.txt (17.63 KB)
>>
>> Xavier,
>>                Please find attached the required info from all the 
>> six nodes of the cluster.
>
> I asked for the contents of the CV_MAGNETIC directory because this is the damaged directory, not the parent. But anyway, we can see that the number of hard links of the directory differs on each brick, which means that the number of subdirectories is different on each brick. A small difference could be explained by the current activity of the volume while the data was being captured, but the differences are too big.
>
>>  We need to find
>>                1) What is the solution through which this problem can 
>> be avoided.
>>                2) How do we fix the current state of the cluster.
>>
>> Thanks and Regards,
>> Ram
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
>> Sent: Friday, March 10, 2017 3:34 AM
>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); 
>> gluster-users@xxxxxxxxxxx
>> Subject: Re:  Disperse mkdir fails
>>
>> Hi Ram,
>>
>> On 09/03/17 20:15, Ankireddypalle Reddy wrote:
>>> Xavi,
>>>             Thanks for checking this.
>>>             1) mkdir returns errnum 5. EIO.
>>>             2) The specified directory is the parent directory under
>> which all the data in the gluster volume will be stored. Currently
>> around 160 TB of 262 TB is consumed.
>>
>> I only need the first level entries of that directory, not the entire 
>> tree of entries. This should be in the order of thousands, right ?
>>
>> We need to make sure that all bricks have the same entries in this 
>> directory. Otherwise we would need to check other things.
>>
>>>             3)  It is extremely difficult to list the exact sequence
>> of FOPS that would have been issued to the directory. The storage is 
>> heavily used and a lot of sub-directories are present inside this directory.
>>>
>>>            Are you looking for the extended attributes for this
>> directory from all the bricks inside the volume? There are about 60 bricks.
>>
>> If possible, yes.
>>
>> However, if there are a lot of modifications on that directory while
>> you are getting the xattrs, the values you get may appear
>> inconsistent even though they are not really inconsistent.
>>
>> If possible, you should get that information pausing all activity to 
>> that directory.
>>
>> Xavi
>>
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> -----Original Message-----
>>> From: Xavier Hernandez [mailto:xhernandez@xxxxxxxxxx]
>>> Sent: Thursday, March 09, 2017 11:15 AM
>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel@xxxxxxxxxxx); 
>>> gluster-users@xxxxxxxxxxx
>>> Subject: Re:  Disperse mkdir fails
>>>
>>> Hi Ram,
>>>
>>> On 09/03/17 16:52, Ankireddypalle Reddy wrote:
>>>> Attachment: info.txt (3.35 KB)
>>>>
>>>> Hi,
>>>>
>>>>         I have a disperse gluster volume with 6 servers and 262 TB
>>>> of usable capacity. Gluster version is 3.7.19.
>>>>
>>>>         glusterfs1, glusterfs2 and glusterfs3 nodes were initially
>>>> used for creating the volume. Nodes glusterfs4, glusterfs5 and
>>>> glusterfs6 were later added to the volume.
>>>>
>>>>
>>>>
>>>>         Directory creation failed on a directory called 
>>>> /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC.
>>>>
>>>>         # file: ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC
>>>>
>>>>         glusterfs.gfid.string="e8e51015-616f-4f04-b9d2-92f46eb5cfc7"
>>>>
>>>>
>>>>
>>>>         The gluster mount log contains a lot of the following errors:
>>>>
>>>>         [2017-03-09 15:32:36.773937] W [MSGID: 122056] 
>>>> [ec-combine.c:875:ec_combine_check] 0-StoragePool-disperse-7:
>>>> Mismatching xdata in answers of 'LOOKUP' for
>>>> e8e51015-616f-4f04-b9d2-92f46eb5cfc7
>>>>
>>>>
>>>>
>>>>         The directory seems to be out of sync between nodes 
>>>> glusterfs1,
>>>> glusterfs2 and glusterfs3. Each has different version.
>>>>
>>>>
>>>>
>>>>          trusted.ec.version=0x00000000000839f00000000000083a4d
>>>>
>>>>          trusted.ec.version=0x0000000000082ea40000000000083a4b
>>>>
>>>>          trusted.ec.version=0x0000000000083a760000000000083a7b
>>>>
>>>>
>>>>
>>>>          Self-heal does not seem to be healing this directory.
>>>>
>>>
>>> This is very similar to what happened the other time. Once more than
>> one brick is damaged, self-heal cannot do anything to heal it on a 2+1
>> configuration.
>>>
>>> What error does the mkdir request return?
>>>
>>> Does the directory you are trying to create already exist on some brick ?
>>>
>>> Can you show all the remaining extended attributes of the directory ?
>>>
>>> It would also be useful to have the directory contents on each brick
>> (an 'ls -l'). In this case, include the name of the directory you are 
>> trying to create.
>>>
>>> Can you explain the detailed sequence of operations done on that
>> directory since the last time you successfully created a new
>> subdirectory, including any metadata changes?
>>>
>>> Xavi
>>>
>>>>
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Ram
>>>>
>>>> ***************************Legal
>>>> Disclaimer***************************
>>>> "This communication may contain confidential and privileged 
>>>> material for the sole use of the intended recipient. Any 
>>>> unauthorized review, use or distribution by others is strictly 
>>>> prohibited. If you have received the message by mistake, please 
>>>> advise the sender by reply email and delete the message. Thank you."
>>>> *******************************************************************
>>>> *
>>>> *
>>>> *
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@xxxxxxxxxxx
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>
>
>




