Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

So the mount hung? Can you see anything suspicious in the logs?
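
For sifting through the MDS log, here is a minimal sketch (not an official Ceph tool) that tallies the dentries reported as "loaded already corrupt dentry". It assumes each log entry sits on a single line in the format quoted further down in this thread; the function name `corrupt_dentries` and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the MDS message quoted in this thread and captures the dentry path,
# e.g. "#0x1/lost+found/1000b3bcfea". The pattern is an assumption based only
# on the log excerpt shown below, not on a documented log format.
CORRUPT_DENTRY = re.compile(r"loaded already corrupt dentry: \[dentry (\S+)")


def corrupt_dentries(log_lines):
    """Count how often each dentry path is reported as already corrupt."""
    counts = Counter()
    for line in log_lines:
        match = CORRUPT_DENTRY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: the two log entries from this thread, joined onto single lines.
SAMPLE_LINES = [
    "debug    -12> 2024-07-14T21:22:19.962+0000 7f020cf3a700  1 "
    "mds.0.cache.den(0x4 1000b3bcfea) loaded already corrupt dentry: "
    "[dentry #0x1/lost+found/1000b3bcfea [head,head] rep@0.0 NULL (dversion "
    "lock) pv=0 v=2 ino=(nil) state=0 0x558ca63b6500]",
    "debug    -11> 2024-07-14T21:22:19.962+0000 7f020cf3a700 10 "
    "mds.0.cache.dir(0x4) go_bad_dentry 1000b3bcfea",
]

if __name__ == "__main__":
    for path, count in corrupt_dentries(SAMPLE_LINES).most_common():
        print(f"{path}: {count}")
```

To run it against a real log, pass the opened log file (an iterable of lines) to `corrupt_dentries` instead of `SAMPLE_LINES`. Any entry whose path is not under lost+found would be worth a closer look.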

On Fri, Aug 2, 2024 at 7:17 PM Justin Lee <justin.adam.lee@xxxxxxxxx> wrote:

> Hi Dhairya,
>
> Thanks for the response! We tried removing it with `rm -rf` as you
> suggested, but the command just hangs indefinitely with no output. We are
> also unable to `ls lost+found` or otherwise interact with the directory's
> contents.
>
> Best,
> Justin Lee
>
> On Fri, Aug 2, 2024 at 8:24 AM Dhairya Parmar <dparmar@xxxxxxxxxx> wrote:
>
>> Hi Justin,
>>
>> You should be able to delete inodes from the lost+found directory simply with
>> `sudo rm -rf lost+found/<ino>`
>>
>> What do you get when you try to delete? Do you get `EROFS`?
>>
>> On Fri, Aug 2, 2024 at 8:42 AM Justin Lee <justin.adam.lee@xxxxxxxxx> wrote:
>>
>>> After we updated our Ceph cluster from 17.2.7 to 18.2.0, the MDS kept
>>> being marked as damaged and stuck in up:standby, with these errors in
>>> the log:
>>>
>>> debug    -12> 2024-07-14T21:22:19.962+0000 7f020cf3a700  1
>>> mds.0.cache.den(0x4 1000b3bcfea) loaded already corrupt dentry:
>>> [dentry #0x1/lost+found/1000b3bcfea [head,head] rep@0.0 NULL (dversion
>>> lock) pv=0 v=2 ino=(nil) state=0 0x558ca63b6500]
>>> debug    -11> 2024-07-14T21:22:19.962+0000 7f020cf3a700 10
>>> mds.0.cache.dir(0x4) go_bad_dentry 1000b3bcfea
>>>
>>> These log lines are repeated many times in our MDS logs, all for
>>> dentries within the lost+found directory. After reading this mailing
>>> list post <https://www.spinics.net/lists/ceph-users/msg77325.html>, we
>>> tried setting `ceph config set mds mds_go_bad_corrupt_dentry false`. This
>>> seemed to circumvent the issue, but after a few seconds our MDS crashes.
>>> Our 3 MDS daemons are now stuck in a cycle of active -> crash ->
>>> standby -> back to active. Because of this, our actual CephFS is
>>> extremely laggy.
>>>
>>> We read here <https://docs.ceph.com/en/latest/releases/reef/#cephfs>
>>> that Reef now makes it possible to delete the lost+found directory, which
>>> might solve our problem, but the directory is inaccessible to cd, ls, rm,
>>> etc.
>>>
>>> Has anyone seen this type of issue, or does anyone know how to solve it?
>>> Thanks!
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx