Re: MDS crashes to damaged metadata

You could try manually deleting the files from the directory
fragments, using `rados` commands. Make sure to flush your MDS journal
first and take the fs offline (`ceph fs fail`).
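Roughly, and with placeholder names only (the filesystem/pool names, rank, dirfrag object, and dentry key below are examples; the real dirfrag object and omap key for each damaged file come from your first-damage.py / 'damage ls' output), the sequence would look something like:

  # flush the journal of each active MDS rank, then take the fs offline
  ceph tell mds.<fsname>:0 flush journal
  ceph fs fail <fsname>

  # dentries are stored as omap keys (usually "<name>_head") on the
  # directory fragment object "<dir inode hex>.<frag>" in the metadata pool
  rados -p <metadata_pool> listomapkeys 10000000001.00000000
  rados -p <metadata_pool> rmomapkey 10000000001.00000000 badfile_head

  # bring the fs back online, re-scrub, then clear the damage table entries
  ceph fs set <fsname> joinable true
  ceph tell mds.<fsname>:0 scrub start / recursive
  ceph tell mds.<fsname>:0 damage rm <damage_id>

Double-check the object name and key before running rmomapkey; removing the wrong omap key deletes a healthy dentry.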

On Tue, Jun 4, 2024 at 8:50 AM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:
>
> Hi Patrick,
>
> It has been a year now and we have not had a single crash since upgrading to 16.2.13. We still have the 19 corrupted files reported by 'damage ls'. Is it now possible to delete the corrupted files without taking the filesystem offline?
>
> Am 22.05.2023 um 20:23 schrieb Patrick Donnelly <pdonnell@xxxxxxxxxx>:
>
> Hi Felix,
>
> On Sat, May 13, 2023 at 9:18 AM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:
>
> Hi Patrick,
>
> We have been running one daily snapshot since December, and our CephFS has crashed three times because of this: https://tracker.ceph.com/issues/38452
>
> We currently have 19 files with corrupt metadata found by your first-damage.py script. We isolated these files from user access and are waiting for a fix before we remove them with your script (or maybe a new way?).
>
> No other fix is anticipated at this time. Probably one will be
> developed after the cause is understood.
>
> Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the MDS servers, cluster health went to ERROR MDS_DAMAGE. 'ceph tell mds.0 damage ls' is showing me the same files as your script (initially only some of them; after a cephfs scrub, all of them).
>
> This is expected. Once the dentries are marked damaged, the MDS won't
> allow operations on those files (like those triggering tracker
> #38452).
>
> I noticed "mds: catch damage to CDentry's first member before persisting (issue#58482, pr#50781, Patrick Donnelly)" in the change log for 16.2.13 and would like to ask you the following questions:
>
> a) Can we repair the damaged files online now, instead of bringing down the whole fs and using the Python script?
>
> Not yet.
>
> b) Should we set one of the new MDS options in our specific case to avoid our fileserver crashing because of the wrong snap IDs?
>
> Have your MDS crashed or just marked the dentries damaged? If you can
> reproduce a crash with detailed logs (debug_mds=20), that would be
> incredibly helpful.
>
> c) Will your patch prevent wrong snap IDs in the future?
>
> It will prevent persisting the damage.
>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>
> Kind regards
> Felix Stolte
>
> IT-Services
> mailto: f.stolte@xxxxxxxxxxxxx
> Tel: 02461-619243
>
> ---------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Registered office: Juelich
> Registered in the commercial register of the Amtsgericht Dueren, No. HR B 3498
> Chairman of the Supervisory Board: MinDir Stefan Müller
> Management Board: Prof. Dr. Astrid Lambrecht (Chair),
> Karsten Beneke (Deputy Chair), Dr. Ir. Pieter Jansens
> ---------------------------------------------------------------------------------------------
>



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



