Hi Patrick,

it has been a year now and we have not had a single crash since upgrading to 16.2.13. We still have the 19 corrupted files that are reported by 'damage ls'. Is it now possible to delete the corrupted files without taking the filesystem offline?

On 22.05.2023 at 20:23, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:

> Hi Felix,
>
> On Sat, May 13, 2023 at 9:18 AM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:
> > Hi Patrick,
> >
> > we have been running one daily snapshot since December and our cephfs crashed 3 times because of this: https://tracker.ceph.com/issues/38452
> >
> > We currently have 19 files with corrupt metadata found by your first-damage.py script. We isolated these files from access by users and are waiting for a fix before we remove them with your script (or maybe a new way?)
>
> No other fix is anticipated at this time. Probably one will be developed after the cause is understood.
>
> > Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the MDS servers, cluster health went to ERROR MDS_DAMAGE. 'ceph tell mds.0 damage ls' is showing me the same files as your script (initially only a part, after a cephfs scrub all of them).
>
> This is expected. Once the dentries are marked damaged, the MDS won't allow operations on those files (like those triggering tracker #38452).
>
> > I noticed "mds: catch damage to CDentry's first member before persisting (issue#58482, pr#50781, Patrick Donnelly)" in the changelog for 16.2.13 and would like to ask you the following questions:
> >
> > a) can we repair the damaged files online now instead of bringing down the whole fs and using the python script?
>
> Not yet.
>
> > b) should we set one of the new mds options in our specific case to avoid our fileserver crashing because of the wrong snap ids?
>
> Have your MDSs crashed or just marked the dentries damaged? If you can reproduce a crash with detailed logs (debug_mds=20), that would be incredibly helpful.
>
> > c) will your patch prevent wrong snap ids in the future?
>
> It will prevent persisting the damage.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Best regards
Felix Stolte

IT-Services
mailto: f.stolte@xxxxxxxxxxxxx
Tel: 02461-619243

---------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the Dueren district court, No. HR B 3498
Chairman of the Supervisory Board: MinDir Stefan Müller
Management Board: Prof. Dr. Astrid Lambrecht (Chair), Karsten Beneke (Deputy Chair), Dr. Ir. Pieter Jansens
---------------------------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx