Hi Patrick,

we have been running one daily snapshot since December and our CephFS crashed three times because of this: https://tracker.ceph.com/issues/38452

We currently have 19 files with corrupt metadata, found by your first-damage.py script. We have isolated these files from user access and are waiting for a fix before we remove them with your script (or maybe a new way?).

Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the MDS servers, cluster health went to ERROR (MDS_DAMAGE). 'ceph tell mds.0 damage ls' shows me the same files as your script (initially only a part of them; after a CephFS scrub, all of them). I noticed "mds: catch damage to CDentry's first member before persisting (issue#58482, pr#50781, Patrick Donnelly)" in the changelog for 16.2.13 and would like to ask you the following questions:

a) Can we now repair the damaged files online, instead of bringing down the whole fs and using the Python script?

b) Should we set one of the new MDS options in our specific case to avoid our fileserver crashing because of the wrong snap ids?

c) Will your patch prevent wrong snap ids in the future?

Regards
Felix
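A minimal sketch of the damage listing and scrub commands mentioned above, assuming a single CephFS filesystem served by MDS rank 0; the exact JSON fields printed by "damage ls" may differ between releases:

    # Dump the MDS damage table as JSON (dentry entries include the damaged inode and path)
    ceph tell mds.0 damage ls

    # Start a recursive scrub from the root so all damaged dentries are discovered
    ceph tell mds.0 scrub start / recursive

    # Check scrub progress
    ceph tell mds.0 scrub status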
On 08.01.2023 at 02:14, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:

On Thu, Dec 15, 2022 at 9:32 AM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:

> Hi Patrick,
>
> we used your script to repair the damaged objects over the weekend and it went smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; runtime is about 6h. Until Thursday last week we had exactly the same 17 files. On Thursday at 13:05 a snapshot was created and our active MDS crashed once, right at the moment the snapshot was created:
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
>
> Twelve minutes later the unlink_local crashes appeared again, this time with a new file. During debugging we noticed an MTU mismatch between the MDS (1500) and the client (9000) with the CephFS kernel mount. The client is also the one creating the snapshots via mkdir in the .snap directory. We have disabled snapshot creation for now, but we really need this feature.
>
> I uploaded the MDS logs of the first crash, along with the information above, to https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it if you could answer the following question: is the bug related to our MTU mismatch? We also fixed the MTU issue over the weekend by going back to 1500 on all nodes in the Ceph public network.

I doubt it.

> If you need a debug level 20 log of the ScatterLock for further analysis, I could schedule snapshots at the end of our workdays and increase the debug level for 5 minutes around snapshot creation.

This would be very helpful!

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
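A rough sketch of capturing a debug level 20 window around a scheduled snapshot, as offered above, assuming the client kernel mount is at /mnt/cephfs (placeholder path) and that reverting to the default log levels afterwards is acceptable:

    # Raise MDS logging shortly before the snapshot is taken (very verbose)
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1

    # Create the snapshot from the client via mkdir in the .snap directory
    mkdir /mnt/cephfs/.snap/eod-$(date +%Y-%m-%d)

    # Keep the high level for about 5 minutes around snapshot creation, then revert to defaults
    sleep 300
    ceph config rm mds debug_mds
    ceph config rm mds debug_ms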